DataCanvasIO / DeepTables

DeepTables: Deep-learning Toolkit for Tabular data
https://deeptables.readthedocs.io
Apache License 2.0

Performance may be degraded by the initialization scheme #90

Open miranska opened 10 months ago

miranska commented 10 months ago

System information

Describe the current behavior

I ran the sample classification code from the documentation (reproduced under "Standalone code to reproduce the issue" below).

During execution, I see the following message in the logs:

01-04 14:38:13 I deeptables.m.deepmodel.py 231 - Building model...
./miniforge3/envs/sample_deeptable/lib/python3.11/site-packages/keras/src/initializers/initializers.py:120: UserWarning: The initializer RandomUniform is unseeded and being called multiple times, which will return identical values each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initializer instance more than once.

Describe the expected behavior

The initialization scheme used for WideDeep seems to need changing, both to eliminate the warning and to avoid the possible performance degradation it hints at (weights that start out with identical values).
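To make the concern concrete, here is a minimal, standard-library-only sketch of the behavior the warning message describes. It is a toy model, not Keras or DeepTables code: the class `ToyUniformInitializer` and the variable names are hypothetical, and it assumes the warning text is accurate in saying that a reused, unseeded initializer instance returns identical values on every call.

```python
import random


class ToyUniformInitializer:
    """Hypothetical stand-in for an initializer whose seed is fixed at construction."""

    def __init__(self, minval=-0.05, maxval=0.05, seed=None):
        self.minval = minval
        self.maxval = maxval
        # Assumption: with no seed given, one is drawn once and then reused.
        self.seed = seed if seed is not None else random.randrange(2**32)

    def __call__(self, n):
        # A fresh generator with the same seed yields the same values on every call.
        rng = random.Random(self.seed)
        return [rng.uniform(self.minval, self.maxval) for _ in range(n)]


# Problem: one shared instance gives two weight vectors the same starting values.
shared = ToyUniformInitializer()
weights_a = shared(4)
weights_b = shared(4)
same_when_shared = weights_a == weights_b

# Possible fix: one instance per weight, each with its own explicit seed.
fixed_a = ToyUniformInitializer(seed=1)(4)
fixed_b = ToyUniformInitializer(seed=2)(4)
differ_when_separate = fixed_a != fixed_b

print(same_when_shared, differ_when_separate)
```

In real Keras code the analogous change would be to pass a `seed` to each initializer, or to create a separate initializer instance per layer, as the warning itself suggests. Whether and where DeepTables shares initializer instances is not confirmed in this thread.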

Standalone code to reproduce the issue

# sample code from https://deeptables.readthedocs.io/en/latest/examples.html
from deeptables.models.deeptable import DeepTable, ModelConfig
from deeptables.models.deepnets import WideDeep
from deeptables.datasets import dsutils
from sklearn.model_selection import train_test_split

# Adult Data Set from UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Adult
df_train = dsutils.load_adult()
y = df_train.pop(14)
X = df_train

conf = ModelConfig(nets=WideDeep, metrics=["AUC", "accuracy"], auto_discrete=True)
dt = DeepTable(config=conf)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model, history = dt.fit(X_train, y_train, epochs=100)

oaksharks commented 3 months ago

This warning seems to appear only with newer versions of TensorFlow. For now, using TensorFlow 2.9 or earlier avoids it.