DataCanvasIO / DeepTables

DeepTables: Deep-learning Toolkit for Tabular data
https://deeptables.readthedocs.io
Apache License 2.0

Model cannot be saved #91

Closed. miranska closed this issue 8 months ago.

miranska commented 8 months ago

System information

Describe the current behavior

I ran the sample classification code from the documentation:

During execution, I see the following message in the logs:

01-04 14:44:02 I deeptables.m.deeptable.py 369 - Training finished.
./miniforge3/envs/sample_deeptable/lib/python3.11/site-packages/deeptables/models/deepmodel.py:188: UserWarning: You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')`.
  save_model(self.model, h, save_format='h5')
01-04 14:44:02 I deeptables.m.deeptable.py 704 - Model has been saved to:dt_output/dt_20240104144359_linear_dnn_nets/linear+dnn_nets.h5

Describe the expected behavior

I expected the model to be saved to the reported location, but I do not see any output in the dt_output subdirectory where my source file is located. Furthermore, I couldn't find linear+dnn_nets.h5 anywhere on the hard drive.

Standalone code to reproduce the issue

# sample code from https://deeptables.readthedocs.io/en/latest/examples.html
from deeptables.models.deeptable import DeepTable, ModelConfig
from deeptables.models.deepnets import WideDeep
from deeptables.datasets import dsutils
from sklearn.model_selection import train_test_split

# Adult Data Set from UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Adult
df_train = dsutils.load_adult()
y = df_train.pop(14)
X = df_train

conf = ModelConfig(nets=WideDeep, metrics=["AUC", "accuracy"], auto_discrete=True)
dt = DeepTable(config=conf)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model, history = dt.fit(X_train, y_train, epochs=100)

oaksharks commented 8 months ago

Hi @miranska ! The model file dt_output/dt_20240104144359_linear_dnn_nets/linear+dnn_nets.h5 recorded in your log is a path relative to DeepTables' working directory. You can check the workdir with the following code:

from hypernets.utils import fs
print(f"workdir: {fs.remote_root_}")

On my OS (CentOS 7.9) it is /tmp/workdir, so I look for the files generated by DeepTables in that directory.
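
For example, here is a minimal sketch (assuming the default local filesystem backend, where fs.remote_root_ is a plain directory path) that lists the .h5 files DeepTables has written under the workdir:

import os
from hypernets.utils import fs

# walk the workdir and print any saved HDF5 model files
for root, _, files in os.walk(fs.remote_root_):
    for name in files:
        if name.endswith('.h5'):
            print(os.path.join(root, name))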

For the warning below:

You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')`.

It does not prevent model persistence. If you want to eliminate the warning, downgrade to tensorflow<=2.11.0.

For further information on model persistence, you can refer to this example:

miranska commented 8 months ago

@oaksharks, great, thank you! So in order to persist a model in my own directory, I should first let it be saved to the fs.remote_root_ directory (which I can't change) and then move it to my own directory using something like shutil.move(), correct?

oaksharks commented 8 months ago

You can either use shutil.move() or change the workdir; see https://github.com/DataCanvasIO/DeepTables/issues/85 for more details.
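
For example, a minimal sketch of the shutil.move() approach, assuming the default local workdir and the model path reported in the training log above (the destination directory my_models is just a hypothetical example):

import os
import shutil
from hypernets.utils import fs

# source: the model file DeepTables saved under its workdir (path taken from the training log)
src = os.path.join(fs.remote_root_, 'dt_output/dt_20240104144359_linear_dnn_nets/linear+dnn_nets.h5')

# destination: a directory of your own choosing (hypothetical path)
dst_dir = './my_models'
os.makedirs(dst_dir, exist_ok=True)
shutil.move(src, os.path.join(dst_dir, 'linear+dnn_nets.h5'))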

miranska commented 8 months ago

Thank you!