BrandonSmithJ / MDN

Mixture Density Network for water constituent estimation
GNU General Public License v3.0

No trained model exists at #19

Open kollurusrinivas1 opened 3 months ago

kollurusrinivas1 commented 3 months ago

Hello,

I am trying to run MDN. I saw one of the closed issues and downloaded the weights linked in that thread. However, I ran into this problem. Your help is greatly appreciated, thank you.

Exception: No trained model exists at: /Users/Sri/MDN/Weights/OLI/70c22252dff6bc608d1d8e15b1d2d9e62cdaceecf7217f268192964a4c4c1871/Round_0

BrandonSmithJ commented 3 months ago

Did you place the downloaded weights into that location? For that specific model, you should download the OLI.zip file and extract it into your MDN/Weights directory.

This will lead to the file MDN/Weights/OLI/70c22252dff6bc608d1d8e15b1d2d9e62cdaceecf7217f268192964a4c4c1871.zip being created, at which point you can run the MDN program and it will handle the proper extraction and usage of that zip file to find the pretrained model weights.
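For anyone hitting this, a quick way to sanity-check the layout is a few lines of Python. The function and return labels here are my own illustration, not part of the MDN codebase:

```python
from pathlib import Path

def check_weights(weights_dir, model_hash):
    """Report whether the downloaded weights zip (or its extracted folder)
    sits where the MDN loader expects it. Illustrative helper only."""
    sensor_dir = Path(weights_dir)
    zip_path   = sensor_dir / f"{model_hash}.zip"
    round_dir  = sensor_dir / model_hash / "Round_0"
    if round_dir.is_dir():
        return "extracted"   # model folder already present
    if zip_path.is_file():
        return "zip only"    # MDN should extract this on first run
    return "missing"         # weights were not placed here

# e.g. check_weights("/Users/Sri/MDN/Weights/OLI", "<model hash>")
```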

kollurusrinivas1 commented 3 months ago

Yes, I did. Here's the specific error I obtained.

Exception: No trained model exists at: /Users/Sri/MDN/Weights/OLI/70c22252dff6bc608d1d8e15b1d2d9e62cdaceecf7217f268192964a4c4c1871/Round_0

This is the message log in the Jupyter notebook. I might be missing something.

/Users/Sri/anaconda3/envs/Cornell_course/lib/python3.11/site-packages/pandas/core/arrays/masked.py:60: UserWarning: Pandas requires version '1.3.6' or newer of 'bottleneck' (version '1.3.5' currently installed).
  from pandas.core import (
Generating estimates for 6 data points ((6, 5))
Input Rrs Shape: (6, 5)
N Valid: [6, 6, 6, 6, 6]
Minimum: [ 0.03, 0.04, 0.05, 0.04, 0.04]
Maximum: [ 0.12, 0.13, 0.15, 0.15, 0.29]

Could not find config file with the following parameters: Version: 2.0.1

Dependencies
------------
align             : None
batch             : 128
benchmark         : False
epsilon           : 0.001
imputations       : 5
l2                : 0.001
lr                : 0.001
model_lbl         : 
n_hidden          : 100
n_iter            : 10000
n_layers          : 5
n_mix             : 5
no_bagging        : False
product           : chl
sat_bands         : False
seed              : 42
sensor            : OLI
subset            : 
x_scalers         : 
    ('RobustScaler', [], {})
y_scalers         : 
    ('LogTransformer', [], {})
    ('MinMaxScaler', [(-1, 1)], {}) 

0%| | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/Sri/MDN/main.py", line 4, in <module>
    main()
  File "/Users/Sri/MDN/product_estimation.py", line 237, in main
    estimates, slices = get_estimates(args, x_test=x_test)
  File "/Users/Sri/MDN/product_estimation.py", line 97, in get_estimates
    model.fit(x_train, y_train, output_slices, args=args, datasets=datasets)
  File "/Users/Sri/MDN/utils.py", line 22, in helper
    return func(*args, **kwargs)
  File "/Users/Sri/MDN/model/MDN.py", line 291, in fit
    raise Exception(f"No trained model exists at: \n{self.model_path}")
Exception: No trained model exists at:
/Users/Sri/MDN/Weights/OLI/70c22252dff6bc608d1d8e15b1d2d9e62cdaceecf7217f268192964a4c4c1871/Round_0

kollurusrinivas1 commented 3 months ago
[Screenshot 2024-08-09 at 12 28 03 attached]

The attached image shows the path at the bottom and the .zip files in the folder.

BrandonSmithJ commented 3 months ago

It is likely an issue with Jupyter interaction. You can try just running as a script rather than in a notebook, which should work.

If you need to use a notebook, however: I assume you're calling the get_args function somewhere in the notebook, right? If so, try setting the 'use_cmdline' keyword argument for that call, like:

args = get_args(<whatever you're currently setting>, use_cmdline=False)

If you're not using the get_args function in your notebook, please share the code for the cell which is generating the error.
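For context on why use_cmdline matters: inside Jupyter, sys.argv holds the kernel's own flags (e.g. -f .../kernel.json), so any argparse-based parser that reads sys.argv will trip over them. The parse function below is a toy stand-in, not MDN's actual get_args, but it shows the safe pattern of passing an explicit argument list:

```python
import argparse

def parse(arg_list=None):
    """Toy stand-in for an argparse-based get_args. With arg_list=None,
    argparse reads sys.argv -- in Jupyter that contains the kernel's own
    flags, not yours. Passing an explicit list sidesteps sys.argv."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--sensor", default="OLI")
    args, _unknown = parser.parse_known_args(arg_list)
    return args

# Safe inside a notebook: ignore sys.argv entirely
args = parse([])                    # sensor falls back to its default
args = parse(["--sensor", "MSI"])   # or set options explicitly
```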

BrandonSmithJ commented 3 months ago

I noticed that depending on version, the weights may be extracted into a twice-nested hash folder instead of once; e.g. /Users/Sri/MDN/Weights/OLI/70c22252dff6bc608d1d8e15b1d2d9e62cdaceecf7217f268192964a4c4c1871/70c22252dff6bc608d1d8e15b1d2d9e62cdaceecf7217f268192964a4c4c1871/Round_0.

I've now migrated the repository off of using git-lfs, so you should no longer need to download the weights manually. As well, I've corrected the issue with improper zip extraction. If you are still encountering issues, please try cloning the repository again and running your code.
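For anyone stuck on an older checkout, a sketch of a workaround for the double-nested extraction is below. This helper is my own illustration and not part of the repository:

```python
import shutil
from pathlib import Path

def flatten_double_nesting(model_dir):
    """If extraction produced <hash>/<hash>/Round_0 instead of
    <hash>/Round_0, move the inner folder's contents up one level.
    Illustrative helper, not part of the MDN codebase."""
    model_dir = Path(model_dir)
    inner = model_dir / model_dir.name          # the duplicated hash folder
    if inner.is_dir() and not (model_dir / "Round_0").exists():
        for item in inner.iterdir():
            shutil.move(str(item), str(model_dir / item.name))
        inner.rmdir()                           # remove the now-empty shell
```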

kollurusrinivas1 commented 3 months ago

Hello Brandon,

I tried the following.

  1. I removed the MDN folder and cloned the MDN repository.
  2. I also created a new environment and cloned the MDN repository.

Below is the command I ran, using test_MDN_sample_Landsat2.csv:

!python3 -m MDN --sensor "OLI" /Users/Sri/Downloads/test_MDN_sample_Landsat2.csv

Generating estimates for 6 data points ((6, 4))
Input Rrs Shape: (6, 4)
N Valid: [6, 6, 6, 6]
Minimum: [ 0.03, 0.04, 0.05, 0.04]
Maximum: [ 0.12, 0.13, 0.15, 0.15]

0%| | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/Sri/MDN/main.py", line 4, in <module>
    main()
  File "/Users/Sri/MDN/product_estimation.py", line 237, in main
    estimates, slices = get_estimates(args, x_test=x_test)
  File "/Users/Sri/MDN/product_estimation.py", line 97, in get_estimates
    model.fit(x_train, y_train, output_slices, args=args, datasets=datasets)
  File "/Users/Sri/MDN/utils.py", line 22, in helper
    return func(*args, **kwargs)
  File "/Users/Sri/MDN/model/MDN.py", line 230, in fit
    self.load()
  File "/Users/Sri/MDN/model/MDN.py", line 375, in load
    self.update_config(read_pkl(self.model_path.joinpath('config.pkl')), ['scalerx', 'scalery', 'tf_random', 'np_random'])
  File "/Users/Sri/MDN/utils.py", line 143, in read_pkl
    return CustomUnpickler(f).load()
  File "/Users/Sri/MDN/utils.py", line 132, in find_class
    return super().find_class(module, name)
ModuleNotFoundError: No module named 'tensorflow.python.training.tracking'

I installed tensorflow, keras, and tensorflow_probability, but I still get this error.

BrandonSmithJ commented 3 months ago

How did you install the libraries? Are you using the versions specified in the repository requirements.txt file? You should set up your virtual environment, and then install the required libraries via python -m pip install -r MDN/requirements.txt. The error you're getting indicates the incorrect version of tensorflow is installed.
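To illustrate why a TensorFlow version mismatch surfaces as ModuleNotFoundError inside read_pkl: a pickle stores the import path of every class it references, so if the installed library has since reorganized its modules, unpickling fails at find_class. MDN's CustomUnpickler overrides find_class; the sketch below shows the same mechanism with a purely hypothetical module remap (old_pkg.submod is made up, and the hand-built pickle bytes are just for demonstration):

```python
import io
import pickle

class RemappingUnpickler(pickle.Unpickler):
    """Demonstrates how a pickle's stored module paths can be redirected.
    A remap like this is only a band-aid; installing the pinned
    TensorFlow version from requirements.txt is the reliable fix."""
    MODULE_MAP = {"old_pkg.submod": "collections"}  # hypothetical remap

    def find_class(self, module, name):
        module = self.MODULE_MAP.get(module, module)
        return super().find_class(module, name)

# A hand-built protocol-0 pickle referencing the stale module name:
stale = b"cold_pkg.submod\nOrderedDict\n)R."
obj = RemappingUnpickler(io.BytesIO(stale)).load()
print(type(obj))  # -> <class 'collections.OrderedDict'>
```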

kollurusrinivas1 commented 3 months ago

Thank you. I tried it the way you suggested, but I got the following error.

Collecting h5py==3.1.0 (from -r MDN/requirements.txt (line 18))
  Using cached h5py-3.1.0.tar.gz (371 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... error
  error: subprocess-exited-with-error

× pip subprocess to install backend dependencies did not run successfully.
│ exit code: 1
╰─> [1075 lines of output]
    Collecting pkgconfig
      Obtaining dependency information for pkgconfig from https://files.pythonhosted.org/packages/32/af/89487c7bbf433f4079044f3dc32f9a9f887597fe04614a37a292e373e16b/pkgconfig-1.5.5-py3-none-any.whl.metadata
      Using cached pkgconfig-1.5.5-py3-none-any.whl.metadata (4.0 kB)
    Collecting numpy==1.19.3
      Using cached numpy-1.19.3.zip (7.3 MB)
      Installing build dependencies: started
      Installing build dependencies: finished with status 'done'
      Getting requirements to build wheel: started
      Getting requirements to build wheel: finished with status 'done'
      Preparing metadata (pyproject.toml): started
      Preparing metadata (pyproject.toml): finished with status 'done'
    Collecting Cython>=0.29.14
      Obtaining dependency information for Cython>=0.29.14 from https://files.pythonhosted.org/packages/43/39/bdbec9142bc46605b54d674bf158a78b191c2b75be527c6dcf3e6dfe90b8/Cython-3.0.11-py2.py3-none-any.whl.metadata
      Using cached Cython-3.0.11-py2.py3-none-any.whl.metadata (3.2 kB)
    Using cached pkgconfig-1.5.5-py3-none-any.whl (6.7 kB)
    Using cached Cython-3.0.11-py2.py3-none-any.whl (1.2 MB)
    Building wheels for collected packages: numpy
      Building wheel for numpy (pyproject.toml): started
      Building wheel for numpy (pyproject.toml): finished with status 'error'
      error: subprocess-exited-with-error

    × Building wheel for numpy (pyproject.toml) did not run successfully.
    │ exit code: 1
    ╰─> [1042 lines of output]
        Running from numpy source directory.

A long list of errors followed.

I was wondering if there is a working environment you use that can be imported into Anaconda. For reference, I am on a Mac with an M1 chip, using an Anaconda environment.

BrandonSmithJ commented 3 months ago

There are a number of potential ways to handle this, but here's the simplest: first create your environment with

conda create --yes --prefix ./env_MDN python=3.8
conda activate ./env_MDN

Then install h5py in your environment with conda:

conda install --yes h5py=3.1.0 -c conda-forge

Finally use pip to install the remaining libraries:

python -m pip install -r MDN/requirements.txt -U --upgrade-strategy=only-if-needed

For any other libraries that fail to install with pip, just install them with conda first (ensuring you install the specific version listed in the requirements.txt file); then re-run the pip install for the requirements.txt file using the -U --upgrade-strategy=only-if-needed arguments (so pip skips already installed packages).
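As a sanity check after the install steps above, one can compare the '==' pins in requirements.txt against the active environment using only the standard library. This helper is my own illustration, not part of MDN:

```python
from importlib import metadata

def check_pins(requirements_text):
    """For each '==' pinned line, report (wanted, installed, match).
    Unpinned lines and comments are skipped. Illustrative helper."""
    report = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        if "==" not in line or line.startswith("#"):
            continue
        pkg, want = line.split("==")
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            have = None                     # not installed at all
        report[pkg] = (want, have, have == want)
    return report

# e.g. check_pins(open("MDN/requirements.txt").read())
```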

Alternatively, just use the environment you were using before, but pip install the specific tensorflow/numpy/tensorflow_probability versions that are specified (though this is more likely to result in a broken environment than what I've suggested above with creating a new, clean one).