how to run the codes - Githubissues

naimesha commented 4 years ago

Can somebody please explain how to run the code, and what config is in train_classifier()?

cemanil commented 4 years ago

Hi,

"Config" in train_classifier is an object that contains the details of the experiment configuration.

Do you mind elaborating on what you wish to run that's not (or insufficiently) covered in the README?

naimesha commented 4 years ago

Hey! For example, consider the standard classifier. There is a .json file and a .py file; I am supposed to give the .json data to the .py file, right? How do I pass it? Should we enter it manually, or is there code that reads it directly from the .json file?

naimesha commented 4 years ago

Hey! It's kind of important for me because I am doing this project as a part of my final year project. Please help me out.

Thank you

cemanil commented 4 years ago

It sounds like the "Tasks" section of the README contains what you need. For example, you can run

"""
python ./lnets/tasks/classification/mains/train_classifier.py ./lnets/tasks/classification/configs/standard/fc_classification.json
"""

to train a classification network. The JSON file is processed directly.

Hope this helps.
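
For readers wondering how a JSON file can end up as the "config" object, here is a minimal sketch. It assumes the script receives the JSON path as its first command-line argument and wraps the parsed data so nested fields can be read as attributes. The helper name load_config and the sample keys are illustrative and are not taken from the LNets code.

```python
import json
from types import SimpleNamespace


def load_config(path):
    """Parse a JSON file into nested objects with attribute access."""
    with open(path) as f:
        # object_hook converts every JSON object into a SimpleNamespace,
        # so cfg.logging.save_model works instead of cfg["logging"]["save_model"].
        return json.load(f, object_hook=lambda d: SimpleNamespace(**d))


# Hypothetical usage inside a training script:
#   import sys
#   cfg = load_config(sys.argv[1])
#   print(cfg.logging.save_model)
```

With a loader of this kind, nothing has to be typed in by hand: the values come straight from the file named on the command line.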

naimesha commented 4 years ago

Hey! Thanks for writing back, but it's showing an attribute error.

This link shows an image of the error: https://user-images.githubusercontent.com/30970597/86271805-2de65580-bbeb-11ea-970b-a7cc03b815d0.jpeg

naimesha commented 4 years ago

Are we supposed to change anything? It's showing "file not found".

This is the error:

Averaged validation loss: -0.9927879944443703
Traceback (most recent call last):
  File "./lnets/tasks/dualnets/mains/train_dual.py", line 176, in <module>
    final_state = train_dualnet(dual_model, distrib_loaders, cfg)
  File "./lnets/tasks/dualnets/mains/train_dual.py", line 151, in train_dualnet
    model.load_state_dict(torch.load(best_model_path))
  File "/home/naimesha/anaconda3/envs/lnets/lib/python3.7/site-packages/torch/serialization.py", line 584, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/naimesha/anaconda3/envs/lnets/lib/python3.7/site-packages/torch/serialization.py", line 234, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/naimesha/anaconda3/envs/lnets/lib/python3.7/site-packages/torch/serialization.py", line 215, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'out/wde/wasserstein_distance_estimation_absolute_value_experiment_MultiSphericalShell_and_MultiSphericalShell_aggmo_0.01_dual_fc_linear_bjorck_act_maxmin_depth_3_width_128_grouping_2_2020_07_01_21_34_29_992881/checkpoints/best/best_model.pt'

naimesha commented 4 years ago

Please reply about the error. Thanks in advance.

cemanil commented 4 years ago

I see - It is possible that the attribute error you're getting is because you're using a different pytorch version. Which version are you using?

The majority of the code should run without problems with the current version, but it might take a few minor modifications.

naimesha commented 4 years ago

I am using the stable PyTorch build (1.5.1) with CUDA 10.2. Which version should I install to make the code run?

Also, what about the other error, "file not found" or "no such directory"?

cemanil commented 4 years ago

That might be it - the code is only tested rigorously on PyTorch version 0.4. We are planning to upgrade the repo at some point in the future, but that might not be soon enough for your final year project. Perhaps you can try running things on PyTorch 0.4?

The other error seems to occur because the program is trying to load a model that wasn't saved during training. In the config, you'll find the logging.save_model field. Setting that to True should fix the problem.
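
If editing the JSON by hand is inconvenient, the change can also be scripted. The following is only a sketch: it assumes the flag sits under a top-level "logging" object, as the dotted name logging.save_model suggests, and the helper name enable_model_saving is made up for illustration.

```python
import json


def enable_model_saving(config_path, output_path):
    """Write a copy of a JSON config with logging.save_model set to true."""
    with open(config_path) as f:
        config = json.load(f)

    # Assumption: the flag is nested under a top-level "logging" object.
    config.setdefault("logging", {})["save_model"] = True

    with open(output_path, "w") as f:
        json.dump(config, f, indent=4)
    return config
```

Writing to a separate output path keeps the original config untouched, so the modified copy can be passed to the training script and discarded afterwards.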

naimesha commented 4 years ago

Okay! Thank you so much for replying. I am trying to run the code using PyTorch 0.4 and will also try the suggested changes.

naimesha commented 4 years ago

I tried running the code after changing logging.save_model to true, but I am still seeing this error: https://user-images.githubusercontent.com/30970597/86319293-d8915f00-bc51-11ea-93e0-1a7ff8cd9bb4.jpeg

After that error, I also changed logging.best_model to true and got the error below: https://user-images.githubusercontent.com/30970597/86319357-fc54a500-bc51-11ea-96f2-74bdbd31b847.jpeg

naimesha commented 4 years ago

Hey! I am experiencing the same error for all of the scripts. The error is "no such file or directory": https://user-images.githubusercontent.com/30970597/86326974-1a290680-bc60-11ea-9fb1-b74af69a28a3.png

naimesha commented 4 years ago

Hey! I closed the issue by mistake. Please reply when you can. Thank you.

cemanil commented 4 years ago

Hi,

I cannot reproduce the error you're getting. In my setup, the best models get saved and are successfully loaded for validation.

This is the command I ran:

"""
python ./lnets/tasks/dualnets/mains/train_dual.py ./lnets/tasks/dualnets/configs/absolute_value_experiment.json
"""

The only modifications I made in the JSON are:

1) Set save_model and save_best to True.
2) Reduce the training epochs (so that I can debug faster).

Here are the last few lines printed out by the program before it terminates:

"""
Epoch 8: 16it [00:00, 37.16it/s]
Training loss: -0.9953
Saving new best model at out/wde/wasserstein_distance_estimation_absolute_value_experiment_MultiSphericalShell_and_MultiSphericalShell_aggmo_0.01_dual_fc_linear_bjorck_act_maxmin_depth_3_width_128_grouping_2_2020_07_02_10_22_28_870290/checkpoints/best.
Averaged validation loss: -0.995928555727005
Epoch 9: 16it [00:00, 36.52it/s]
Training loss: -0.9946
Averaged validation loss: -0.996421679854393
Epoch 10: 16it [00:00, 36.44it/s]
Training loss: -0.9953
Averaged validation loss: -0.9966489151120186
Epoch 11: 16it [00:00, 36.62it/s]
Training loss: -0.9932
Averaged validation loss: -0.9966634809970856
Epoch 12: 16it [00:00, 35.48it/s]
Training loss: -0.9942
Averaged validation loss: -0.9944535940885544
Epoch 13: 16it [00:00, 34.49it/s]
Training loss: -0.9943
Averaged validation loss: -0.988710567355156
Epoch 14: 16it [00:00, 34.47it/s]
Training loss: -0.9945
Averaged validation loss: -0.9972907453775406
Epoch 15: 16it [00:00, 34.08it/s]
Training loss: -0.9912
Averaged validation loss: -0.9979196637868881
Testing best model.
Averaged validation loss: -0.995915874838829
"""

At epoch 8, the best model until that point gets saved.

Could you confirm:

1) Your program does print out lines starting with "Saving new best model at ...".
2) After those lines appear, the models do get saved in the directories specified.
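
The second check can be done without opening a file browser. This is a rough sketch that assumes checkpoints land under an "out" directory and end in checkpoints/best/best_model.pt, as in the path from the FileNotFoundError earlier in this thread; the function name find_best_checkpoints is only illustrative.

```python
from pathlib import Path


def find_best_checkpoints(out_dir="out"):
    """Return every best_model.pt found under a checkpoints/best folder below out_dir."""
    root = Path(out_dir)
    if not root.is_dir():
        # Nothing has been written yet, or the script ran from another directory.
        return []
    return sorted(root.glob("**/checkpoints/best/best_model.pt"))


for checkpoint in find_best_checkpoints():
    print(checkpoint)
```

An empty result after a run that printed "Saving new best model at ..." would suggest the files are being written somewhere else, for example relative to a different working directory.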

naimesha commented 4 years ago

This is what I got:

Epoch 0: 16it [00:00, 28.46it/s]
Training loss: -0.3053
Saving new best model at out/wde/wasserstein_distance_estimation_absolute_value_experiment_MultiSphericalShell_and_MultiSphericalShell_aggmo_0.01_dual_fc_linear_bjorck_act_maxmin_depth_3_width_128_grouping_2_2020_07_02_20_10_28_822006/checkpoints/best.
Averaged validation loss: -0.6530682630836964
Epoch 1: 16it [00:00, 26.09it/s]
Training loss: -0.8605
Saving new best model at out/wde/wasserstein_distance_estimation_absolute_value_experiment_MultiSphericalShell_and_MultiSphericalShell_aggmo_0.01_dual_fc_linear_bjorck_act_maxmin_depth_3_width_128_grouping_2_2020_07_02_20_10_28_822006/checkpoints/best.
Averaged validation loss: -0.9699119031429291
Epoch 2: 16it [00:00, 25.11it/s]
Training loss: -0.9813
Saving new best model at out/wde/wasserstein_distance_estimation_absolute_value_experiment_MultiSphericalShell_and_MultiSphericalShell_aggmo_0.01_dual_fc_linear_bjorck_act_maxmin_depth_3_width_128_grouping_2_2020_07_02_20_10_28_822006/checkpoints/best.
Averaged validation loss: -0.9791331179440022
Epoch 3: 16it [00:00, 25.34it/s]
Training loss: -0.9769
Averaged validation loss: -0.9707604050636292
Epoch 4: 16it [00:00, 24.95it/s]
Training loss: -0.9684
Averaged validation loss: -0.9677758105099201
Epoch 5: 16it [00:00, 26.24it/s]
Training loss: -0.9669
Averaged validation loss: -0.9697879105806351
Epoch 6: 16it [00:00, 26.73it/s]
Training loss: -0.9718
Averaged validation loss: -0.9707205519080162
Epoch 7: 16it [00:00, 25.32it/s]
Training loss: -0.9763
Averaged validation loss: -0.9752324745059013
Epoch 8: 16it [00:00, 26.35it/s]
Training loss: -0.9805
Averaged validation loss: -0.9864860586822033
Epoch 9: 16it [00:00, 24.99it/s]
Training loss: -0.9858
Saving new best model at out/wde/wasserstein_distance_estimation_absolute_value_experiment_MultiSphericalShell_and_MultiSphericalShell_aggmo_0.01_dual_fc_linear_bjorck_act_maxmin_depth_3_width_128_grouping_2_2020_07_02_20_10_28_822006/checkpoints/best.
Averaged validation loss: -0.9862877279520035
Epoch 10: 16it [00:00, 30.90it/s]
Training loss: -0.9890
Saving new best model at out/wde/wasserstein_distance_estimation_absolute_value_experiment_MultiSphericalShell_and_MultiSphericalShell_aggmo_0.01_dual_fc_linear_bjorck_act_maxmin_depth_3_width_128_grouping_2_2020_07_02_20_10_28_822006/checkpoints/best.
Averaged validation loss: -0.9877906292676926
Epoch 11: 16it [00:00, 26.30it/s]
Training loss: -0.9908
Saving new best model at out/wde/wasserstein_distance_estimation_absolute_value_experiment_MultiSphericalShell_and_MultiSphericalShell_aggmo_0.01_dual_fc_linear_bjorck_act_maxmin_depth_3_width_128_grouping_2_2020_07_02_20_10_28_822006/checkpoints/best.
Averaged validation loss: -0.9939416795969009
Epoch 12: 16it [00:00, 31.36it/s]
Training loss: -0.9935
Saving new best model at out/wde/wasserstein_distance_estimation_absolute_value_experiment_MultiSphericalShell_and_MultiSphericalShell_aggmo_0.01_dual_fc_linear_bjorck_act_maxmin_depth_3_width_128_grouping_2_2020_07_02_20_10_28_822006/checkpoints/best.
Averaged validation loss: -0.991676576435566
Epoch 13: 16it [00:00, 30.95it/s]
Training loss: -0.9943
Saving new best model at out/wde/wasserstein_distance_estimation_absolute_value_experiment_MultiSphericalShell_and_MultiSphericalShell_aggmo_0.01_dual_fc_linear_bjorck_act_maxmin_depth_3_width_128_grouping_2_2020_07_02_20_10_28_822006/checkpoints/best.
Averaged validation loss: -0.9936381727457047
Epoch 14: 16it [00:00, 26.20it/s]
Training loss: -0.9920
Averaged validation loss: -0.9930611923336983
Testing best model.
Averaged validation loss: -0.9936569929122925
Traceback (most recent call last):
  File "./lnets/tasks/dualnets/mains/train_dual.py", line 176, in <module>
    final_state = train_dualnet(dual_model, distrib_loaders, cfg)
  File "./lnets/tasks/dualnets/mains/train_dual.py", line 162, in train_dualnet
    after_training=False)
  File "/home/naimesha/lnets/utils/saving_and_loading.py", line 105, in save_1_or_2_dim_dualnet_visualizations
    save_1d_dualnet_visualizations(model, figures_dir, config, epoch, loss)
  File "/home/naimesha/lnets/tasks/dualnets/visualize/visualize_dualnet.py", line 134, in save_1d_dualnet_visualizations
    save_path = os.path.join(figuresdir, "epoch{:04}_visualize1d".format(epoch))
TypeError: unsupported format string passed to NoneType.__format__

naimesha commented 4 years ago

Also, how do I get the graphs?

cemanil commented 4 years ago

Ah, the unsupported string error can be resolved by modifying {:04} to {}.
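
To illustrate why that change helps: the TypeError in the traceback mentions NoneType, which suggests the epoch value is None at that call. The snippet below is a standalone sketch of the behaviour, not the repository's code; figure_name is a hypothetical helper.

```python
def figure_name(epoch):
    # "{:04}" asks for a zero-padded width of 4, which None does not support;
    # a bare "{}" falls back to str(), so None is rendered as "None".
    return "epoch{}_visualize1d".format(epoch)


try:
    "epoch{:04}_visualize1d".format(None)
except TypeError as err:
    print(err)  # unsupported format string passed to NoneType.__format__

print(figure_name(None))  # epochNone_visualize1d
print(figure_name(7))     # epoch7_visualize1d
```

One side effect of the simpler placeholder is that integer epochs are no longer zero-padded, so file names may sort differently.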

The graphs should be saved automatically, as long as the "visualize" flag is set to true (which is the default).

cemanil commented 4 years ago

You probably need to install the foolbox package.

naimesha commented 4 years ago

Also, how do we get the test error value for the classification experiment? I am able to train the model without any errors, but I couldn't get the test error value. Sorry for the previous doubt; I just thought everything was ready to go once I had installed setup.py.

cemanil commented 4 years ago

Hmm, I expected the training script to automatically run validation. What are the last few lines the training script prints out?

naimesha commented 4 years ago

I got the validation acc and loss.log files.

naimesha commented 4 years ago

About foolbox: it is already installed, but it's showing that there is no module named foolbox.adversarial.

cemanil commented 4 years ago

I see - I suspect this is due to the fact that the foolbox package changed since we released the code. Maybe you could try downgrading foolbox and see if this helps?

naimesha commented 4 years ago

What about this error?

from .distances import MSE
ImportError: cannot import name 'MSE'

cemanil commented 4 years ago

I'm guessing you encountered that when you tried to run "eval_adv_robustness.py" (lines 79-80)?

If that's the case, then I believe the problem might also be due to the foolbox version and downgrading might help.
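
Before downgrading, it can help to see which versions are actually installed in the active environment. A small sketch using the standard library follows; note that importlib.metadata needs Python 3.8 or newer (older interpreters would need the importlib_metadata backport), and installed_version is just an illustrative helper name.

```python
from importlib import metadata  # standard library from Python 3.8 onwards


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("foolbox", "torch"):
    print(name, installed_version(name))
```

This thread does not say which foolbox release the code was written against, so the exact version to pin is left open here.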

Pallapothu-Naimesha commented 4 years ago

Hey! How do we change the depth for the high-dimensional cone experiment config?

cemanil commented 4 years ago

Try adding more hidden layer sizes to the "layers" field? For example: "layers": [ 128, 128, ..., 1 ],
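
As a sketch of what such a list could look like for a given depth: the helper below assumes each entry in "layers" is the width of one layer and that the final 1 is the scalar output, which is how the example above reads. The name build_layers is made up for illustration.

```python
def build_layers(hidden_width, depth, output_size=1):
    """Return `depth` hidden layer sizes followed by the output size."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return [hidden_width] * depth + [output_size]


print(build_layers(128, 3))  # [128, 128, 128, 1]
```

The resulting list can then be pasted into the "layers" field of the experiment's JSON config.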

Pallapothu-Naimesha commented 4 years ago

Hey,

Can you briefly describe what types of learning were used for the different experiments?

cemanil / LNets

how to run the codes #9