atomistic-machine-learning / schnetpack-gschnet

G-SchNet extension for SchNetPack
MIT License

Model training #10

Closed ldhkc closed 8 months ago

ldhkc commented 9 months ago

Dear author, hello! I am a beginner; I found your project on GitHub and became interested in it. I followed the help instructions you posted step by step to configure the environment and then ran the training code; I am training on the QM9 dataset. However, I couldn't find the code related to the number of epochs, so I don't know how many epochs are used. Can I set the number of epochs myself? If so, how do I do it, and should I refer to the help notes you posted? I really hope to receive your reply, thank you very much!

ldhkc commented 9 months ago

One more question: in which directory is the output saved once training is done? In other words, where can I find the training output?

NiklasGebauer commented 9 months ago

Hi @ldhkc, sorry for the delayed reply! I'm glad you're interested in our models.

For training cG-SchNet, we usually do not rely on a maximum number of epochs. Instead, we use a learning rate scheduler that reduces the learning rate if the validation loss does not improve for X epochs and terminates training as soon as a minimum learning rate Y is achieved. Training is also stopped if the validation loss does not improve for Z consecutive epochs. The default values are currently X=10, Y=1e-6, and Z=25 and you can set them in the CLI by using task.scheduler_args.patience=X, task.scheduler_args.min_lr=Y, and callbacks.early_stopping.patience=Z, respectively.

If you just want to play around a bit and don't need to train the model until convergence, you can still specify a maximum number of epochs, e.g. 20, for training with trainer.max_epochs=20.
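
A rough sketch of such a call is given below. The `gschnet_train` entry point and the config-dir/experiment values here are placeholders rather than the exact command, so please take the precise invocation from the README; only the Hydra overrides themselves are the ones described above.

```bash
# Sketch of a training call; the entry point and the config/experiment
# placeholders are assumptions -- adapt them to the command in the README.
gschnet_train --config-dir=<path/to/my_gschnet_configs> \
    experiment=<your_experiment_config> \
    globals.name=first_run \
    task.scheduler_args.patience=10 \
    task.scheduler_args.min_lr=1e-6 \
    callbacks.early_stopping.patience=25

# Quick test run that additionally caps training at 20 epochs.
gschnet_train --config-dir=<path/to/my_gschnet_configs> \
    experiment=<your_experiment_config> \
    globals.name=quick_test \
    trainer.max_epochs=20
```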

Regarding your second question, by default, the training will use your current working directory and create the following folder: /models/qm9_${globals.name}, where you can set globals.name in the CLI. The checkpoint of the best model is stored at /models/qm9_${globals.name}/best_model at the end of training. You can use it to generate molecules by following our instructions here.
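
As a rough sketch of the generation step (again, the `gschnet_generate` entry point and its arguments are placeholders here, not the exact command; please follow the linked generation instructions for the precise call):

```bash
# Placeholder generation call -- see the generation instructions for the
# exact command; modeldir points at the training output folder from above.
gschnet_generate --config-dir=<path/to/my_gschnet_configs> \
    modeldir=models/qm9_first_run
```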

Hope this helps! Best regards, Niklas

ldhkc commented 9 months ago

Ok, thank you very much for your reply. I'm just in the trying-out stage right now. Some time ago I followed the help instructions on your GitHub, found the .yaml file, and added the max_epochs parameter there; since I'm only experimenting, I set it to a small value. After training there is a test phase, and at the end it outputs some test metrics as well as the best model file. As a follow-up, I will consider letting it train until it stops on its own and then use it to generate molecules, together with the further help instructions you posted. Finally, thank you very much for the help I received from your reply!

ldhkc commented 9 months ago

One more question: could I get a contact for you? If I have any questions while experimenting further, I may need to come and ask you for advice. Thank you very much.

ldhkc commented 9 months ago

Hi author, now I have another problem. I first trained for only a few epochs (I did the training on a server), and it produced the files you mention in the generation section, e.g. a directory containing best_model, cli.log, config.yaml, etc. As I understand it, modeldir is the path to the directory containing those files, in other words the path to the gm1 directory. I then adapted the command you use to generate molecules, but I get an error when I run it from the command line. Can you tell me what the problem is? I also asked ChatGPT and tried a lot of things, but it still fails.
[Screenshot: 2023-11-29 20-56-47] [Screenshot: 2023-11-29 21-31-34]

ldhkc commented 9 months ago

[Screenshot: 2023-11-29 21-32-40]

ldhkc commented 9 months ago

Sorry author, I have a lot of questions. I would also like to ask: are the trained weights also saved in that same place? And is cli.py the main file for training?

ldhkc commented 9 months ago

Hi author, I want to visualize the generated molecules, but I get an error that they cannot be displayed. How can I solve this? I have set the visualization parameter to True. [Screenshot 2023-11-30 16-51-59 failed to upload]

NiklasGebauer commented 9 months ago

> Hi author, now I have another problem. I first trained for only a few epochs on a server, and it produced the files you mention in the generation section, e.g. a directory containing best_model, cli.log, config.yaml, etc. [...] I then adapted the command you use to generate molecules, but I get an error when I run it from the command line.

Hi @ldhkc, the problem is that the config file generate_molecules.yaml is not in your configs path (/root/miniconda3/envs/gschnet/my_gschnet_configs). You should copy all the files from gschnet/src/schnetpack_gschnet/configs to your configs path. This should solve the problem.
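
For example, a copy along these lines should do it (the checkout path of the repository is a placeholder):

```bash
# Copy the packaged config files (including generate_molecules.yaml) into
# the configs directory that the CLI is pointed at.
cp -r <path/to>/gschnet/src/schnetpack_gschnet/configs/* \
    /root/miniconda3/envs/gschnet/my_gschnet_configs/
```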

NiklasGebauer commented 9 months ago

> Sorry author, I have a lot of questions. I would also like to ask: are the trained weights also saved in that same place? And is cli.py the main file for training?

If you mean the weights of the trained model, yes, they are stored in the file best_model. cli.py contains the loop for molecule generation. For model training, we rely on the routine defined in the schnetpack package (you can find it here).
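
If in doubt, you can simply inspect the training output directory; models/qm9_first_run below is a placeholder for your own models/qm9_${globals.name} folder:

```bash
# List the training output directory; it should contain, among others,
# best_model, cli.log, and config.yaml.
ls models/qm9_first_run
```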

NiklasGebauer commented 9 months ago

> Hi author, I want to visualize the generated molecules, but I get an error that they cannot be displayed. How can I solve this? I have set the visualization parameter to True.

Sorry, your screenshot has not been uploaded. Can you provide it or the error trace? It is hard to guess what the problem is without more information.

Best regards Niklas

NiklasGebauer commented 9 months ago

> One more question: could I get a contact for you? If I have any questions while experimenting further, I may need to come and ask you for advice.

Hi @ldhkc,

Please post your questions as issues here. In this way, my answers can also benefit other people who wonder about similar problems.

Best regards Niklas

ldhkc commented 8 months ago

> Please post your questions as issues here. In this way, my answers can also benefit other people who wonder about similar problems.

Okay, okay. Thank you very much.

ldhkc commented 8 months ago

> Sorry, your screenshot has not been uploaded. Can you provide it or the error trace?

Thank you very much, I've since fixed this issue.

ldhkc commented 8 months ago

> Hi @ldhkc, the problem is that the config file generate_molecules.yaml is not in your configs path. You should copy all the files from gschnet/src/schnetpack_gschnet/configs to your configs path.

Yes, I went back to your help file later and managed to solve the problem.

NiklasGebauer commented 8 months ago

As far as I can see, all issues have been resolved. If this is not correct, feel free to re-open the issue! Best regards Niklas