cyye001 / Con-CDVAE

MIT License

How to train combinations of multiple property? #1

Open yuuhaixia opened 5 months ago

yuuhaixia commented 5 months ago

Thank you very much for your contribution and for sharing this work. I have some questions about the code. You mentioned in your paper that it is possible to train on combinations of multiple properties at the same time; how should I change the code to do this? What is the difference between 'prop' and 'use_prop' in the screenshot below, and what are their roles? Does use_prop: 'formation_energy_per_atom' mean that only this one property is trained? If I want to train on a combination of multiple properties, should I change it to use_prop: ['formation_energy_per_atom','band_gap']?

[Screenshot 2024-06-05 10-17-39]

cyye001 commented 5 months ago

Thanks for your interest!

'use_prop' is a parameter from CDVAE. In our model we replaced it with 'prop'; I am sorry that I forgot to delete 'use_prop'.

'prop' takes a list, and every property in that list will be read. This is because the first run of run.py stores the processed data in train_data.pt/val_data.pt/test_data.pt, and subsequent runs load the data directly from those .pt files to save time. So all the properties that might be used should be listed in 'prop'.
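For example, the relevant part of a data config could list every property that might later be needed. This is a hypothetical sketch; the exact file and field names follow the screenshot above and may differ in your checkout:

```yaml
# Hypothetical sketch of a data config (e.g. conf/data/mptest.yaml).
# 'prop' lists every property that should be cached into
# train_data.pt/val_data.pt/test_data.pt on the first run.
prop: ['formation_energy_per_atom', 'band_gap', 'e_above_hull']
```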

If you want to train a model with both formation_energy_per_atom and bandgap as conditions, you can try running with the following command:

python concdvae/run.py train=new data=mptest expname=test model=vae_mp_format_gap

If you want more flexibility in setting the properties you use, you can refer to the xxx.yaml files that I preset in conf/model/, conf/model/conditionmodel/ and conf/model/conditionpre/.

yuuhaixia commented 5 months ago

Thank you very much for your reply! I have seen the preset xxx.yaml files you mentioned. I have one more question. In the full strategy, what does each line of 'general_full.csv' represent? I see the same number of .pt files generated. Does each line represent a condition — for example, formula: MnCuF6 in CS0 — producing a corresponding generated xxx_CS0.pt file that contains only those chemical elements (Mn, Cu, F)? Also, I noticed that the training set /data/mptest/train.csv does not contain the material MnCuF6. What should I change in 'general_full.csv'?

[Screenshot 2024-06-05 15-41-40]

cyye001 commented 5 months ago

Yes, each line is a condition that will be fed into the model when generating new crystals. In fact, not all of the properties in the file will be used; the model only reads what it needs. Therefore 'general_full.csv' can also be used with the default strategy, and a file name like xxx_default_xxx.pt generally indicates that the file was generated with the default strategy.

In this project, if a formula is used, I just embed it into a vector and input it into the model. I did not force the model to generate crystals that contain only those elements. However, that can be achieved quite simply, for example by introducing a suitable mask at generation time.

/data/mptest/ is just a small data set for debugging and testing the code. When you want to train formally, you should prepare a large enough data set, for example by downloading from the Materials Project (https://next-gen.materialsproject.org/) or by using the data used in CDVAE (https://github.com/txie-93/cdvae/tree/main/data).

yuuhaixia commented 5 months ago

Thank you again, but when I run this combined-property training with "python concdvae/run.py train=new data=mptest expname=format_gap model=vae_mp_format_gap", I get the following errors and training fails. What should I do to fix them? [Screenshot 2024-06-12 14-43-28] [Screenshot 2024-06-12 14-50-14]

yuuhaixia commented 5 months ago

[Screenshot 2024-06-12 15-00-03]

cyye001 commented 5 months ago

Thanks for pointing out the bug!

For the first error, the code stops at "assert block_inc == 0 # Implementing this is not worth the effort", which means some atoms in the noised crystals cannot find a neighbor atom within the cutoff. This is caused by inaccurate predictions of the lattice constants at the beginning of training. To fix it, I changed line 250 of concdvae/PT_train/training.py from "outputs = model(batch, teacher_forcing=False, training=True)" to "outputs = model(batch, teacher_forcing=True, training=False)".

For the second error, I think it is caused by exploding gradients. You can avoid this by enabling gradient clipping:

python concdvae/run.py train=new data=mptest expname=test model=vae_mp_format_gap train.PT_train.clip_grad_norm=0.001

This problem can also be avoided by reducing the learning rate in conf/optim/default.yaml.
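The effect of that clipping option can be sketched with plain PyTorch. This is a generic toy training step, not Con-CDVAE's actual training loop; the model, optimizer, and data here are illustrative:

```python
import torch
from torch import nn

# Toy model and optimizer standing in for the real training setup.
model = nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)

opt.zero_grad()
loss.backward()
# Rescale all gradients so their global L2 norm is at most max_norm,
# which prevents one bad batch from producing a huge parameter update.
# Returns the norm as measured before clipping.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.001)
opt.step()
```

Reducing the learning rate in conf/optim/default.yaml attacks the same symptom from the other side: smaller steps make a large gradient less destructive.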

cyye001 commented 5 months ago

For the first two errors, "KeyError: 'formation_energy_per_atom'" and "KeyError: 'band_gap'", I think you used the general_full.csv that I uploaded, but that file is missing the properties 'formation_energy_per_atom' and 'band_gap' required by the model. You can add these two properties to the file yourself, or re-download my updated general_full.csv. [Screenshot]

For the last error, I'm sorry I didn't catch this bug before uploading the code. I've updated scripts/evaluate_diff.py to fix it. Thanks for pointing it out!

yuuhaixia commented 5 months ago

Thank you for your answer. I solved the problem above by reducing the learning rate. I would also like to ask about what you mentioned earlier: that by introducing a suitable mask at generation time, we can force the model to generate only crystals whose elements come from the specified chemical formula. How can I modify the code to do this? Can you give me some specific ideas and corresponding code? Thank you very much!

yuuhaixia commented 5 months ago

I really appreciate that you keep replying to and solving the problems I run into. I solved the previous problem by updating general_full.csv. But when I execute the steps below, I encounter a new problem: "AttributeError: 'Namespace' object has no attribute 'skEMB'".

  1. python concdvae/run.py train=new data=mptest expname=full_fg model=vae_mp_format_gap accelerator=gpu train.PT_train.clip_grad_norm=0.0001
  2. python scripts/condition_diff_z.py --model_path /home/ps/yhx_project/Con-CDVAE-main/output/hydra/singlerun/2024-06-14/full_fg --model_file model_full_fg.pth --fullfea 1 --newcond /home/ps/yhx_project/Con-CDVAE-main/conf/conz_2.yaml --newdata mptest4conz
  3. python scripts/evaluate_diff.py --model_path /home/ps/yhx_project/Con-CDVAE-main/output/hydra/singlerun/2024-06-14/fullfg --model_file model_full_fg.pth --conz_file conz_model_ABC_diffu.pth --prop_path general_full.csv

[Screenshot]

cyye001 commented 4 months ago

For the error "AttributeError: 'Namespace' object has no attribute 'skEMB'", I've updated scripts/evaluate_diff.py to fix it. Thanks for pointing out the bug!

For the "suitable mask": the generation process is a denoising process. At each denoising step, the model predicts the probability that each atom belongs to each element, which is recorded in the variable 'pred_atom_types' (you can find it in 'def langevin_dynamics' of /concdvae/pl_modules/model.py). So you can use a suitable mask to mask out the elements you don't want to generate.
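A minimal sketch of such a mask, assuming 'pred_atom_types'-style logits of shape (n_atoms, n_element_types) where column i corresponds to atomic number i + 1 (as in CDVAE). The function name and layout here are illustrative, not the repo's API:

```python
import torch

def mask_atom_logits(logits: torch.Tensor, allowed_z: list) -> torch.Tensor:
    """Set logits of disallowed elements to -inf so they can never be sampled."""
    mask = torch.full_like(logits, float('-inf'))
    cols = [z - 1 for z in allowed_z]   # atomic number -> column index
    mask[:, cols] = 0.0                 # allowed columns pass through unchanged
    return logits + mask

# Example: only allow Mn (Z=25), Cu (Z=29), F (Z=9), as for formula MnCuF6.
logits = torch.randn(5, 100)            # 5 atoms, 100 candidate elements
masked = mask_atom_logits(logits, allowed_z=[25, 29, 9])
types = masked.argmax(dim=-1) + 1       # back to atomic numbers
```

Applying this to 'pred_atom_types' inside each denoising step would restrict every sampled atom to the allowed element set, which is the idea behind forcing a fixed formula's elements.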

yuuhaixia commented 3 months ago

I'm very sorry to bother you again. When I set 'predict_property: True', I get the problem below. Does 'property_loss = self.property_loss(z, batch)' represent the loss over all properties in the condition? What should I do to solve this problem?

[Screenshot]

cyye001 commented 3 months ago


Thanks for pointing out this bug! I have fixed it with the new 'concdvae/pl_modules/model.py'.

yuuhaixia commented 3 months ago

Thanks, I got it running successfully with your update. But I am a bit confused now: when I run 'condition_diff_z.py' and print('datamodule.train_dataloader', batch), I notice the output shows mp_id=[10], bandgap=[10], formation=[10], or mp_id=[9], bandgap=[9], formation=[9], e_above_hull=[9], and so on. Why are they all [9] or [10], instead of the exact values of the specific properties corresponding to each cif structure? How do you read them and achieve a one-to-one correspondence while encoding them in the network? This has puzzled me for a long time. @cyye001

[Screenshot]

cyye001 commented 3 months ago

[9] or [10] is the shape of the tensor. If you want to print the values, you can try print(batch.bandgap); if the tensor is too large, you may need to convert it to a list to print all the values.

All the data in the tensors is sorted in the same order, so mp_id[i] corresponds to bandgap[i], alpha[i], and so on. And frac_coords and to_jimages can be matched to mp_id using n_atom and num_bonds.
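A small sketch of this layout with plain PyTorch tensors. The names mirror the thread; the real batch is a graph-batch object, and the numbers here are made up for illustration:

```python
import torch

# Per-crystal tensors are stacked in the same order, so index i always
# refers to the same crystal across all of them.
num_atoms = torch.tensor([2, 3, 1])          # 3 crystals -> printed shape [3]
band_gap  = torch.tensor([0.5, 1.2, 0.0])    # band_gap[i] belongs to crystal i

# Per-atom tensors are stacked atom-by-atom: 2 + 3 + 1 = 6 rows in total.
frac_coords = torch.rand(6, 3)

# num_atoms tells you how to split the per-atom tensor back per crystal.
coords_per_crystal = torch.split(frac_coords, num_atoms.tolist())
```

So printing the batch shows shapes like [9] or [10] (the number of crystals in that batch), while the actual property values sit inside those tensors in crystal order.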

Zhoushun2021 commented 2 months ago

(Quoting the earlier exchange:) "I'm very sorry to bother you again. When I set 'predict_property: True', I get the problem below. Does 'property_loss = self.property_loss(z, batch)' represent the loss over all properties in the condition? What should I do to solve this problem? [Screenshot]"

"Thanks for pointing out this bug! I have fixed it with the new 'concdvae/pl_modules/model.py'."

I managed to fix the error using your updated 'model.py', but I don't understand what is used to represent and calculate the property loss. Is it involved in training?