aqlaboratory / openfold

Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
Apache License 2.0

training result #108

Open liuxm117 opened 2 years ago

liuxm117 commented 2 years ago

Hi, I trained OpenFold and used the ckpt to test a sequence, but the result was not correct: lots of atoms overlapped together (image posted below).

liuxm117 commented 2 years ago

[image: predicted structure showing overlapping atoms]

gahdritz commented 2 years ago

It's difficult to say what might be happening here. OpenFold is designed for proteins, so training it on small molecules like this one requires extensive modifications, I imagine. All I can say is that our own trained version of OpenFold has matched AlphaFold's performance on proteins.

liuxm117 commented 2 years ago

Congratulations, hope to see your trained weights soon. Two questions:

1. What precision are you using? With fp16, the loss is always NaN.
2. In the later stages of training, the loss oscillates and no longer decreases. Can you give me some advice on which part or which parameter I should pay attention to?

gahdritz commented 2 years ago
  1. Yes, fp16 is unstable to the point where I wouldn't currently call it functional (see the bf16 sketch after this list). CASP + our own training (in bfloat16) is currently eating up most of my development time, but this remains one of the top features I'd like to get working.
  2. It's almost impossible to say without knowing what data you're using, what your task is, what modifications you've made, etc. In any case, training is expensive enough that I haven't been able to develop much intuition for it---all I know for certain is that DeepMind's presets work pretty well for the protein folding task. Anything beyond that is pretty much terra incognita for me too, I'm afraid.
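
(For context on what the bfloat16 route looks like: below is a minimal plain-PyTorch sketch of bf16 mixed-precision training. The model and shapes are illustrative stand-ins, not OpenFold's actual training loop.)

```python
import torch

# bfloat16 keeps fp32's exponent range, so it avoids the overflow/underflow
# NaNs that fp16 tends to produce without careful loss scaling.
model = torch.nn.Linear(256, 256).cuda()          # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 256, device="cuda")
target = torch.randn(8, 256, device="cuda")

optimizer.zero_grad()
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    out = model(x)                                # ops run in bf16
    loss = torch.nn.functional.mse_loss(out, target)
loss.backward()                                   # params/grads stay fp32
optimizer.step()
```

Note that unlike fp16, bf16 under `torch.autocast` needs no `GradScaler`.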
liuxm117 commented 2 years ago

Thank you for your reply. I use pdb_mmcif as the training dataset and barely modified the code, just some memory optimizations to speed up performance. Maybe it's because I don't have enough GPUs.

gahdritz commented 2 years ago

Oh, are you not doing some kind of small-molecule conformation prediction? From the image you sent, I assumed you weren't doing the standard protein folding task. If you are just doing normal protein folding, I might be able to help you. How many GPUs are you using?

liuxm117 commented 2 years ago

I use PyMOL to visualize the predicted PDB, but it doesn't show anything; the image above was rendered with a different tool. I used 24 GPUs, and almost 1,000,000 samples were trained.

gahdritz commented 2 years ago

How are the metrics looking? What is your current LDDT-Ca?
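
(For readers unfamiliar with the metric: lDDT-Cα measures how well inter-residue Cα–Cα distances are preserved relative to the reference structure. The sketch below is an illustrative simplification, not OpenFold's implementation, which also handles per-residue averaging and masking.)

```python
import torch

def lddt_ca(pred, ref, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Approximate lDDT-Ca for [N, 3] predicted/reference Ca coordinates."""
    d_ref = torch.cdist(ref, ref)
    d_pred = torch.cdist(pred, pred)
    n = ref.shape[0]
    # Score only pairs within the cutoff in the reference, excluding self-pairs.
    mask = (d_ref < cutoff) & ~torch.eye(n, dtype=torch.bool)
    diff = (d_ref - d_pred).abs()[mask]
    # Fraction of preserved distances, averaged over the four tolerance bins.
    return torch.stack([(diff < t).float().mean() for t in thresholds]).mean()
```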

liuxm117 commented 2 years ago

[image: training metrics screenshot]

gahdritz commented 2 years ago

[image: validation LDDT curve]

Here's our validation LDDT curve. The x-axis records batches of size 132, so the plateau at 0.8 LDDT is reached after the model has seen approximately 660k proteins. This behavior has been extremely consistent across our training runs. Perhaps you just need to train longer?
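
(Making the x-axis arithmetic explicit, using only the numbers in the comment above:)

```python
batch_size = 132              # proteins per training batch
plateau_proteins = 660_000    # proteins seen when the curve flattens at ~0.8
plateau_steps = plateau_proteins // batch_size
print(plateau_steps)          # 5000 batches to reach the plateau
```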

liuxm117 commented 2 years ago

Due to the limited number of GPUs, I cannot increase the batch size. Could this be the problem? Also, where did your validation set come from? (CAMEO, or just a part of the training set?)
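
(Not discussed in the thread, but the standard workaround when GPUs cap the per-step batch size is gradient accumulation, which trades extra steps for a larger effective batch. A minimal PyTorch sketch with illustrative shapes:)

```python
import torch

model = torch.nn.Linear(256, 256).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
accum_steps = 4  # effective batch = per-step batch * accum_steps

# Toy stand-in for a real data loader.
loader = [(torch.randn(8, 256), torch.randn(8, 256)) for _ in range(16)]

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = torch.nn.functional.mse_loss(model(x.cuda()), y.cuda())
    (loss / accum_steps).backward()  # scale so grads match one large batch
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```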

gahdritz commented 2 years ago

Yes, CAMEO.

vetmax7 commented 4 months ago

Hello @liuxm117 and all!

Can you please explain how you used your *.ckpt files for prediction? Which script did you use to test with ckpt files? The README only mentions *.pt files.

vetmax7 commented 3 months ago

@gahdritz @liuxm117 Can you please explain how you used your .ckpt files as a model for prediction? Or how can I convert one to .npz? When I try to convert my ckpt, I get a lot of errors about my ckpt having a different structure.
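
(For anyone hitting the same wall: a PyTorch Lightning `.ckpt` bundles the weights under a `state_dict` key, with each key prefixed by the attribute name of the wrapped module. A rough extraction sketch follows; the `model.` prefix and the filenames are assumptions to verify against your own checkpoint, e.g. by printing `ckpt.keys()` and a few state-dict keys first. If the training wrapper also kept an EMA copy of the weights, those may live under a separate top-level key.)

```python
import torch

ckpt = torch.load("my_run.ckpt", map_location="cpu")  # illustrative filename

# Lightning nests the weights under "state_dict"; strip the wrapper prefix
# ("model." here is an assumption -- check your checkpoint's actual keys).
prefix = "model."
stripped = {
    k[len(prefix):]: v
    for k, v in ckpt["state_dict"].items()
    if k.startswith(prefix)
}

# Save as a plain .pt state dict of the kind the README's inference
# instructions reference.
torch.save(stripped, "openfold_weights.pt")
```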