Closed agatawitkowska closed 1 year ago
Can you share your CSV file with me?
I pasted the content of PDBID_6IWD.csv above (that's the one I tried). Now also as a file: PDBID_6IWD.csv.
It works for me.
Sure. I do not expect problems with my PDBID_6IWD.csv or run command. Thanks anyway.
I guess there might be some "install"/dependencies problem which I'm not sure how to start debugging right now.
@biotech70, did you run it on Windows 10 using WSL Ubuntu 20.04.4 LTS (GNU/Linux 5.10.102.1-microsoft-standard-WSL2 x86_64)?
No.
Dear all,
I've been using ColabFold for thousands of pairs of sequences and for the first time I have the same problem as the author of this issue.
This is my command:
colabfold_batch --num-recycle 3 --templates --pair-mode unpaired+paired --model-type AlphaFold2-multimer-v2 --rank intscore $SEQUENCE_FILE$PAIR_TO_TEST $COLABFOLD_OUTPUT$PAIR_TO_TEST
You will find below the fasta sequence of the pair of proteins:
P43132 MKLGIIPYQEGTDIVYKNALQGQQEGKRPNLPQMEATHQIKSSVQGTSYEFVRTEDIPLNRRHFVYRPCS ANPFFTILGYGCTEYPFDHSGMSVMDRSEGLSISRDGNDLVSVPDQYGWRTARSDVCIKEGMTYWEVEVI RGGNKKFADGVNNKENADDSVDEVQSGIYEKMHKQVNDTPHLRFGVCRREASLEAPVGFDVYGYGIRDIS LESIHEGKLNCVLENGSPLKEGDKIGFLLSLPSIHTQIKQAKEFTKRRIFALNSHMDTMNEPWREDAENG PSRKKLKQETTNKEFQRALLEDIEYNDVVRDQIAIRYKNQLFFEATDYVKTTKPEYYSSDKRERQDYYQL EDSYLAIFQNGKYLGKAFENLKPLLPPFSELQYNEKFYLGYWQHGEARDESNDKNTTSAKKKKQQQKKKK GLILRNKYVNNNKLGYYPTISCFNGGTARIISEEDKLEYLDQIRSAYCVDGNSKVNTLDTLYKEQIAEDI VWDIIDELEQIALQQ
P39706 MNILLQDPFAVLKEHPEKLTHTIENPLRTECLQFSPCGDYLALGCANGALVIYDMDTFRPICVPGNMLGA HVRPITSIAWSPDGRLLLTSSRDWSIKLWDLSKPSKPLKEIRFDSPIWGCQWLDAKRRLCVATIFEESDA YVIDFSNDPVASLLSKSDEKQLSSTPDHGYVLVCTVHTKHPNIIIVGTSKGWLDFYKFHSLYQTECIHSL KITSSNIKHLIVSQNGERLAINCSDRTIRQYEISIDDENSAVELTLEHKYQDVINKLQWNCILFSNNTAE YLVASTHGSSAHELYIWETTSGTLVRVLEGAEEELIDINWDFYSMSIVSNGFESGNVYVWSVVIPPKWSA LAPDFEEVEENVDYLEKEDEFDEVDEAEQQQGLEQEEEIAIDLRTREQYDVRGNNLLVERFTIPTDYTRI IKMQSS
And you will find below the output file of my job in the SLURM cluster.
The following have been reloaded with a version change: 1) modenv/scs5 => modenv/hiera
Module GCC/10.2.0CUDA/11.3.1OpenMPI/4.0.5 and 12 dependencies loaded.
sbatch run_colabfold_AF2Mv2_unpaired_paired_net_perturbed.sh 1
Submitted batch job 29707500
WARNING: You are welcome to use the default MSA server, however keep in mind that it's a limited shared resource only capable of processing a few thousand MSAs per day. Please submit jobs only from a single IP address. We reserve the right to limit access to the server case-by-case when usage exceeds fair use.
If you require more MSAs:
You can precompute all MSAs with colabfold_search
or
You can host your own API and pass it to --host-url
2022-10-06 14:20:26,288 Running colabfold 1.3.0 (7ebcbe62e8d88400b0e75aa0878dce2ff3a6c71f)
2022-10-06 14:20:30,710 --max-msa can not be used in combination with AlphaFold2-multimer (--max-msa ignored)
2022-10-06 14:20:42,329 Found 7 citations for tools or databases
2022-10-06 14:20:47,003 Query 1/1: P43132_P39706 (length 931)
0%| | 0/300 [elapsed: 00:00 remaining: ?]
SUBMIT: 0%| | 0/300 [elapsed: 00:00 remaining: ?]
COMPLETE: 0%| | 0/300 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:00 remaining: 00:00]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:03 remaining: 00:00]
2022-10-06 14:21:19,372 Could not get MSA/templates for P43132_P39706: HHSearch failed:
stdout:
stderr:
14:21:17.044 INFO: /tmp/tmp7fatjs/query.a3m is in A2M, A3M or FASTA format
14:21:17.111 INFO: Searching 20 database HHMs without prefiltering
14:21:17.111 INFO: Iteration 1
14:21:17.407 WARNING: database contains sequences that exceed maximum allowed size (maxres = 20001). Max sequence length can be increased with parameter -maxres.
14:21:17.426 INFO: Scoring 20 HMMs using HMM-HMM Viterbi alignment
14:21:17.452 INFO: Alternative alignment: 0
14:21:19.050 WARNING: ignoring invalid symbol '4' at pos. 2955 in line 4 of 6WOV_C
14:21:19.050 WARNING: ignoring invalid symbol '4' at pos. 2956 in line 4 of 6WOV_C
14:21:19.050 WARNING: ignoring invalid symbol '2' at pos. 2957 in line 4 of 6WOV_C
14:21:19.050 WARNING: ignoring invalid symbol '1' at pos. 2958 in line 4 of 6WOV_C
14:21:19.050 WARNING: ignoring invalid symbol '4' at pos. 2959 in line 4 of 6WOV_C
I purposefully truncated the rest of the output because it continues in the same way for thousands of lines. Thank you in advance for your help.
Sincerely, Ilyes
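Since the warnings in the log above repeat for thousands of lines, it may help to condense them before sharing. Here is a minimal Python sketch that counts the "ignoring invalid symbol" warnings per database entry. The regular expression is inferred only from the five warning lines quoted above, and the function name `summarize_invalid_symbols` is made up for this illustration; it is not part of ColabFold or HH-suite.

```python
import re
from collections import Counter

# Pattern inferred from the warning lines quoted in this thread (an assumption,
# not an official HH-suite log specification).
WARNING_RE = re.compile(
    r"WARNING: ignoring invalid symbol '(?P<symbol>.)' "
    r"at pos\. (?P<pos>\d+) in line (?P<line>\d+) of (?P<entry>\S+)"
)


def summarize_invalid_symbols(log_text):
    """Return {entry: Counter({symbol: count})} for all matching warnings."""
    summary = {}
    for match in WARNING_RE.finditer(log_text):
        counts = summary.setdefault(match["entry"], Counter())
        counts[match["symbol"]] += 1
    return summary


if __name__ == "__main__":
    sample = "\n".join([
        "14:21:19.050 WARNING: ignoring invalid symbol '4' at pos. 2955 in line 4 of 6WOV_C",
        "14:21:19.050 WARNING: ignoring invalid symbol '4' at pos. 2956 in line 4 of 6WOV_C",
        "14:21:19.050 WARNING: ignoring invalid symbol '2' at pos. 2957 in line 4 of 6WOV_C",
    ])
    for entry, counts in summarize_invalid_symbols(sample).items():
        print(entry, dict(counts))
```

A summary like this shows at a glance which template entries trigger the warnings, which may be easier to discuss than the raw log.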
Please share the sequence file that was used for structure prediction.
You will find the seq file in the attached document.
Sincerely, Ilyes
For multimer prediction, you must separate the sequences of the complex with the : symbol in a CSV file.
like below: MKLGIIPYQEGTDIVYKNALQGQQEGKRPNLPQMEATHQIKSSVQGTSYEFVRTEDIPLNRRHFVYRPCS ANPFFTILGYGCTEYPFDHSGMSVMDRSEGLSISRDGNDLVSVPDQYGWRTARSDVCIKEGMTYWEVEVI RGGNKKFADGVNNKENADDSVDEVQSGIYEKMHKQVNDTPHLRFGVCRREASLEAPVGFDVYGYGIRDIS LESIHEGKLNCVLENGSPLKEGDKIGFLLSLPSIHTQIKQAKEFTKRRIFALNSHMDTMNEPWREDAENG PSRKKLKQETTNKEFQRALLEDIEYNDVVRDQIAIRYKNQLFFEATDYVKTTKPEYYSSDKRERQDYYQL EDSYLAIFQNGKYLGKAFENLKPLLPPFSELQYNEKFYLGYWQHGEARDESNDKNTTSAKKKKQQQKKKK GLILRNKYVNNNKLGYYPTISCFNGGTARIISEEDKLEYLDQIRSAYCVDGNSKVNTLDTLYKEQIAEDI VWDIIDELEQIALQQ:MNILLQDPFAVLKEHPEKLTHTIENPLRTECLQFSPCGDYLALGCANGALVIYDMDTFRPICVPGNMLGA HVRPITSIAWSPDGRLLLTSSRDWSIKLWDLSKPSKPLKEIRFDSPIWGCQWLDAKRRLCVATIFEESDA YVIDFSNDPVASLLSKSDEKQLSSTPDHGYVLVCTVHTKHPNIIIVGTSKGWLDFYKFHSLYQTECIHSL KITSSNIKHLIVSQNGERLAINCSDRTIRQYEISIDDENSAVELTLEHKYQDVINKLQWNCILFSNNTAE YLVASTHGSSAHELYIWETTSGTLVRVLEGAEEELIDINWDFYSMSIVSNGFESGNVYVWSVVIPPKWSA LAPDFEEVEENVDYLEKEDEFDEVDEAEQQQGLEQEEEIAIDLRTREQYDVRGNNLLVERFTIPTDYTRI IKMQSS
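To avoid editing the sequences by hand, the colon-separated row can be generated with a few lines of Python. This is only a sketch: it assumes the `id,sequence` header that ColabFold documents for CSV input, and the helper names (`clean_chain`, `complex_row`, `write_colabfold_csv`) are made up for this illustration. It also strips the spaces and line breaks that creep in when a sequence is copied from a wrapped FASTA record.

```python
import csv
import io


def clean_chain(seq):
    """Remove all whitespace from one chain and upper-case it."""
    return "".join(seq.split()).upper()


def complex_row(job_id, chains):
    """Build one (id, sequence) row with the chains joined by ':'."""
    cleaned = [clean_chain(chain) for chain in chains]
    if any(not chain for chain in cleaned):
        raise ValueError(f"empty chain in {job_id!r}")
    return job_id, ":".join(cleaned)


def write_colabfold_csv(jobs, handle):
    """Write an 'id,sequence' CSV; jobs is an iterable of (id, [chains])."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["id", "sequence"])  # header assumed from ColabFold docs
    for job_id, chains in jobs:
        writer.writerow(complex_row(job_id, chains))


if __name__ == "__main__":
    # Truncated fragments of the two chains, for illustration only.
    jobs = [("P43132_P39706", ["MKLGIIPYQE GTDIV", "MNILLQDPFA\nVLKEH"])]
    buffer = io.StringIO()
    write_colabfold_csv(jobs, buffer)
    print(buffer.getvalue(), end="")
```

With the truncated fragments above, this prints a header line followed by `P43132_P39706,MKLGIIPYQEGTDIV:MNILLQDPFAVLKEH`. Passing an open file instead of the `StringIO` buffer writes the CSV to disk.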
I forgot to mention that I modified the code internally to get the same required format as you showed above.
Sincerely, Ilyes
I've been computing multimer predictions using the format in the FASTA file I sent you.
Sincerely, Ilyes
I don't understand. Do you get an error when you use a CSV file in which the sequences are separated by the : symbol?
Please use the attached file with the command below and let me know the output:
colabfold_batch --num-recycle 3 --templates --model-type AlphaFold2-multimer-v2 --rank intscore 6WOV.csv outputdir/
Attachment: 6WOV.csv
Please also provide information on your OS and GPU.
You will find in the attached document the output of ColabFold. It is the same problem as the one I initially reported: Could not get MSA/templates for PDBID_6WOV: HHSearch failed
I use a multi-GPU sub-cluster, "Alpha Centauri". It has 34 nodes, each with:
8 x NVIDIA A100-SXM4 (40 GB RAM)
2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz with multi-threading enabled
1 TB RAM
3.5 TB /tmp local NVMe device
Sincerely,
Ilyes
oh so good
Do you have any idea what the issue could be?
Sincerely, Ilyes
It may be related to the headers of your sequences.
The headers? From 6WOV.csv (https://github.com/YoshitakaMo/localcolabfold/files/9725935/6WOV.csv)? Could you clarify, please?
Sincerely, Ilyes
Did you use the CSV file which I shared with you to determine the structure?
Yes. Please see the output in the attached document.
Sincerely, Ilyes
OK, the most crucial point when predicting structures from a CSV file is to use the : symbol to separate the sequences of a multimer.
Yes, the ':' is in the CSV file you sent me. If you have any idea what the issue could be, please let me know. Thank you!
Sincerely, Ilyes
I listed below some things that are important in prediction using local-colabfold:
1- JAX version
2- GPU config
3- The size of the target sequence
4- The version of local-colabfold (because some people get different output from the old version of colabfold compared to the newer version of colabfold)
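Most of these points can be collected in one go and pasted into the issue. Here is a minimal Python sketch that uses only the standard library (Python 3.8+). The distribution names `jax`, `jaxlib` and `colabfold` are assumptions about how the packages are registered in your environment, `environment_report` is a made-up helper, and the GPU configuration is deliberately left out because it depends on the driver tools available on the machine.

```python
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(chains, packages=("jax", "jaxlib", "colabfold")):
    """Collect interpreter, package versions and chain lengths in a dict."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:  # distribution names are assumptions
        report[name] = package_version(name) or "not installed"
    # Whitespace is ignored so wrapped sequences are counted correctly.
    lengths = [len("".join(chain.split())) for chain in chains]
    report["chain_lengths"] = lengths
    report["total_length"] = sum(lengths)
    return report


if __name__ == "__main__":
    # Replace the toy chains with the sequences of the failing job.
    for key, value in environment_report(["ACDEF GHIK", "LMNPQ"]).items():
        print(f"{key}: {value}")
```

Run it inside the same environment that runs colabfold_batch, otherwise the reported versions may not match the ones actually used.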
Finally, we should thank Dr. Yoshitaka Moriwaki, who made it possible to run colabfold on local machines.
Thank you, see reply below.
Sincerely, Ilyes
Forwarded message from mohammad mahmoudi gomari, Tue 11 Oct 2022 at 13:52. Subject: Re: [YoshitakaMo/localcolabfold] Could not get MSA/templates (HHSearch failed) (Issue #107)
I listed below some things that are important in prediction using local-colabfold: 1- JAX version 2- GPU config
REPLY: I find it hard to believe that the issue is due to JAX or the GPU config, because I've predicted thousands of structures without a problem. This is the only time I have had such an issue.
3- The size of the target sequence
REPLY: Protein A is 505aa long and Protein B is 426aa long. I have dealt with pairs of proteins with much longer sequences with no problem.
4- The version of local-colabfold (because some people get different output from the old version of colabfold compared to the newer version of colabfold)
REPLY: I would say this is the most plausible option. I use colabfold 1.3.0. Have you tried to run my prediction with my parameters on your own local-colabfold version as a sanity check? Command line:
colabfold_batch --num-recycle 3 --templates --model-type AlphaFold2-multimer-v2 --rank intscore 6WOV.csv outputdir/
Actually, the original problem reported by @agatawitkowska was never solved and got no attention here.
On Windows 10 using WSL I'm trying to run:
using PDBID_6IWD.csv:
I'm getting:
I would appreciate any helpful advice.