Closed jingydz closed 5 years ago
My best guess as to why you got the error is that TRINITY_DN17801_c2_g3_i1.p2 is using a DNA alphabet instead of an amino acid alphabet (or at least, this is what BioPerl detected). OrthoMCL requires protein sequences as a string of amino acids. I cannot really give much more information than this without seeing that particular sequence.
Terribly sorry, I forgot to upload the sequence that caused the error:
101995 >TRINITY_DN17801_c2_g3_i1.p2 type:internal len:105 gc:universal TRINITY_DN17801_c2_g3_i1:3-314(+)
101996 GCGYYSGGSGGGSSCGGGSSGGGSSCGGGGGGSYGGGSSCGGGGGSGGGVKYSGGGGSSCGGGYSGGGGSSCGGGYSGGGGGSSCGGGSSGGGSSCGGGGGSGG
Two sets of protein sequences have this problem. Someone sent me this test file and asked me to help her test it, so I do not know whether the file itself is faulty. To keep the program running properly, I deleted these two lines. Is that a problem?
Okay, I think I have an idea of what's going on. The BioPerl automatic detection of the file type is confused because the sequence data in that record could be either dna or protein.
I've added a small fix for this in https://github.com/apetkau/orthomcl-pipeline/pull/30. Could you test this out to make sure it fixes your problem? The new code should be in branch fix-invalid-alphabet.
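To illustrate why this record is ambiguous, here is a minimal Python sketch. It is not BioPerl's actual detection code; the helper names `fits_alphabet` and `fraction_in` are made up for this example. It only shows that every letter in the reported sequence is valid both as an amino acid and as an IUPAC nucleotide code, and that most of the letters are plain G or C, which is the kind of composition a heuristic guesser could plausibly read as DNA.

```python
# Illustration only: this does NOT reproduce BioPerl's alphabet guessing.
# It just checks which alphabets the reported sequence is compatible with.

# IUPAC nucleotide codes (bases plus ambiguity symbols).
IUPAC_NUCLEOTIDE = set("ACGTURYSWKMBDHVN")
# The 20 standard amino acid one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# The sequence posted in this thread for TRINITY_DN17801_c2_g3_i1.p2.
SEQ = "GCGYYSGGSGGGSSCGGGSSGGGSSCGGGGGGSYGGGSSCGGGGGSGGGVKYSGGGGSSCGGGYSGGGGSSCGGGYSGGGGGSSCGGGSSGGGSSCGGGGGSGG"


def fits_alphabet(seq, alphabet):
    """Return True if every character of seq belongs to the given alphabet."""
    return set(seq.upper()) <= set(alphabet)


def fraction_in(seq, letters):
    """Return the fraction of characters of seq that are in letters."""
    letters = set(letters)
    return sum(ch in letters for ch in seq.upper()) / len(seq)


if __name__ == "__main__":
    print("valid as protein:   ", fits_alphabet(SEQ, AMINO_ACIDS))
    print("valid as nucleotide:", fits_alphabet(SEQ, IUPAC_NUCLEOTIDE))
    print("fraction of A/C/G/T:", round(fraction_in(SEQ, "ACGT"), 2))
```

Because the letters alone cannot settle the question for a record like this, a validation step that knows the input must be protein is better off stating the alphabet explicitly than relying on a guess.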
Hi, I'm so sorry for my delay in replying. My teacher's server had some problems, and I was busy with my studies a few weeks ago.
Thank you very much again, it fixed my problem.
But I have another problem: someone asked me to run a set of data for her, and it went wrong in step 9.
The error message is as follows:
Stage 8 took 3532.28 minutes done
=Stage 9: Parse Blast Results=
cat /data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/20190423_out/blast_results/blast_results.* > /data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/20190423_out/blast_load/all.fasta
/data/users/zhangjingjing/OrthoMCL/orthomclSoftware-v2.0.9/bin/orthomclBlastParser "/data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/20190423_out/blast_load/all.fasta" "/data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/20190423_out/compliant_fasta" 1>/data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/20190423_out/blast_load/similarSequences.txt 2>/data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/20190423_out/log/9.parseBlast.log
Error executing command: /data/users/zhangjingjing/OrthoMCL/orthomclSoftware-v2.0.9/bin/orthomclBlastParser "/data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/20190423_out/blast_load/all.fasta" "/data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/20190423_out/compliant_fasta" 1>/data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/20190423_out/blast_load/similarSequences.txt 2>/data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/20190423_out/log/9.parseBlast.log. See logs /data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/20190423_out/blast_load/similarSequences.txt and /data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/20190423_out/log/9.parseBlast.log
And I checked the error log:
[root@GenEngine 20190423_out]# cat /data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/20190423_out/log/9.parseBlast.log
acquiring genes from b.fasta
acquiring genes from f.fasta
acquiring genes from m.fasta
acquiring genes from musfinalpep.fasta
couldn't find taxon for gene 'musfinalpep|ENSMU' at /data/users/zhangjingjing/OrthoMCL/orthomclSoftware-v2.0.9/bin/orthomclBlastParser line 105,
Sorry to bother you again, but I am only a sophomore who started studying bioinformatics a few months ago, so I don't have much background knowledge. Can you help me with this problem? Also, do I have to run everything again from step 1? It takes so long; can I just rerun from step 9? I would really appreciate it if you could reply.
No problem.
What does the file musfinalpep.fasta look like? It may be the case that the fasta sequence entries in this file are not formatted correctly for OrthoMCL. If it's possible, could you send me the file (you can email it to me if you wish)?
And no, there is no way to run just from step 9.
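As a way to narrow down the "couldn't find taxon for gene" error before sending the file, the following Python sketch cross-checks sequence IDs. It rests on two assumptions that are not confirmed in this thread: that the parser takes the first whitespace-delimited token of each fasta header as the gene ID, and that the BLAST results are tab-separated with the query and subject IDs in the first two columns. The function names and the sample IDs are hypothetical.

```python
# A rough diagnostic sketch, not part of orthomcl-pipeline or OrthoMCL.
# Assumption: gene IDs are the first token of each '>' header line.
# Assumption: BLAST output is tab-separated, query ID in column 1 and
# subject ID in column 2.


def fasta_ids(fasta_text):
    """Collect the first token of every non-empty '>' header line."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def blast_ids(blast_text):
    """Collect query and subject IDs from tab-separated BLAST lines."""
    ids = set()
    for line in blast_text.splitlines():
        cols = line.split("\t")
        if len(cols) >= 2:
            ids.update(cols[:2])
    return ids


def unknown_ids(fasta_text, blast_text):
    """Return BLAST IDs that have no matching fasta header, sorted."""
    return sorted(blast_ids(blast_text) - fasta_ids(fasta_text))


if __name__ == "__main__":
    # Made-up sample data; real input would be read from the files in
    # compliant_fasta and blast_load.
    fasta = ">taxonA|gene1 some description\nMKV\n>taxonA|gene2\nMSS\n"
    blast = "taxonA|gene1\ttaxonA|gene2\t98.0\ntaxonA|gen\ttaxonA|gene1\t90.0\n"
    print(unknown_ids(fasta, blast))
```

Any ID this reports appears in the BLAST results but not in the fasta headers, which would be consistent with a truncated or otherwise malformed header such as the 'musfinalpep|ENSMU' in the log.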
=Stage 1: Validate Files =
Validating mfilter.fasta ... 47599 sequences
Error: file /data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/ceshi2_in/bfilter.fasta contains a sequence (TRINITY_DN17801_c2_g3_i1.p2) containing non-protein alphabet (dna) at /data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/bin/../scripts/orthomcl-pipeline.pl line 357, line 50938.
Validating bfilter.fasta ...
The above is the error reported during my run.
I looked at the sequence and didn't see anything wrong with it, but if I just delete the sequence, the pipeline runs on my file to completion.
Although I got the output file, I'd still like to ask why stage 1 reported the error.