Open SolayMane opened 3 years ago
Hello, Thank you for the report. That error was caused because recent Braker introduced a bug. The quick fix would be:
braker_out/PREFIX
directory)gene_filtering
folderI will figure out how FunGAP handles the bug while the Braker developers are working on it. Let me know if you have further questions!
thank you for your reply, here is the output of what I have in culumn 2 of the file : cut -f 2 fungap_out/braker_out/SRR13198488/braker_SRR13198488.gff3 | sort | uniq AUGUSTUS no line of GeneMark.
Hmm. Could you send me Braker GFF3 and genome assembly you used? My e-mail address is here: mbnmbn00@gmail.com
If you cannot share the files, you may send the part of (e.g., first 10 genes) the Braker GFF3 file.
here are the 1st 10 lines of braker_SRR13198488.gff3
Chr_4 AUGUSTUS gene 2037937 2038673 0.96 + . ID=g9539; Chr_4 AUGUSTUS mRNA 2037937 2038673 0.96 + . ID=g9539.t1;Parent=g9539; Chr_4 AUGUSTUS start_codon 2037937 2037939 . + 0 ID=g9539.t1.start1;Parent=g9539.t1; Chr_4 AUGUSTUS CDS 2037937 2037961 0.97 + 0 ID=g9539.t1.CDS1;Parent=g9539.t1; Chr_4 AUGUSTUS exon 2037937 2037961 . + . ID=g9539.t1.exon1;Parent=g9539.t1; Chr_4 AUGUSTUS intron 2037962 2038086 1 + . ID=g9539.t1.intron1;Parent=g9539.t1; Chr_4 AUGUSTUS CDS 2038087 2038673 0.99 + 2 ID=g9539.t1.CDS2;Parent=g9539.t1; Chr_4 AUGUSTUS exon 2038087 2038673 . + . ID=g9539.t1.exon2;Parent=g9539.t1; Chr_4 AUGUSTUS stop_codon 2038671 2038673 . + 0 ID=g9539.t1.stop1;Parent=g9539.t1; Chr_11 AUGUSTUS gene 2926489 2927424 0.99 + . ID=g3985;
Those lines look good. It might be one poor gene coordinates.
seq_cds = []
for feature in mrna_sub_features_s:
if feature.type != 'CDS':
continue
seq_cds.append(rec.seq[feature.location.start:feature.location.end])
gene_seq = reduce(operator.add, seq_cds)
So the program complains that seq_cds
is empty, which is not possible for the normal entries.
I can update the code so that it ignores the empty genes, but I would like to see which entries have such cases.
If you don't mind, could you put the below lines in the catch_bad_genes.py
and run the code for me?
Right above line 118,
if not seq_cds:
print(mrna_feature)
continue
gene_seq = reduce(operator.add, seq_cds) # This is line 118
Your catch_bad_genes.py
command would be:
/workspace/FunGAP/catch_bad_genes.py \
--gff3_files /fungap_workspace/fungap_out/augustus_out/augustus.gff3 \
/fungap_workspace/fungap_out/maker_out/SRR13198488/maker_SRR13198488.gff3 \
/fungap_workspace/fungap_out/braker_out/SRR13198488/braker_SRR13198488.gff3 \
--genome_assembly /fungap_workspace/genome_scaffolds.fasta \
--output_dir /fungap_workspace/fungap_out/gene_filtering
Thank you it works, below the output of the modified scripts : type: start_codon location: 2391384:2391387 id: g15364.t1.start1 qualifiers: Key: ID, Value: ['g15364.t1.start1'] Key: Parent, Value: ['g15364.t1'] Key: phase, Value: ['0'] Key: source, Value: ['AUGUSTUS']
type: CDS location: 2391384:2391551 id: g15364.t1.CDS1 qualifiers: Key: ID, Value: ['g15364.t1.CDS1'] Key: Parent, Value: ['g15364.t1'] Key: phase, Value: ['0'] Key: score, Value: ['0.97'] Key: source, Value: ['AUGUSTUS']
type: exon location: 2391384:2391551 id: g15364.t1.exon1 qualifiers: Key: ID, Value: ['g15364.t1.exon1'] Key: Parent, Value: ['g15364.t1'] Key: source, Value: ['AUGUSTUS']
type: intron location: 2391551:2391604 id: g15364.t1.intron1 qualifiers: Key: ID, Value: ['g15364.t1.intron1'] Key: Parent, Value: ['g15364.t1'] Key: score, Value: ['0.97'] Key: source, Value: ['AUGUSTUS']
type: CDS location: 2391604:2393408 id: g15364.t1.CDS2 qualifiers: Key: ID, Value: ['g15364.t1.CDS2'] Key: Parent, Value: ['g15364.t1'] Key: phase, Value: ['1'] Key: score, Value: ['0.96'] Key: source, Value: ['AUGUSTUS']
type: exon location: 2391604:2393408 id: g15364.t1.exon2 qualifiers: Key: ID, Value: ['g15364.t1.exon2'] Key: Parent, Value: ['g15364.t1'] Key: source, Value: ['AUGUSTUS']
type: stop_codon location: 2393405:2393408 id: g15364.t1.stop1 qualifiers: Key: ID, Value: ['g15364.t1.stop1'] Key: Parent, Value: ['g15364.t1'] Key: phase, Value: ['0'] Key: source, Value: ['AUGUSTUS']
Glad it helped! Though, I still don't get what caused the error. I will keep an eye on it.
If you don't mind, could you click the "star" for this repository?
Even though this issue is closed, I'm facing this very same problem trying to run the test case in the documentation at https://github.com/CompSynBioLab-KoreaUniv/FunGAP/blob/master/INSTALL.md#test-run
Everything works as expected for all those entries with mRNA
as feature type and some list of subfeatures. For instance, the last working one is
FEATURE type: mRNA
location: [20671:20966](-)
id: snap_masked-NC_001224.1-processed-gene-0.29-mRNA-1
qualifiers:
Key: ID, Value: ['snap_masked-NC_001224.1-processed-gene-0.29-mRNA-1']
Key: Name, Value: ['snap_masked-NC_001224.1-processed-gene-0.29-mRNA-1']
Key: Parent, Value: ['snap_masked-NC_001224.1-processed-gene-0.29']
Key: _AED, Value: ['1.00']
Key: _QI, Value: ['0|0|0|0|1|1|2|0|95']
Key: _eAED, Value: ['1.00']
Key: source, Value: ['maker']
SUBFEATURES: [SeqFeature(FeatureLocation(ExactPosition(20960), ExactPosition(20966), strand=-1), type='exon', id='snap_masked-NC_001224.1-processed-gene-0.29-mRNA-1:2'), SeqFeature(FeatureLocation(ExactPosition(20671), ExactPosition(20953), strand=-1), type='exon', id='snap_masked-NC_001224.1-processed-gene-0.29-mRNA-1:1'), SeqFeature(FeatureLocation(ExactPosition(20960), ExactPosition(20966), strand=-1), type='CDS', id='snap_masked-NC_001224.1-processed-gene-0.29-mRNA-1:cds'), SeqFeature(FeatureLocation(ExactPosition(20671), ExactPosition(20953), strand=-1), type='CDS', id='snap_masked-NC_001224.1-processed-gene-0.29-mRNA-1:cds')]
However, at some point, all entries look like the following, without any subfeatures:
FEATURE type: start_codon
location: [186919:186922](+)
id: file_1_file_1_g4549.t1.start1
qualifiers:
Key: ID, Value: ['file_1_file_1_g4549.t1.start1']
Key: Parent, Value: ['file_1_file_1_g4549.t1']
Key: phase, Value: ['0']
Key: source, Value: ['AUGUSTUS']
SUBFEATURE []
The lack of subfeatures is what ultimately causes the empty sequence in seq_cds
.
Are these safe to ignore?
Thanks for reporting this. That seems a problem. While I'm running the pipeline myself, could you more information?
Where do you think the prefix "file_1_file1" come from?
The file "file_1_file_1_g4549" is generated by BRAKER. I'm using version 2.1.6.
It seems that we are hitting the same problem as in https://github.com/Gaius-Augustus/TSEBRA/issues/2
It seems that they fixed it, but conda version was not updated. (And it seems conda installation is no longer updated) I installed 2.1.6 via conda and am testing it now.
If it doesn't work, the options I have are: 1) Add a step to fix the bug in the installation 2) Give up conda and install the Braker manually 3) Force users to use 2.1.5, which works fine
I think I'm going to go with option 3) because it's the simplest. So, could you switch into 2.1.5 and try again?
If you really want to use 2.1.6, then you may update the code as shown here: https://github.com/Gaius-Augustus/BRAKER/commit/f38630c1cad3e11b525f84d517c7949cb4c2d7eb
Let me know what you think!
Thanks for the feedback. I confirm that BRAKER v2.1.6 with patch https://github.com/Gaius-Augustus/BRAKER/commit/f38630c1cad3e11b525f84d517c7949cb4c2d7eb does work. No issues. :+1:
Was there any easy way to apply the patch? Did you do it manually?
I applied the patch manually to the sources of v2.1.6. We use EasyBuild to install software in our HPC cluster, which makes it easy to apply such patches. I also opened a PR to add that patch to the official recipe for BRAKER v2.1.6 in EasyBuild (see https://github.com/easybuilders/easybuild-easyconfigs/pull/17631)
On your side, I suggest to increase the minimum version of BRAKER for FunGAP to v3.0.2, which is the lowest version with this fix.
I run fungap on Docker and I get this error :
[Wrapper] /workspace/FunGAP/catch_bad_genes.py --gff3_files /fungap_workspace/fungap_out/augustus_out/augustus.gff3 /fungap_workspace/fungap_out/maker_out/SRR13198488/maker_SRR13198488.gff3 /fungap_workspace/fungap_out/braker_out/SRR13198488/braker_SRR13198488.gff3 --genome_assembly /fungap_workspace/genome_scaffolds.fasta --output_dir /fungap_workspace/fungap_out/gene_filtering Traceback (most recent call last): File "/workspace/FunGAP/catch_bad_genes.py", line 179, in
main()
File "/workspace/FunGAP/catch_bad_genes.py", line 58, in main
catch_middle_stop(gff3_files, genome_assembly_file, output_dir)
File "/workspace/FunGAP/catch_bad_genes.py", line 118, in catch_middle_stop
gene_seq = reduce(operator.add, seq_cds)
TypeError: reduce() of empty sequence with no initial value
Traceback (most recent call last):
File "/workspace/FunGAP/fungap.py", line 826, in
main()
File "/workspace/FunGAP/fungap.py", line 258, in main
gff3_files, genome_assembly, output_dir, d_path, logger
File "/workspace/FunGAP/fungap.py", line 732, in catch_bad_genes
check_call(command_args)
File "/opt/conda/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/workspace/FunGAP/catch_bad_genes.py', '--gff3_files', '/fungap_workspace/fungap_out/augustus_out/augustus.gff3', '/fungap_workspace/fungap_out/maker_out/SRR13198488/maker_SRR13198488.gff3', '/fungap_workspace/fungap_out/braker_out/SRR13198488/braker_SRR13198488.gff3', '--genome_assembly', '/fungap_workspace/genome_scaffolds.fasta', '--output_dir', '/fungap_workspace/fungap_out/gene_filtering']' returned non-zero exit status 1.