Immunotools / IgDetective

a tool for annotation of immunoglobulin genes in genome assemblies
GNU General Public License v3.0

Problem with codon #8

Open pdoris opened 1 year ago

pdoris commented 1 year ago

Much of the pipeline runs until Biopython's Seq translates the presumably detected Ig sequences, at which point it repeatedly terminates with the error "Codon 'GET' is invalid":

```
pdoris$ python run_iterative_igdetective.py /Volumes/ms_imm_doris/Rat_references/BN-HiFi/Final_curated_BN-HiFi_assembly/BN_final.curated_primary.no_mt.unscrubbed.fa /Users/pdoris/IgDetective-1.1.0/BN /usr/local/bin/minimap2
==== Aligning human IG genes...
Aligning IGLV genes (datafiles/human_reference_genes/IGLV.fa)...
Aligning IGLJ genes (datafiles/human_reference_genes/IGLJ.fa)...
Aligning IGKV genes (datafiles/human_reference_genes/IGKV.fa)...
Aligning IGHJ genes (datafiles/human_reference_genes/IGHJ.fa)...
Aligning IGHV genes (datafiles/human_reference_genes/IGHV.fa)...
Aligning IGKJ genes (datafiles/human_reference_genes/IGKJ.fa)...
Aligning IGHC genes (datafiles/human_reference_genes/IGHC.fa)...
Aligning IGKC genes (datafiles/human_reference_genes/IGKC.fa)...
Aligning IGLC genes (datafiles/human_reference_genes/IGLC.fa)...
==== Identifying IG contigs...
==== Running RSS-based IgDetective for IGH...
Contig: CHR_6, contig range: (137515617, 147156653), approx locus length: 9641036
Running: python py/IGDetective.py -i /Users/pdoris/IgDetective-1.1.0/BN/denovo_search/combined_contigs_IGH.fasta -o /Users/pdoris/IgDetective-1.1.0/BN/denovo_search/predicted_genes_IGH -m 1 -l IGH
==== Running RSS-based IgDetective for IGK...
Contig: CHR_4, contig range: (97247562, 104902484), approx locus length: 7654922
Running: python py/IGDetective.py -i /Users/pdoris/IgDetective-1.1.0/BN/denovo_search/combined_contigs_IGK.fasta -o /Users/pdoris/IgDetective-1.1.0/BN/denovo_search/predicted_genes_IGK -m 1 -l IGK
==== Running RSS-based IgDetective for IGL...
==== Iterative processing IGHV genes...
Running minimap...
Alignment of IG genes datafiles/combined_reference_genes/IGHV.fa to /Volumes/ms_imm_doris/Rat_references/BN-HiFi/Final_curated_BN-HiFi_assembly/BN_final.curated_primary.no_mt.unscrubbed.fa
Processing SAM file...
/Users/pdoris/opt/anaconda3/envs/igdetective/lib/python3.11/site-packages/Bio/Seq.py:2804: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  warnings.warn(
Traceback (most recent call last):
  File "/Users/pdoris/IgDetective-1.1.0/run_iterative_igdetective.py", line 290, in
    main(genome_fasta, output_dir, ig_gene_dir)
  File "/Users/pdoris/IgDetective-1.1.0/run_iterative_igdetective.py", line 259, in main
    AlignGenesIteratively(ref_gene_fasta, igdetective_tsv, genome_fasta, iter_dir, gene)
  File "/Users/pdoris/IgDetective-1.1.0/run_iterative_igdetective.py", line 134, in AlignGenesIteratively
    gene_finding_tools.main(genome_fasta, ref_gene_fasta, iter0_dir)
  File "/Users/pdoris/IgDetective-1.1.0/py/extract_aligned_genes.py", line 147, in main
    aa_seq = str(Seq(alignment.gene_seq).translate())
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pdoris/opt/anaconda3/envs/igdetective/lib/python3.11/site-packages/Bio/Seq.py", line 1448, in translate
    _translate_str(str(self), table, stop_symbol, to_stop, cds, gap=gap)
  File "/Users/pdoris/opt/anaconda3/envs/igdetective/lib/python3.11/site-packages/Bio/Seq.py", line 2836, in _translate_str
    raise CodonTable.TranslationError(
Bio.Data.CodonTable.TranslationError: Codon 'GET' is invalid
(igdetective) IMM-MAC-184391:IgDetective-1.1.0 pdoris$
```
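The BiopythonWarning in the log suggests either trimming the sequence or padding it with trailing N before translation. A minimal stdlib-only sketch of both options (the function names are hypothetical, not part of IgDetective or Biopython); note that neither one addresses the fatal error itself, which comes from a codon containing a letter that is not a nucleotide code:

```python
def trim_partial_codon(seq: str) -> str:
    """Drop trailing bases so that len(seq) is a multiple of three."""
    return seq[: len(seq) - len(seq) % 3]


def pad_partial_codon(seq: str) -> str:
    """Append 'N' characters so that len(seq) is a multiple of three."""
    return seq + "N" * (-len(seq) % 3)


if __name__ == "__main__":
    example = "ATGGCCATTG"  # made-up 10-base sequence: one leftover base
    print(trim_partial_codon(example))  # ATGGCCATT
    print(pad_partial_codon(example))   # ATGGCCATTGNN
```

Trimming silently discards the leftover bases, while padding keeps them and yields an extra ambiguous codon; which is preferable depends on how the translated sequence is used downstream.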

yana-safonova commented 1 year ago

Hi,

Thank you for reporting the issue! Have you been running IgDetective on macOS? I was able to reproduce the problem on my laptop. It looks like the Alignment class from the Biopython package has a different structure from what we expect.

This is caused either by differences between Biopython versions or by differences in how it works on Linux and macOS.
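To tell these two explanations apart, it helps to record the operating system and the Biopython version on each machine. A minimal stdlib-only sketch (the function name `environment_report` is hypothetical, not part of IgDetective); it assumes Biopython is installed under its PyPI distribution name `biopython`:

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report() -> dict:
    """Collect the details needed to separate an OS difference from a version difference."""
    try:
        biopython = version("biopython")  # PyPI distribution name
    except PackageNotFoundError:
        biopython = None  # Biopython is not installed in this environment
    return {
        "os": platform.system(),  # e.g. 'Darwin' on macOS, 'Linux' on Linux
        "python": platform.python_version(),
        "biopython": biopython,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If two machines with different operating systems but the same Biopython version both fail, the OS explanation becomes unlikely.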

I will work on unifying the versions; in the meantime, please feel free to send me any genomes you'd like to process.

Best regards, Yana

pdoris commented 1 year ago

Yes, this was on a Mac. A fix would be wonderful!

pdoris commented 1 year ago

I installed it on Linux and got the same error:

```
(igdetective) c305-005.ls6(198)$ python run_iterative_igdetective.py /work/06127/pdoris/BN_HiFi_curated.fa /work/06127/pdoris/BN /work/06127/pdoris/miniconda3/envs/igdetective/bin/minimap2
WARN: output directory /work/06127/pdoris/BN exists and will be overwritten!
==== Aligning human IG genes...
Aligning IGLC genes (datafiles/human_reference_genes/IGLC.fa)...
Aligning IGHC genes (datafiles/human_reference_genes/IGHC.fa)...
Aligning IGHJ genes (datafiles/human_reference_genes/IGHJ.fa)...
Aligning IGKJ genes (datafiles/human_reference_genes/IGKJ.fa)...
Aligning IGLJ genes (datafiles/human_reference_genes/IGLJ.fa)...
Aligning IGKC genes (datafiles/human_reference_genes/IGKC.fa)...
Aligning IGLV genes (datafiles/human_reference_genes/IGLV.fa)...
Aligning IGKV genes (datafiles/human_reference_genes/IGKV.fa)...
Aligning IGHV genes (datafiles/human_reference_genes/IGHV.fa)...
==== Identifying IG contigs...
==== Running RSS-based IgDetective for IGH...
Contig: CHR_6, contig range: (137515617, 147156653), approx locus length: 9641036
Running: python py/IGDetective.py -i /work/06127/pdoris/BN/denovo_search/combined_contigs_IGH.fasta -o /work/06127/pdoris/BN/denovo_search/predicted_genes_IGH -m 1 -l IGH
==== Running RSS-based IgDetective for IGK...
Contig: CHR_4, contig range: (97247562, 104902484), approx locus length: 7654922
Running: python py/IGDetective.py -i /work/06127/pdoris/BN/denovo_search/combined_contigs_IGK.fasta -o /work/06127/pdoris/BN/denovo_search/predicted_genes_IGK -m 1 -l IGK
==== Running RSS-based IgDetective for IGL...
==== Iterative processing IGHV genes...
Running minimap...
Alignment of IG genes datafiles/combined_reference_genes/IGHV.fa to /work/06127/pdoris/BN_HiFi_curated.fa
Processing SAM file...
/work/06127/pdoris/miniconda3/envs/igdetective/lib/python3.11/site-packages/Bio/Seq.py:2804: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  warnings.warn(
Traceback (most recent call last):
  File "/work/06127/pdoris/miniconda3/envs/igdetective/IgDetective-main/run_iterative_igdetective.py", line 300, in
    main(genome_fasta, output_dir, ig_gene_dir)
  File "/work/06127/pdoris/miniconda3/envs/igdetective/IgDetective-main/run_iterative_igdetective.py", line 269, in main
    AlignGenesIteratively(ref_gene_fasta, igdetective_tsv, genome_fasta, iter_dir, gene)
  File "/work/06127/pdoris/miniconda3/envs/igdetective/IgDetective-main/run_iterative_igdetective.py", line 144, in AlignGenesIteratively
    gene_finding_tools.main(genome_fasta, ref_gene_fasta, iter0_dir)
  File "/work/06127/pdoris/miniconda3/envs/igdetective/IgDetective-main/py/extract_aligned_genes.py", line 149, in main
    aa_seq = str(Seq(alignment.gene_seq).translate())
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work/06127/pdoris/miniconda3/envs/igdetective/lib/python3.11/site-packages/Bio/Seq.py", line 1448, in translate
    _translate_str(str(self), table, stop_symbol, to_stop, cds, gap=gap)
  File "/work/06127/pdoris/miniconda3/envs/igdetective/lib/python3.11/site-packages/Bio/Seq.py", line 2836, in _translate_str
    raise CodonTable.TranslationError(
Bio.Data.CodonTable.TranslationError: Codon 'GET' is invalid
```

yana-safonova commented 1 year ago

Thank you for checking how it works on Linux! I am now confident that this issue is explained by differences between Biopython versions: it looks like we used an older version than you did. I will add a fix next week.

pdoris commented 1 year ago

Hi Yana,

Any progress on the fix?

Peter

StefanLelieveld commented 6 months ago

Hi Yana and Peter,

Using Biopython v1.81, I encountered the same error you described here, Peter. As Yana mentioned, the issue seems to be related to the Biopython version and how the data is being processed. Using an older version of Biopython solved it: I created a Python venv and installed Biopython 1.77 in it, which resolved the issue for me.
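Until a fix lands, a script could warn when the installed Biopython is newer than the release reported to work here. This is a hypothetical guard, not part of IgDetective: it only encodes the two data points from this thread (1.77 works, 1.81 fails), so it does not establish the exact release where the behavior changed.

```python
import re
import warnings
from importlib.metadata import PackageNotFoundError, version

# Data points reported in this thread (assumed, not an official compatibility range).
KNOWN_GOOD = "1.77"  # reported to work
KNOWN_BAD = "1.81"   # reported to fail with "Codon 'GET' is invalid"


def parse_version(text: str) -> tuple:
    """Return (major, minor) from strings such as '1.81' or '1.84.dev0'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def warn_if_untested(installed: str, known_good: str = KNOWN_GOOD) -> bool:
    """Emit a RuntimeWarning and return True if `installed` is newer than `known_good`."""
    newer = parse_version(installed) > parse_version(known_good)
    if newer:
        warnings.warn(
            f"Biopython {installed} is newer than {known_good}; "
            f"{KNOWN_BAD} was reported to fail in this pipeline.",
            RuntimeWarning,
        )
    return newer


if __name__ == "__main__":
    try:
        warn_if_untested(version("biopython"))
    except PackageNotFoundError:
        print("Biopython is not installed in this environment.")
```

Comparing only (major, minor) tuples keeps the check simple; a project that already depends on `packaging` could use its `Version` class instead.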

In more detail: debugging with Biopython v1.81 shows that the Alignment.gene_seq strings start with the word "TARGET", which leads to the error that "GET" is an invalid codon.
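This also explains why the error names "GET" rather than "TAR". In "TARGET", the first codon "TAR" is still valid, because R is the IUPAC ambiguity code for A or G and both TAA and TAG are stop codons; translation only fails at the second codon, whose "E" is not a nucleotide code. A hypothetical stdlib-only check (not part of IgDetective or Biopython) that roughly mirrors this validation could locate the offending codon before `translate()` is called:

```python
# IUPAC nucleotide codes for DNA and RNA, including ambiguity codes.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def first_invalid_codon(seq: str):
    """Return (offset, codon) for the first complete codon containing a
    non-nucleotide letter, or None if every complete codon is valid."""
    seq = seq.upper()
    for start in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[start:start + 3]
        if not set(codon) <= IUPAC_NUCLEOTIDES:
            return start, codon
    return None


if __name__ == "__main__":
    # Made-up sequences: one with the "TARGET" prefix, one without.
    print(first_invalid_codon("TARGETCAGGTGCAG"))  # (3, 'GET')
    print(first_invalid_codon("CAGGTGCAGCTG"))     # None
```

Like Biopython's translation, this ignores a trailing partial codon. Run on `alignment.gene_seq`, such a check would only report the problem more clearly; it does not fix the underlying difference in how the alignment is parsed.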

[Screenshot 2024-03-22 at 14:57:36]

Stefan