W-L / deviaTE

Python tool for the analysis and visualization of mobile genetic elements
GNU General Public License v3.0

Single_copy_genes, ValueError: invalid contig Gene1 #13

Open ebhmayra opened 1 year ago

ebhmayra commented 1 year ago

Hello,

I have a new issue with deviaTE. I'm trying to use it with --single_copy_genes normalization. I included the sequences of 5 genes in the --library file and then named them for the normalization, but I get the error "ValueError: invalid contig Gene1". Could you help me find out what is wrong, please?

This is my script:


deviaTE --input_bam $data_folder/minimifoliaBT131_aligned.bam \
  --families rnd_1_family_1_Unknown,rnd_1_family_334_LTR_Copia,rnd_1_family_54_LTR_Gypsy,rnd_3_family_201_Unknown,rnd_4_family_797_LTR_Gypsy,rnd_4_family_195_LTR_Gypsy,rnd_4_family_256_Unknown,rnd_4_family_342_LTR_Gypsy,rnd_4_family_694_LINE_RTE_BovB,rnd_5_family_1219_LINE_RTE_BovB,rnd_5_family_1298_LTR_Gypsy,rnd_5_family_1744_LTR_Gypsy,rnd_6_family_1113_Unknown,rnd_5_family_989_Unknown,rnd_6_family_3088_LTR_Gypsy,rnd_6_family_8663_LTR_Gypsy \
  --single_copy_genes Gene1,Gene2,Gene3,Gene4,Gene5 \
  --library $data_folder/impolita-families_mcclintok9_cdhit2_cat.fa


This is the error:


**** Analysis
Starting analysis of rnd_1_family_1_Unknown in /scratch/botany/mayra/diospyros/minimifoliaBT131_aligned.bam.fused.sort.bam..

No annotaions found for: rnd_1_family_1_Unknown
Reference sequence contains ambiguous nucleotide: N
Reference sequence contains ambiguous nucleotide: N
Reference sequence contains ambiguous nucleotide: N
Reference sequence contains ambiguous nucleotide: N
Reference sequence contains ambiguous nucleotide: N
Reference sequence contains ambiguous nucleotide: N
Reference sequence contains ambiguous nucleotide: N
Normalization: single copy genes

Traceback (most recent call last):
  File "/home/user/bricenohuayta/.conda/envs/deviaTE_env/bin/deviaTE_analyse", line 113, in <module>
    scg_fac = sample.get_norm_fac_scg(genes=args.single_copy_genes)
  File "/home/user/bricenohuayta/.conda/envs/deviaTE_env/lib/python3.6/site-packages/deviaTE/deviaTE_pileup.py", line 162, in get_norm_fac_scg
    sum_cov = sum([len(pileupcolumn.pileups) for pileupcolumn in bamfile_op.pileup(contig=g, truncate=True)])
  File "pysam/libcalignmentfile.pyx", line 1322, in pysam.libcalignmentfile.AlignmentFile.pileup
  File "pysam/libchtslib.pyx", line 687, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig Gene1

**** Visualization

During startup - Warning message:
Setting LC_CTYPE failed, using "C"
Warning message:
Input file does not exist


Best regards, Mayra

W-L commented 1 year ago

Hi, thanks for getting in touch! Could you check whether the names of the single copy genes you are using appear in exactly the same way:

The spelling etc. needs to be exactly the same between these three occurrences, otherwise it won't be matched up correctly. Since the error is coming from pysam, my guess is that there is something wrong or missing from the bam file. Did you include the same single copy genes when you aligned the reads?
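One way to run this check is a small script along the following lines. This is only a sketch, not part of deviaTE: the function names (`fasta_names`, `bam_names`, `missing_names`) and the script name in the usage comment are made up for illustration. It assumes pysam is installed (deviaTE already depends on it) and that the aligner kept only the part of each FASTA header before the first whitespace as the sequence name, which is common but worth confirming for your aligner.

```python
import sys


def fasta_names(lines):
    """Collect sequence names from FASTA header lines.

    Assumption: the name is the text after '>' up to the first whitespace.
    """
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def bam_names(bam_path):
    """Return the reference (contig) names stored in the BAM header."""
    import pysam  # imported here so the other helpers work without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        return list(bam.references)


def missing_names(wanted, available):
    """Return the entries of `wanted` that do not occur in `available`."""
    available = set(available)
    return [name for name in wanted if name not in available]


# Hypothetical usage: python check_names.py library.fa aligned.bam Gene1,Gene2
if __name__ == "__main__" and len(sys.argv) == 4:
    library_path, bam_path, gene_arg = sys.argv[1:]
    genes = gene_arg.split(",")
    with open(library_path) as handle:
        in_library = fasta_names(handle)
    print("missing from library:", missing_names(genes, in_library))
    print("missing from BAM header:", missing_names(genes, bam_names(bam_path)))
```

If a gene is listed as missing from the BAM header, pysam has no contig of that name to pile up, which would be consistent with the "invalid contig" error above.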

Best wishes, Lukas

ebhmayra commented 10 months ago

Hi Lukas,

This was useful. Thank you for your help. Best, Mayra