Open ebhmayra opened 1 year ago
Hi, thanks for getting in touch! Could you check whether the names of the single copy genes you are using appear in the same way in each place?
In the BAM file: samtools view -H <file.bam>
All of them need to appear as an @SQ line in the BAM header.
In the library: grep '>' <library.fa>
The spelling etc. needs to be exactly the same across all three occurrences (the BAM header, the library FASTA, and the names passed to --single_copy_genes); otherwise they won't be matched up correctly. Since the error is coming from pysam, my guess is that something is wrong in, or missing from, the BAM file. Did you include the same single copy genes when you aligned the reads?
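A minimal Python sketch of that name check, in case doing it by eye is error-prone. It is not part of deviaTE: the function names sq_names, fasta_names and missing_genes are made up for illustration. It assumes you pass in the text printed by samtools view -H and the contents of the library FASTA, and it assumes the aligner kept only the part of each FASTA header before the first whitespace as the reference name, which is common but worth confirming for your aligner.

```python
def sq_names(header_text):
    """Return the SN: values of all @SQ lines in a SAM/BAM header dump."""
    names = set()
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # @SQ fields are tab-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def fasta_names(fasta_text):
    """Return FASTA sequence names: the text after '>' up to the first whitespace."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def missing_genes(genes, header_text, fasta_text):
    """Map each requested gene to the places ('bam', 'library') where it is absent."""
    in_bam = sq_names(header_text)
    in_lib = fasta_names(fasta_text)
    report = {}
    for gene in genes:
        absent = []
        if gene not in in_bam:
            absent.append("bam")
        if gene not in in_lib:
            absent.append("library")
        if absent:
            report[gene] = absent
    return report


if __name__ == "__main__":
    # Toy inputs, not taken from any real file.
    header = "@HD\tVN:1.6\n@SQ\tSN:Gene1\tLN:100\n@SQ\tSN:TE1\tLN:50\n"
    fasta = ">Gene1 some description\nACGT\n>Gene2\nACGT\n"
    print(missing_genes(["Gene1", "Gene2", "Gene3"], header, fasta))
```

An empty result means every name given to --single_copy_genes was found in both places. Any gene reported under "bam" is one that pysam would not be able to look up as a contig.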
Best wishes, Lukas
Hi Lukas,
This was useful. Thank you for your help. Best, Mayra
Hello,
I have a new issue with deviaTE. I'm trying to use it with --single_copy_genes normalization. I included 5 gene sequences in the --library file and then called them for the normalization, but I got the error "ValueError: invalid contig Gene1". Could you help me find out what is wrong, please? This is my script:
deviaTE --input_bam $data_folder/minimifoliaBT131_aligned.bam --families rnd_1_family_1_Unknown,rnd_1_family_334_LTR_Copia,rnd_1_family_54_LTR_Gypsy,rnd_3_family_201_Unknown,rnd_4_family_797_LTR_Gypsy,rnd_4_family_195_LTR_Gypsy,rnd_4_family_256_Unknown,rnd_4_family_342_LTR_Gypsy,rnd_4_family_694_LINE_RTE_BovB,rnd_5_family_1219_LINE_RTE_BovB,rnd_5_family_1298_LTR_Gypsy,rnd_5_family_1744_LTR_Gypsy,rnd_6_family_1113_Unknown,rnd_5_family_989_Unknown,rnd_6_family_3088_LTR_Gypsy,rnd_6_family_8663_LTR_Gypsy --single_copy_genes Gene1,Gene2,Gene3,Gene4,Gene5 --library $data_folder/impolita-families_mcclintok9_cdhit2_cat.fa
This is the error:
**** Analysis
Starting analysis of rnd_1_family_1_Unknown in /scratch/botany/mayra/diospyros/minimifoliaBT131_aligned.bam.fused.sort.bam..
No annotaions found for: rnd_1_family_1_Unknown
Reference sequence contains ambiguous nucleotide: N
Reference sequence contains ambiguous nucleotide: N
Reference sequence contains ambiguous nucleotide: N
Reference sequence contains ambiguous nucleotide: N
Reference sequence contains ambiguous nucleotide: N
Reference sequence contains ambiguous nucleotide: N
Reference sequence contains ambiguous nucleotide: N
Normalization: single copy genes
Traceback (most recent call last):
File "/home/user/bricenohuayta/.conda/envs/deviaTE_env/bin/deviaTE_analyse", line 113, in
scg_fac = sample.get_norm_fac_scg(genes=args.single_copy_genes)
File "/home/user/bricenohuayta/.conda/envs/deviaTE_env/lib/python3.6/site-packages/deviaTE/deviaTE_pileup.py", line 162, in get_norm_fac_scg
sum_cov = sum([len(pileupcolumn.pileups) for pileupcolumn in bamfile_op.pileup(contig=g, truncate=True)])
File "pysam/libcalignmentfile.pyx", line 1322, in pysam.libcalignmentfile.AlignmentFile.pileup
File "pysam/libchtslib.pyx", line 687, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig Gene1
**** Visualization
During startup - Warning message:
Setting LC_CTYPE failed, using "C"
Warning message:
Input file does not exist
Best regards, Mayra