Open AzizHN opened 1 year ago
Hi,
Can you check whether there are any intermediate files in /home/aziz/mapping/SRR23337893/ under the intermediate_snp_files or intermediate_phase_files subfolders, or whether a variant_calls.snps.vcf.gz file was created? It seems very suspicious that SNP calling took only 0.4s, so I am wondering whether that step ran correctly.
Hello @umahsn, thank you for your reply.
Yes, I have several intermediate subfolders:
intermediate_indel_files, containing 2 files (variant_calls.6.indel.vcf and variant_calls.raw.indel.vcf)
intermediate_phase_files, containing 4 files (2x refsequenceID.snps.phased.vcf.gz and 2x refsequenceID.snps.phased.vcf.gz.tbi); I have 2 reference sequences in my FASTA reference file
intermediate_snp_files, containing 2 files (combined.snps.vcf and variant_calls.3.snps.vcf)
And yes, a variant_calls.snps.vcf.gz file was created (514 bytes): a 7-line header and a 9-line variant table.
My input is a BAM file (555.1 KB) and my reference is a FASTA file (3.6 KB).
Hi, I think there might be a problem with passing the filenames internally within NanoCaller for haploid genomes. Let me check this and get back to you.
Can you tell me if /home/aziz/mapping/SRR23337893/variant_calls.snps.phased.vcf.gz or refsequenceID.snps.phased.vcf.gz files are empty and if they have a header?
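In case it helps with that check, here is a minimal sketch in Python (standard library only) that reports the size of a compressed VCF and counts its header and record lines. The function name inspect_vcf_gz is hypothetical and not part of NanoCaller. It relies on bgzip output being readable by Python's gzip module; a file that is not gzip data at all will raise gzip.BadGzipFile instead of returning counts.

```python
import gzip
from pathlib import Path


def inspect_vcf_gz(path):
    """Return (size_bytes, n_header_lines, n_records) for a gzip/bgzip VCF.

    Hypothetical helper for illustration; not part of NanoCaller.
    """
    path = Path(path)
    size = path.stat().st_size
    header = 0
    records = 0
    # A zero-byte file has nothing to decompress, so skip it entirely.
    if size > 0:
        with gzip.open(path, "rt") as handle:
            for line in handle:
                if line.startswith("#"):
                    header += 1      # meta-information or column header line
                elif line.strip():
                    records += 1     # any non-blank, non-header line
    return size, header, records
```

Calling inspect_vcf_gz on /home/aziz/mapping/SRR23337893/variant_calls.snps.phased.vcf.gz and on each file in intermediate_phase_files would show whether a file is completely empty (size 0) or has a header but no records.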
Hello @umahsn, thank you for your response. The phased files are always empty!
Hi,
I checked the issue, and it turns out that the presence of a colon (":") in the names of the reference sequences is causing the problem. NanoCaller uses Linux system commands to run WhatsHap for phasing and bcftools for VCF file manipulation. As a result, if a VCF file is named after a reference sequence that has a colon in its name, Linux is not able to resolve the path to the file correctly. Once I replace the colon with some other symbol in the reference and BAM files, it runs correctly.
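As a rough illustration of that workaround, the sketch below replaces colons in sequence names, both in FASTA header lines and in the @SQ lines of a SAM/BAM header. All function names are hypothetical, the underscore is an arbitrary choice of replacement, and this is not code from NanoCaller. It only rewrites text lines: the edited header still has to be applied to the BAM file with a separate tool (samtools reheader is one option), and the names must end up identical in the reference and the BAM.

```python
import re


def sanitize_name(name, replacement="_"):
    """Replace every colon in a reference sequence name."""
    return name.replace(":", replacement)


def sanitize_fasta_line(line, replacement="_"):
    """Rewrite a FASTA header line; sequence lines pass through unchanged."""
    if not line.startswith(">"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    body = line[1:].rstrip("\n")
    # The sequence name is the first whitespace-delimited token after '>';
    # anything after it is a free-text description and is left alone.
    match = re.match(r"(\S*)(.*)", body)
    name, rest = match.group(1), match.group(2)
    return ">" + sanitize_name(name, replacement) + rest + newline


def sanitize_sam_header_line(line, replacement="_"):
    """Rewrite the SN field of an @SQ header line; other lines pass through."""
    if not line.startswith("@SQ"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    fields = [
        "SN:" + sanitize_name(f[3:], replacement) if f.startswith("SN:") else f
        for f in fields
    ]
    return "\t".join(fields) + newline
```

Applying sanitize_fasta_line to every line of the reference and sanitize_sam_header_line to every line of the exported BAM header keeps the two sets of names in sync. If two names differ only by the replaced character, they would collide, so it is worth checking that the sanitized names are still unique.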
Hello, I ran this command to detect variants in my ONT reads (mapped with minimap2):
NanoCaller --mode all --sequencing ont --haploid_genome --bam sorted_mapped_reads.bam --ref genes.fna
I got this as a result:
2023-06-23 12:27:16.562651: Starting NanoCaller.
NanoCaller command and arguments are saved in the following file: /home/aziz/mapping/SRR23337893/args
2023-06-23 12:27:16.947255: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
SNP Calling Progress: 100%|███████████████████████| 2/2 [00:00<00:00, 6.89it/s]
2023-06-23 12:27:18.763662: Combining SNP calls.
2023-06-23 12:27:18.764897: Compressing and indexing SNP calls.
Writing to /tmp/bcftools.dkVQT8
Merging 1 temporary files
Cleaning
Done
2023-06-23 12:27:18.824115: SNP calling completed. Time taken= 0.4034
Indel Calling Progress: 100%|█████████████████████| 2/2 [00:00<00:00, 3.99it/s]
2023-06-23 12:27:19.487620: Compressing and indexing indel calls.
Checking the headers and starting positions of 2 files
[E::bcf_hdr_read] Input is not detected as bcf or vcf format
Failed to parse header: /home/aziz/mapping/SRR23337893/variant_calls.snps.phased.vcf.gz
2023-06-23 12:27:20.501190: Indel calling completed. Time taken= 1.6770
2023-06-23 12:27:20.501373: Total Time Elapsed: 3.94 seconds
It seems that everything went well, except for a problem with the header of the file variant_calls.snps.phased.vcf.gz:
2023-06-23 12:27:19.487620: Compressing and indexing indel calls.
Checking the headers and starting positions of 2 files
[E::bcf_hdr_read] Input is not detected as bcf or vcf format
Failed to parse header: /home/aziz/mapping/SRR23337893/variant_calls.snps.phased.vcf.gz
Can this error influence my results? Does anyone have an idea about it? Thanks in advance.