gaolabtools / scNanoGPS

Single cell Nanopore sequencing data for Genotype and Phenotype
Other
39 stars 2 forks source link

matrix_SNV.raw.vcf.gz unzipped in Step 5 #31

Closed oscargamarra closed 6 months ago

oscargamarra commented 6 months ago

Hi, first of all, thank you for developing scNanoGPS. I am trying to do SNV calling from the raw fastq files of our sequencing. I was running into an issue in the 5th step (5.3. single cell SNV profile). This is the traceback of the error:

Merge Longshot result... Merging part 10 of 10 ... Generating final matrix... Filtering by prevalence: 0.01 ... Traceback (most recent call last): File "/hpc/pmc_holstege/oscar/scNanoGPS/snvcalling/../reporter_SNV.py", line 540, in merge_longshot(CB_list, options) File "/hpc/pmc_holstege/oscar/scNanoGPS/snvcalling/../reporter_SNV.py", line 136, in merge_longshot filter_by_prevalence(options) File "/hpc/pmc_holstege/oscar/scNanoGPS/snvcalling/../reporter_SNV.py", line 62, in filter_by_prevalence fh = bgzf.open(os.path.join(options.o_dir, options.o_pref) + '.raw.vcf.gz', 'rt') File "/hpc/pmc_holstege/oscar/anaconda3/envs/scNanoGPS/lib/python3.9/site-packages/Bio/bgzf.py", line 273, in open return BgzfReader(filename, mode) File "/hpc/pmc_holstege/oscar/anaconda3/envs/scNanoGPS/lib/python3.9/site-packages/Bio/bgzf.py", line 617, in init self._load_block(handle.tell()) File "/hpc/pmc_holstege/oscar/anaconda3/envs/scNanoGPS/lib/python3.9/site-packages/Bio/bgzf.py", line 644, in _load_block block_size, self._buffer = _load_bgzf_block(handle, self._text) File "/hpc/pmc_holstege/oscar/anaconda3/envs/scNanoGPS/lib/python3.9/site-packages/Bio/bgzf.py", line 444, in _load_bgzf_block raise ValueError( ValueError: A BGZF (e.g. a BAM file) block should start with b'\x1f\x8b\x08\x04', not b'##fi'; handle.tell() now says 4

Looking into the scNanoGPS_res dir, I see that both matrix_SNV.raw.vcf.gz and matrix_SNV.filtered.vcf.gz have been created, but when I do 'file matrix_SNV.raw.vcf.gz' in the terminal this is the output: 'matrix_SNV.raw.vcf.gz: Variant Call Format (VCF) version 4.2, ASCII text, with very long lines'; and matrix_SNV.filtered.vcf.gz is empty. My intuition is that this file is not being zipped and is causing the problem, but I am not really sure how it works.

Do you have any suggestions on what might be causing the problem? Thank you a lot!

shiauck commented 6 months ago

Hi,

I agree with your intuition. It looks like the vcf is failed to be compressed correctly in a certain step.

Please try the following command step-by-step: (1) Make sure each one longshot result file is compressed

cat `ls tmp/longshot.output.*.vcf.gz | head -n 1` | head -n 1

Expected result should be in binary which could not be understand. But when using zcat then it should be readable.

(2) Check vcf merging output is compressed

ls tmp/longshot.output.*.vcf.gz | split -l 500 - splitted_vcf_list_
bcftools merge -0 -l splitted_vcf_list_aa -Oz -o splitted_vcf_list_aa.merge.vcf.gz
tabix -p vcf splitted_vcf_list_aa.merge.vcf.gz

From here, please check the file "splitted_vcf_list_aa.merge.vcf.gz" is compressed by using "zcat".

ls splitted_vcf_list_aa.merge.vcf.gz > final_vcf_list
bcftools merge -0 -l final_vcf_list -Oz -o test.vcf.gz

From here, please check the file "test.vcf.gz" is compressed.

Please keep me updated of the trial results. Thanks.

Besides, I've updated reporter_SNV.py line 126 to further guarantee the merged vcf file is compressed. Please download reporter_SNV.py again and run it again to see whether this fix the problem or not.

Regards, Cheng-Kai

oscargamarra commented 6 months ago

Hi Cheng-Kai, thanks for the fast reply. The "test.vcf.gz" does work, there was a problem with my installation of bcftools (a dependency was not correctly installed). After doing ldd $(whereis bcftools) I had this problem: libcrypto.so.1.0.0 => not found I added my base libcrypto.so.1.0.0 to my path and it works now. Now I am running into another problem similar to #32. I tested with the --ref_genome example/GRCh38_chr22.fa argument and with my own data, but I run into the same problem. Maybe we can close this issue then?

shiauck commented 6 months ago

I'm closing this issue. I'll answer #32.