gaolabtools / scNanoGPS

Single cell Nanopore sequencing data for Genotype and Phenotype
Other
39 stars 2 forks source link

reporter_SNV problem #32

Closed cariocow closed 5 months ago

cariocow commented 6 months ago

Hi,

Thanks for making the scNanoGPS. I can successfully run reporter_expression and reporter_isoform. but when run the reporter_SNV, i got below error. please advice how to fix it. may i know which tabix version you are using. i originally use bcftools 1.20, it does not work. then i downgrade to bcftools 1.15. then i got error as below.

""" python3 reporter_SNV.py -t 2 --ref_genome example/GRCh38_chr22.fa

Time stamp: Sun, 12 May 2024 17:11:20

Loading CB list...

Time stamp: Sun, 12 May 2024 17:11:20

Calling SNV by longshot...

Longshot finished !

Time stamp: Sun, 12 May 2024 17:11:20

Time elapse: 0 : 0 : 0.03

Merge Longshot result... Merging part 1 of 1 ... Generating final matrix... tbx_index_build3 failed: tmp/splitted_vcf_list_aa.merge.vcf.gz ls: cannot access 'tmp/splitted_vcflist*.merge.vcf.gz': No such file or directory Segmentation fault Filtering by prevalence: 0.01 ...

Total SNVs no: 0 0 SNVs pass filtering... Done.

Time stamp: Sun, 12 May 2024 17:11:20

Time elapse: 0 : 0 : 0.07

Counting reads no. supporting SNVs...

Time stamp: Sun, 12 May 2024 17:11:20

Time elapse: 0 : 0 : 0.08

Generating SNVs depth table... Traceback (most recent call last): File "/home/cario/bin/scNanoGPS/reporter_SNV.py", line 556, in options, pileup_df = merge_mpileup(CB_list, options) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cario/bin/scNanoGPS/reporter_SNV.py", line 208, in merge_mpileup pileup_df['POS'] = pileup_df['POS'].apply(str)


  File "/home/cario/bin/miniforge/envs/scNanoGPS/lib/python3.12/site-packages/pandas/core/frame.py", line 4102, in __getitem__
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cario/bin/miniforge/envs/scNanoGPS/lib/python3.12/site-packages/pandas/core/indexes/range.py", line 417, in get_loc
    raise KeyError(key)
KeyError: 'POS'
Error: Duplicate sample names (SAMPLE), use --force-samples to proceed anyway.
oscargamarra commented 6 months ago

The output of the same command for me is this one:

Time stamp: Mon, 13 May 2024 15:37:31

Loading CB list...

Time stamp: Mon, 13 May 2024 15:37:31

Calling SNV by longshot...

Longshot finished !

Time stamp: Mon, 13 May 2024 15:37:31

Time elapse: 0 : 0 : 0.01

Merge Longshot result... ls: cannot access 'tmp/longshot.output.*.vcf.gz': No such file or directory

Generating final matrix... ls: cannot access 'tmp/splitted_vcflist.merge.vcf.gz': No such file or directory Failed to read from tmp/final_vcf_list rm: cannot remove 'tmp/splitted_vcflist': No such file or directory Filtering by prevalence: 0.01 ... Traceback (most recent call last): File "/hpc/pmc_holstege/oscar/scNanoGPS/reporter_SNV.py", line 540, in merge_longshot(CB_list, options) File "/hpc/pmc_holstege/oscar/scNanoGPS/reporter_SNV.py", line 136, in merge_longshot filter_by_prevalence(options) File "/hpc/pmc_holstege/oscar/scNanoGPS/reporter_SNV.py", line 62, in filter_by_prevalence fh = bgzf.open(os.path.join(options.o_dir, options.o_pref) + '.raw.vcf.gz', 'rt') File "/hpc/pmc_holstege/oscar/anaconda3/envs/scNanoGPS/lib/python3.9/site-packages/Bio/bgzf.py", line 273, in open return BgzfReader(filename, mode) File "/hpc/pmc_holstege/oscar/anaconda3/envs/scNanoGPS/lib/python3.9/site-packages/Bio/bgzf.py", line 606, in init handle = _open(filename, "rb") FileNotFoundError: [Errno 2] No such file or directory: 'scNanoGPS_res/matrix_SNV.raw.vcf.gz'

Doing it with another ref genome. I have also ran all the previous steps (except for 5.1 and 5.2) with my own data and when I get to longshot SNV calling I have empty vcf files:

zcat longshot.output.TTTTAGTCCAATCTCT.vcf.gz | tail

FILTER=

FILTER=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

bcftools_viewVersion=1.11+htslib-1.13

bcftools_viewCommand=view -Oz -o tmp/longshot.output.TTTTAGTCCAATCTCT.vcf.gz tmp/longshot.output.TTTTAGTCCAATCTCT.vcf; Date=Tue May 7 11:28:38 2024

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TTTTAGTCCAATCTCT

I really don't know where the vcf files come from, whether it's a problem of empty/incorrect input or longshot/bcftools issues.

Do you have any suggestions? Thanks!

shiauck commented 6 months ago

Hi Cariocow,

In the log you provided, the error happened in tabix step.

tbx_index_build3 **failed**: tmp/splitted_vcf_list_aa.merge.vcf.gz

Please make sure the tabix is working.

The versions of bcftools, tabix, and htslib I'm using are all v1.15.

Regards, Cheng-Kai

shiauck commented 6 months ago

Hi Oscargamarra,

In the log message you provided:

Calling SNV by longshot...
Longshot finished !
Time stamp: Mon, 13 May 2024 15:37:31
Time elapse: 0 : 0 : 0.01

It only took 0.01 seconds for running longshot, which is way too fast like it never run.

Please check "run_reporter_SNV.log.txt" if you never change the log file name to see whether there's any further error message which is helpful for debugging. Alternatively, please refer to longshot GitHub and try running the example provided by longshot author to make sure your longshot works well.

Regards, Cheng-Kai

cariocow commented 6 months ago

Thanks Cheng-Kai for your reply. I finally resolved this by removing "&> /dev/null'" from the script. i don't know why but it works now. thanks anyway. best Cario

shiauck commented 6 months ago

Hi Cario,

Thank you for letting me know. May I know which operation system you are using now ?

The code '&> /dev/null' is to dump out unnecessary system message. I developed the pipeline on CentOS which uses /dev/null for dumping. Maybe your OS using different way. I'll update the code to silence tabix by setting verbosity to 0. Thanks.

Regards, Cheng-Kai

cariocow commented 6 months ago

Hi Cheng-Kai another issue comes up when doing "Generating SNVs depth table" but that is minor issue. the merge in line#218 does not work becaues the dtype of 'CHROM' I added a line at line#217: dp_df['CHROM'] = dp_df['CHROM'].apply(str) now, it seems the probem resolved. :) Cario

shiauck commented 5 months ago

Hi Cario,

Thank you for your suggestion. I've updated the reporter_SNV.py to accommodate GTF using numeric chromosome ID.

Regards, Cheng-Kai