KChen-lab / Monopogen

SNV calling from single cell sequencing
GNU General Public License v3.0

Error in ld refinement on putative somatic SNVs #52

zhangdong360 closed this issue 2 months ago

zhangdong360 commented 2 months ago

When I ran step 2 (cellScan) with the demo data, I hit the error below. I have checked that the output of step 1 is correct. The commands and the error output are as follows. Do you know how to fix this? Code:

path="/share/home/zhangd/tools/scRNA/Monopogen/"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${path}/apps
# step 1: featureInfo
python  ${path}/src/Monopogen.py  somatic  \
    -a   ${path}/apps  -r  region.lst  -t 22 \
    -i out  -l CB_7K.maester_scRNA.csv  -s featureInfo    \
    -g  ../example/chr20_2Mb.hg38.fa
# step 2: cellScan
python  ${path}/src/Monopogen.py  somatic  \
    -a   ${path}/apps  -r  region.lst  -t 22 \
    -i   out  -l CB_7K.maester_scRNA.csv  -s cellScan    \
    -g  ../example/chr20_2Mb.hg38.fa

Output:

[2024-05-04 00:02:24,777] INFO     Monopogen.py Collect single cell level information from sequencing data...
0:0:0:0:0:0
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/share/app/conda/conda3/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/share/app/conda/conda3/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/share/home/zhangd/tools/scRNA/Monopogen/src/somatic.py", line 291, in bam2mat
    mat = pd.read_csv(mat_infile, sep="\t", header=None)
  File "/share/home/zhangd/.local/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/share/home/zhangd/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/share/home/zhangd/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/share/home/zhangd/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/share/home/zhangd/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/share/home/zhangd/.local/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 69, in __init__
    self._reader = parsers.TextReader(self.handles.handle, **kwds)
  File "pandas/_libs/parsers.pyx", line 549, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/share/home/zhangd/tools/scRNA/Monopogen//src/Monopogen.py", line 338, in <module>
    main()
  File "/share/home/zhangd/tools/scRNA/Monopogen//src/Monopogen.py", line 331, in main
    args.func(args)
  File "/share/home/zhangd/tools/scRNA/Monopogen//src/Monopogen.py", line 170, in somatic
    result = pool.map(bam2mat, joblst)
  File "/share/app/conda/conda3/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/share/app/conda/conda3/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
pandas.errors.EmptyDataError: No columns to parse from file

jinzhuangdou commented 2 months ago

The example dataset is too small to call somatic SNVs (some germline SNVs are needed to tag somatic mutations). Please try the MAESTER data:

`Somatic SNV calling from scRNA-seq

We demonstrate how the LD refinement model implemented in Monopogen can improve somatic SNV detection from scRNA-seq profiles when no matched bulk WGS data are available. We used the benchmarking dataset of bone marrow single-cell samples from Miller et al. The raw FASTQ files can be downloaded from the SRA database under accessions SRR15598778, SRR15598779, SRR15598780, SRR15598781, and SRR15598782. For convenience, we share the downloaded BAM file for chromosome 20, chr20.master_scRNA.bam.`
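
For anyone hitting the same `EmptyDataError`: the traceback shows `pd.read_csv` being handed a file with no content, so one way to see which inputs are affected is to scan the intermediate files before rerunning the step. The snippet below is only a sketch; `find_empty_files` is a hypothetical helper (not part of Monopogen), and the file paths you pass in depend on your own output directory.

```python
import os


def find_empty_files(paths):
    """Return the paths that are missing or contain no non-whitespace data.

    Hypothetical helper, not part of Monopogen. Such files would make
    pandas.read_csv raise EmptyDataError ("No columns to parse from file").
    """
    empty = []
    for p in paths:
        if not os.path.isfile(p):
            empty.append(p)
            continue
        with open(p, "r") as fh:
            # any() stops at the first line that carries data
            if not any(line.strip() for line in fh):
                empty.append(p)
    return empty
```

Passing it the list of per-region intermediate files from your output folder should show whether every region came back empty (as with a dataset that is too small) or only some of them.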

zhangdong360 commented 2 months ago

Thank you for your reply; I have resolved the issue. I also noticed that the barcode IDs in the 'cell' column of CB_7K.maester_scRNA.csv have had the '-1' suffix removed, which could cause errors with my own dataset. Should I format my barcodes to match the 'cell' column in CB_7K.maester_scRNA.csv?

jinzhuangdou commented 2 months ago

That is not necessary. Just make sure the cell barcode format is consistent with the CB:Z tag in the BAM files. Thanks for the reminder; I have added a note about this to the README.
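
A quick way to check the barcode format before running is to compare the barcodes in the CSV 'cell' column against the CB:Z values seen in the BAM. The sketch below assumes you have already collected both as lists of strings (for example with pandas and a BAM reader of your choice); the function names and the toy barcodes are illustrative, not part of Monopogen.

```python
def overlap_fraction(csv_barcodes, bam_barcodes):
    """Fraction of CSV barcodes that also occur among the BAM barcodes."""
    csv_set = set(csv_barcodes)
    if not csv_set:
        return 0.0
    return len(csv_set & set(bam_barcodes)) / len(csv_set)


def check_barcode_format(csv_barcodes, bam_barcodes, suffix="-1"):
    """Report overlap as-is, and with the suffix appended to or stripped
    from the CSV barcodes, to reveal a formatting mismatch."""
    csv_list = list(csv_barcodes)
    bam_list = list(bam_barcodes)
    appended = [b if b.endswith(suffix) else b + suffix for b in csv_list]
    stripped = [b[:-len(suffix)] if b.endswith(suffix) else b for b in csv_list]
    return {
        "as_is": overlap_fraction(csv_list, bam_list),
        "suffix_appended": overlap_fraction(appended, bam_list),
        "suffix_stripped": overlap_fraction(stripped, bam_list),
    }


# Toy barcodes (made up): the CSV lacks the "-1" suffix that the BAM carries.
csv_cells = ["AAACCTGAGAAACCAT", "AAACCTGAGAAACGAG"]
bam_cells = ["AAACCTGAGAAACCAT-1", "AAACCTGAGAAACGAG-1", "AAACCTGAGAAAGTGG-1"]
report = check_barcode_format(csv_cells, bam_cells)
print(report)
```

If `as_is` is close to zero while one of the other two values is close to one, the CSV and the BAM differ only by the suffix, and the CSV barcodes should be adjusted to match the CB:Z tag.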