KChen-lab / Monopogen

SNV calling from single cell sequencing
GNU General Public License v3.0
68 stars, 16 forks

About Somatic SNV calling from scRNA-seq #12

Open scg-dgist opened 10 months ago

scg-dgist commented 10 months ago

Hello, thank you for developing such a wonderful method. I recently tried to run somatic variant calling on scRNA-seq data, following the step-by-step instructions in your GitHub repository, but ran into a problem while refining the putative somatic SNVs. The first step of that section (“-s featureInfo”) completed successfully, but the next step (“-s cellScan”) failed.

Here is the code I used and the accompanying error message.

Code I ran

path=~/Monopogen
output_path=${path}/test/outs
reference_fasta_path=${path}/example/chr20_2Mb.hg38.fa

python ${path}/src/Monopogen.py somatic -a ${path}/apps \
  -r ${path}/test/region.lst -t ${num_thread} -w 10MB -i ${output_path} \
  -l ${path}/example/CB_7K.maester_scRNA.csv -s cellScan -g ${reference_fasta_path}



Error message

[mpileup] fail to load index for ~/Monopogen/test/outs/Bam/split_bam/AGAGAGCTCGCCACTT-1.bam
[mpileup] fail to load index for ~/Monopogen/test/outs/Bam/split_bam/AGAGAGCTCGCCACTT-1.bam
[mpileup] fail to load index for ~/Monopogen/test/outs/Bam/split_bam/AGAGAGCTCGCCACTT-1.bam
[mpileup] fail to load index for ~/Monopogen/test/outs/Bam/split_bam/AGAGAGCTCGCCACTT-1.bam
[mpileup] fail to load index for ~/Monopogen/test/outs/Bam/split_bam/AGAGAGCTCGCCACTT-1.bam
[mpileup] fail to load index for ~/Monopogen/test/outs/Bam/split_bam/AGAGAGCTCGCCACTT-1.bam
[mpileup] fail to load index for ~/Monopogen/test/outs/Bam/split_bam/AGAGAGCTCGCCACTT-1.bam
Failed to open -: unknown file type
Failed to open -: unknown file type
Failed to open -: unknown file type
Failed to open -: unknown file type
Failed to open -: unknown file type
Failed to open -: unknown file type
Failed to open -: unknown file type
Failed to open -: unknown file type
Failed to open -: unknown file type
Failed to open -: unknown file type
Failed to open -: unknown file type
Failed to open -: unknown file type
Failed to open -: unknown file type
Failed to open -: unknown file type

[E::bcf_hdr_read] Input is not detected as bcf or vcf format
Failed to parse header: ~/Monopogen/test/outs/somatic/chr20:2-10000001.cell.gl.vcf.gz
[E::hts_open_format] Failed to open file "~/Monopogen/test/outs/somatic/chr20.cell.gl.vcf.gz" : No such file or directory
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "~/.conda/envs/Monopogen_py37/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "~/.conda/envs/Monopogen_py37/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "~/Monopogen/src/somatic.py", line 248, in vcf2mat
    vcf_in = pysam.VariantFile(out + "/somatic/" + region + ".cell.gl.vcf.gz")
  File "pysam/libcbcf.pyx", line 4054, in pysam.libcbcf.VariantFile.__init__
  File "pysam/libcbcf.pyx", line 4279, in pysam.libcbcf.VariantFile.open
FileNotFoundError: [Errno 2] could not open variant file b'~/Monopogen/test/outs/somatic/chr20.cell.gl.vcf.gz': No such file or directory
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "~/Monopogen/src/Monopogen.py", line 441, in <module>
    main()
  File "~/Monopogen/src/Monopogen.py", line 434, in main
    args.func(args)
  File "~/Monopogen/src/Monopogen.py", line 256, in somatic
    result = pool.map(vcf2mat, joblst)
  File "~/.conda/envs/Monopogen_py37/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "~/.conda/envs/Monopogen_py37/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
FileNotFoundError: [Errno 2] could not open variant file b'~/Monopogen/test/outs/somatic/chr20.cell.gl.vcf.gz': No such file or directory
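The log appears to show a chain of failures. mpileup could not open the per-cell BAMs, so the downstream step read empty input ("Failed to open -"). The per-region VCF then had no valid header, and vcf2mat finally failed to open the region's `.cell.gl.vcf.gz`. The sketch below checks which per-region files are usable before LD refinement is rerun. It is only a diagnostic and is not part of Monopogen. The path pattern `<out>/somatic/<region>.cell.gl.vcf.gz` comes from the traceback. The helper name `check_region_vcfs` and the idea of passing region names by hand are assumptions.

```python
import gzip
import os


def check_region_vcfs(out_dir, regions):
    """Hypothetical helper: classify each region's cell-level VCF as
    ok / missing / empty / not gzip / no header.

    Path pattern taken from the traceback:
        <out_dir>/somatic/<region>.cell.gl.vcf.gz
    """
    out_dir = os.path.expanduser(out_dir)  # expand a leading "~" if present
    report = {}
    for region in regions:
        path = os.path.join(out_dir, "somatic", region + ".cell.gl.vcf.gz")
        if not os.path.exists(path):
            report[region] = "missing"
            continue
        if os.path.getsize(path) == 0:
            report[region] = "empty"
            continue
        try:
            # BGZF (used for .vcf.gz) is gzip-compatible, so gzip can read it.
            with gzip.open(path, "rt") as fh:
                first_line = fh.readline()
        except (OSError, EOFError, ValueError):
            # Not gzip, truncated, or undecodable content.
            report[region] = "not gzip"
            continue
        # A valid VCF starts with the ##fileformat meta line.
        if first_line.startswith("##fileformat=VCF"):
            report[region] = "ok"
        else:
            report[region] = "no header"
    return report
```

For example, `check_region_vcfs("~/Monopogen/test/outs", ["chr20"])` reports whether the file vcf2mat tried to open exists and starts with a VCF header.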


I'm currently using the latest version of Monopogen, which I installed directly from your GitHub repository. My Python version is 3.7.12, and I have already checked that all the prerequisites, including pysam, pandas, and numpy, are installed as per your guidelines. However, I am still encountering this error. Could you please provide some guidance on how to resolve this issue? Thank you in advance for your assistance.

jinzhuangdou commented 10 months ago

Hi, this happens because too many files are opened simultaneously, which exceeds your system's open-file limit. See the FAQs:

[mpileup] fail to load index for xx/Bam/split_bam/chr20_xx.bam; Failed to open -: unknown file type

In this step, Monopogen needs to open BAM files from many cells at once. This error appears when that exceeds the server's limit on open files. You can check the limit by typing

ulimit -n

If this number is smaller than the number of cells in your study, please raise the maximum number of open files. If too many files are still open at once, you can set a smaller value for the -t option in the cellScan step (such as 5). In that case, only 5 regions are processed simultaneously.
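The check described above can also be scripted from Python. In the shell, `ulimit -n N` raises the soft limit for the current session, up to the hard limit. The minimal sketch below uses the standard `resource` module to do the same thing. The helper name `ensure_open_file_limit` is hypothetical. Several assumptions are built in:

- One file handle is needed per split BAM, plus a fixed margin.
- The split BAMs live in `outs/Bam/split_bam/`, the directory seen in the error log.
- The new limit only applies to this Python process and to children it starts afterwards, so Monopogen would have to be launched from the same process.

```python
import glob
import os
import resource


def ensure_open_file_limit(split_bam_dir, margin=64):
    """Hypothetical helper: compare the soft RLIMIT_NOFILE (what `ulimit -n`
    prints) with the number of per-cell BAMs, and raise the soft limit
    toward the hard limit if it is too low.

    Returns (number_of_bams, soft_limit_now_in_effect).
    """
    bam_glob = os.path.join(os.path.expanduser(split_bam_dir), "*.bam")
    n_bams = len(glob.glob(bam_glob))
    needed = n_bams + margin  # assumption: one handle per BAM plus a margin
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= needed:
        return n_bams, soft  # already high enough
    # An unprivileged process may raise its soft limit only up to the hard limit.
    new_soft = needed if hard == resource.RLIM_INFINITY else min(needed, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return n_bams, new_soft
```

If the returned limit is still below the number of BAMs because the hard limit caps it, the other suggestion above still applies: run cellScan with a smaller `-t`.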

We are working on an updated version to avoid such issue in the next release.

scg-dgist commented 10 months ago

I truly appreciate your assistance. Following your guidance, I increased the open-file limit on the server and successfully obtained results from cellScan. However, I encountered an error in the final step ("-s LDrefinement"), specifically during the SVM step. I have attached the code I ran along with the error messages. Thank you again for your previous guidance; I would greatly appreciate any advice on how to overcome this issue.

Code I ran

path=~/Monopogen
output_path=${path}/test/outs
num_thread=$(nproc)
cell_barcode_file="CB_7K.maester_scRNA.csv"
reference_fasta_path=${path}/example/chr20_2Mb.hg38.fa

python ${path}/src/Monopogen.py somatic -a ${path}/apps \
  -r ${path}/test/region.lst -t ${num_thread} -i ${output_path} \
  -l ${path}/example/${cell_barcode_file} -s LDrefinement -g ${reference_fasta_path}
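The thread runs the three somatic refinement steps one after another: featureInfo, then cellScan, then LDrefinement. The sketch below chains them and reports the first step that fails. It is a convenience wrapper, not part of Monopogen. The flag set is copied from the commands in this issue, with `-w` passed only to cellScan as above. Whether every step needs exactly these flags is an assumption. It also assumes that Monopogen exits with a non-zero status on failure, which the log does not confirm. The names `build_somatic_cmd` and `run_steps` are hypothetical.

```python
import shlex
import subprocess
import sys

# Step order as described in this thread.
STEPS = ["featureInfo", "cellScan", "LDrefinement"]


def build_somatic_cmd(path, out, barcodes, fasta, threads, step, window=None):
    """Build one `Monopogen.py somatic` command using the flags shown above."""
    cmd = [
        sys.executable, f"{path}/src/Monopogen.py", "somatic",  # current interpreter
        "-a", f"{path}/apps",
        "-r", f"{path}/test/region.lst",
        "-t", str(threads),
        "-i", out,
        "-l", barcodes,
        "-s", step,
        "-g", fasta,
    ]
    if window is not None:
        cmd += ["-w", window]
    return cmd


def run_steps(path, out, barcodes, fasta, threads, window=None):
    """Run the steps in order; return the name of the first failing step,
    or None if all exit with status 0 (assumes Monopogen signals failure
    through its exit status)."""
    for step in STEPS:
        # Only cellScan was given -w in the commands above.
        cmd = build_somatic_cmd(path, out, barcodes, fasta, threads, step,
                                window if step == "cellScan" else None)
        print("running:", " ".join(shlex.quote(c) for c in cmd))
        if subprocess.run(cmd).returncode != 0:
            return step
    return None
```

Building the argument list in one function makes it easy to lower `-t` (for example to 5, as suggested above) without editing each command by hand.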


Error message

INFO Monopogen.py Run LD refinement ...

Error in test_x[, colnames(test_x) == "QS"] : incorrect number of dimensions
Calls: SVM_train
Execution halted

ERROR Monopogen.py In LDrefinement step chr20 failed!
ERROR Monopogen.py Failed! See instructions above.


Many thanks.