cell_snv.mat.gz missing

ariannadib commented 2 months ago

Hi, I'm currently using the updated version of monopogen. the cell scan module returns three files, namely: chr1.cell_snv.cellID.csv: cell,id,index AGGATAATCTATTGTC,68986,0 ACTACGAGTTTGTGGT,42356,1 ACTTCGCTCCGATTAG,36682,2 TAAGCCAGTTAGGGTG,35383,3....

chr1.cell_snv.snvID.csv: snvID,index chr1:127491:T:G,0 chr1:135012:C:A,1 chr1:135027:C:A,2 chr1:135040:T:C,3.........

chr1.cell_snv.mat.gz. the first two seem correctly compiled while the third is empty, this means that it is not possible to create .gl.filter.hc.cell.mat.gz. I don't understand why.

Any help is welcome. Thanks

jinzhuangdou commented 2 months ago

Could you show some log file during the cellScan? There are some numbers output that may help me to know what happens in your data.

[2024-05-10 15:01:30,392] INFO Monopogen.py Collect single cell level information from sequencing data... scanning read 1000000 scanning read 2000000 2086957:2086913:1779007:649618:351977:498875

ariannadib commented 2 months ago

Yes, of course

(base) Lucas-MBP:Monopogen Arianna$ python3 ${path}/src/Monopogen.py somatic -a ${path}/apps -r /Users/Arianna/Monopogen/test/region.lst -t 1 -i somatic -l /Users/Arianna/Monopogen/example/CB_7K.maester_scRNA.csv -s cellScan -g /Users/Arianna/Monopogen/example/Homo_sapiens.GRCh38.dna.primary_assembly.fa

[2024-05-13 10:58:05,510] INFO Monopogen.py Collect single cell level information from sequencing data... 0:0:0:0:0:0 multiprocessing.pool.RemoteTraceback: """ Traceback (most recent call last): File "/Users/Arianna/anaconda3/lib/python3.10/multiprocessing/pool.py", line 125, in worker result = (True, func(*args, kwds)) File "/Users/Arianna/anaconda3/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar return list(map(args)) File "/Users/Arianna/Monopogen/src/somatic.py", line 291, in bam2mat mat = pd.read_csv(mat_infile, sep="\t", header=None) File "/Users/Arianna/anaconda3/lib/python3.10/site-packages/pandas/util/_decorators.py", line 211, in wrapper return func(args, kwargs) File "/Users/Arianna/anaconda3/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper return func(*args, kwargs) File "/Users/Arianna/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv return _read(filepath_or_buffer, kwds) File "/Users/Arianna/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 605, in _read parser = TextFileReader(filepath_or_buffer, kwds) File "/Users/Arianna/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1442, in init self._engine = self._make_engine(f, self.engine) File "/Users/Arianna/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1753, in _make_engine return mapping[engine](f, self.options) File "/Users/Arianna/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 79, in init self._reader = parsers.TextReader(src, kwds) File "pandas/_libs/parsers.pyx", line 554, in pandas._libs.parsers.TextReader.cinit pandas.errors.EmptyDataError: No columns to parse from file """

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/Users/Arianna/Monopogen/src/Monopogen.py", line 338, in main() File "/Users/Arianna/Monopogen/src/Monopogen.py", line 331, in main args.func(args) File "/Users/Arianna/Monopogen/src/Monopogen.py", line 170, in somatic result = pool.map(bam2mat, joblst) File "/Users/Arianna/anaconda3/lib/python3.10/multiprocessing/pool.py", line 367, in map return self._map_async(func, iterable, mapstar, chunksize).get() File "/Users/Arianna/anaconda3/lib/python3.10/multiprocessing/pool.py", line 774, in get raise self._value pandas.errors.EmptyDataError: No columns to parse from file

thank you!!

jinzhuangdou commented 2 months ago

This step is not finished successfully. Could you show some header line of /Users/Arianna/Monopogen/example/Homo_sapiens.GRCh38.dna.primary_assembly.fa?

In addition, is there a merged targeted bam file in you 'bam' folder? If it has, is it empty or not?

ariannadib commented 2 months ago

(base) Lucas-MBP:Monopogen Arianna$ head -n 20 /Users/Arianna/Monopogen/example/Homo_sapiens.GRCh38.dna.primary_assembly.fa

chr1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN... chr2 dna:chromosome chromosome:GRCh38:2:1:242193529:1 REF NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...

No with the new version I don't have merged.filter.targeted.bam files when I used the old version it created them, now after the featureInfo step it only creates these four files: chr1.gl.vcf.DP4: chr1 14666 G T 1 0.0 0.0 0.0 1.0 NA None 0.0 None None None None -0.38 0.0 chr1 14677 G A 1 0.0 0.0 0.0 1.0 NA None 0.0 None None None None -0.38 0.0 chr1 16452 A G 6 0.0 5.0 0.0 1.0 NA None 0.83 1.0 1.0 1.0 None -0.38 0.0... chr1.gl.vcf.filter.DP4: chr1 127491 T G 1 0.0 0.0 0.0 1.0 1|0 None 0.0 None None None None -0.38 0.0 chr1 135012 C A 294 0.0 287.0 0.0 4.0 NA 0.01 0.99 0.04 1.0 0.31 None -0.56 0.0 chr1 135027 C A 202 0.0 192.0 0.0 5.0 NA 0.0 0.97 0.05 1.0 0.42 None -0.59 0.0... chr1.gl.vcf.filter.hc.bed: chr1 127490 127491 chr1 135011 135012 chr1 135026 135027...

chr1.gl.vcf.filter.hc.pos: chr1 127491 chr1 135012 chr1 135027...

thank you very much!!

jinzhuangdou commented 2 months ago

Could you check the cell_barcode file? The column name should be cell and id. I will make such more robust to the input.

df = pd.DataFrame(cell_clst, columns= ['cell','id'])

ariannadib commented 2 months ago

(base) Lucas-MacBook-Pro:~ Arianna$ head -n 20 /Users/Arianna/Monopogen/example/CB_7K.maester_scRNA.csv cell,id AAACCCAGTGACGCCT,13945 AAACCCATCAAAGACA,18348 AAACCCATCACTGAAC,10800 AAACCCATCGCGCCAA,4959 AAACGAATCTTGTTAC,4632 AAACGCTCACATAACC,13276 AAACGCTGTTGGGCCT,9314 AAACGCTTCGATGCTA,6580 AAACGCTTCGTCACCT,15118 AAAGAACGTGTCTTGA,13113 AAAGAACTCTGCTAGA,7379 AAAGGATAGGTGTGAC,7241 AAAGGATCATTGCTGA,14261 AAAGGATGTTGTTTGG,12945 AAAGGGCAGAAGAACG,10800 AAAGGGCAGATTTGCC,6844 AAAGGGCCACTTGTCC,7873 AAAGGGCCATTGAAAG,6230 AAAGGTACAGACTCTA,8642....

Should I change something? I named the file as in the guide

Thank you very much for your help

jinzhuangdou commented 2 months ago

Okay, could you further check whether the cell barcode is named AAACCCAGTGACGCCT-1 in your bam files? If so, you need to add -1 in the cell barcode file.

ariannadib commented 2 months ago

It works, thank you very much for your help!

KChen-lab / Monopogen

cell_snv.mat.gz missing #59