Closed ariannadib closed 2 months ago
Could you show some log file during the cellScan? There are some numbers output that may help me to know what happens in your data.
[2024-05-10 15:01:30,392] INFO Monopogen.py Collect single cell level information from sequencing data... scanning read 1000000 scanning read 2000000 2086957:2086913:1779007:649618:351977:498875
Yes, of course
(base) Lucas-MBP:Monopogen Arianna$ python3 ${path}/src/Monopogen.py somatic -a ${path}/apps -r /Users/Arianna/Monopogen/test/region.lst -t 1 -i somatic -l /Users/Arianna/Monopogen/example/CB_7K.maester_scRNA.csv -s cellScan -g /Users/Arianna/Monopogen/example/Homo_sapiens.GRCh38.dna.primary_assembly.fa
[2024-05-13 10:58:05,510] INFO Monopogen.py Collect single cell level information from sequencing data... 0:0:0:0:0:0 multiprocessing.pool.RemoteTraceback: """ Traceback (most recent call last): File "/Users/Arianna/anaconda3/lib/python3.10/multiprocessing/pool.py", line 125, in worker result = (True, func(*args, kwds)) File "/Users/Arianna/anaconda3/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar return list(map(args)) File "/Users/Arianna/Monopogen/src/somatic.py", line 291, in bam2mat mat = pd.read_csv(mat_infile, sep="\t", header=None) File "/Users/Arianna/anaconda3/lib/python3.10/site-packages/pandas/util/_decorators.py", line 211, in wrapper return func(args, kwargs) File "/Users/Arianna/anaconda3/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper return func(*args, kwargs) File "/Users/Arianna/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv return _read(filepath_or_buffer, kwds) File "/Users/Arianna/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 605, in _read parser = TextFileReader(filepath_or_buffer, kwds) File "/Users/Arianna/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1442, in init self._engine = self._make_engine(f, self.engine) File "/Users/Arianna/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1753, in _make_engine return mapping[engine](f, self.options) File "/Users/Arianna/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 79, in init self._reader = parsers.TextReader(src, kwds) File "pandas/_libs/parsers.pyx", line 554, in pandas._libs.parsers.TextReader.cinit pandas.errors.EmptyDataError: No columns to parse from file """
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/Arianna/Monopogen/src/Monopogen.py", line 338, in
thank you!!
This step is not finished successfully. Could you show some header line of /Users/Arianna/Monopogen/example/Homo_sapiens.GRCh38.dna.primary_assembly.fa
?
In addition, is there a merged targeted bam file in you 'bam' folder? If it has, is it empty or not?
(base) Lucas-MBP:Monopogen Arianna$ head -n 20 /Users/Arianna/Monopogen/example/Homo_sapiens.GRCh38.dna.primary_assembly.fa
chr1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN... chr2 dna:chromosome chromosome:GRCh38:2:1:242193529:1 REF NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...
No with the new version I don't have merged.filter.targeted.bam files when I used the old version it created them, now after the featureInfo step it only creates these four files: chr1.gl.vcf.DP4: chr1 14666 G T 1 0.0 0.0 0.0 1.0 NA None 0.0 None None None None -0.38 0.0 chr1 14677 G A 1 0.0 0.0 0.0 1.0 NA None 0.0 None None None None -0.38 0.0 chr1 16452 A G 6 0.0 5.0 0.0 1.0 NA None 0.83 1.0 1.0 1.0 None -0.38 0.0... chr1.gl.vcf.filter.DP4: chr1 127491 T G 1 0.0 0.0 0.0 1.0 1|0 None 0.0 None None None None -0.38 0.0 chr1 135012 C A 294 0.0 287.0 0.0 4.0 NA 0.01 0.99 0.04 1.0 0.31 None -0.56 0.0 chr1 135027 C A 202 0.0 192.0 0.0 5.0 NA 0.0 0.97 0.05 1.0 0.42 None -0.59 0.0... chr1.gl.vcf.filter.hc.bed: chr1 127490 127491 chr1 135011 135012 chr1 135026 135027...
chr1.gl.vcf.filter.hc.pos: chr1 127491 chr1 135012 chr1 135027...
thank you very much!!
Could you check the cell_barcode file? The column name should be cell
and id
. I will make such more robust to the input.
df = pd.DataFrame(cell_clst, columns= ['cell','id'])
(base) Lucas-MacBook-Pro:~ Arianna$ head -n 20 /Users/Arianna/Monopogen/example/CB_7K.maester_scRNA.csv cell,id AAACCCAGTGACGCCT,13945 AAACCCATCAAAGACA,18348 AAACCCATCACTGAAC,10800 AAACCCATCGCGCCAA,4959 AAACGAATCTTGTTAC,4632 AAACGCTCACATAACC,13276 AAACGCTGTTGGGCCT,9314 AAACGCTTCGATGCTA,6580 AAACGCTTCGTCACCT,15118 AAAGAACGTGTCTTGA,13113 AAAGAACTCTGCTAGA,7379 AAAGGATAGGTGTGAC,7241 AAAGGATCATTGCTGA,14261 AAAGGATGTTGTTTGG,12945 AAAGGGCAGAAGAACG,10800 AAAGGGCAGATTTGCC,6844 AAAGGGCCACTTGTCC,7873 AAAGGGCCATTGAAAG,6230 AAAGGTACAGACTCTA,8642....
Should I change something? I named the file as in the guide
Thank you very much for your help
Okay, could you further check whether the cell barcode is named AAACCCAGTGACGCCT-1
in your bam files? If so, you need to add -1
in the cell barcode file.
It works, thank you very much for your help!
Hi, I'm currently using the updated version of monopogen. the cell scan module returns three files, namely: chr1.cell_snv.cellID.csv: cell,id,index AGGATAATCTATTGTC,68986,0 ACTACGAGTTTGTGGT,42356,1 ACTTCGCTCCGATTAG,36682,2 TAAGCCAGTTAGGGTG,35383,3....
chr1.cell_snv.snvID.csv: snvID,index chr1:127491:T:G,0 chr1:135012:C:A,1 chr1:135027:C:A,2 chr1:135040:T:C,3.........
chr1.cell_snv.mat.gz. the first two seem correctly compiled while the third is empty, this means that it is not possible to create .gl.filter.hc.cell.mat.gz. I don't understand why.
Any help is welcome. Thanks