KChen-lab / Monopogen

SNV calling from single cell sequencing
GNU General Public License v3.0

preprocess error #24

Open zheng-sc opened 11 months ago

zheng-sc commented 11 months ago

Hey, I always get this error when running preprocess. Could you help me out?

[2023-10-17 19:20:15,607] INFO     Monopogen.py Performing data preprocess before variant calling...
[2023-10-17 19:20:15,607] INFO     germline.py Parameters in effect:
[2023-10-17 19:20:15,607] INFO     germline.py --subcommand = [preProcess]
[2023-10-17 19:20:15,607] INFO     germline.py --bamFile = [bam.lst]
[2023-10-17 19:20:15,607] INFO     germline.py --out = [s1_out]
[2023-10-17 19:20:15,607] INFO     germline.py --app_path = [/home/big/zheng/Monopogen/apps]
[2023-10-17 19:20:15,607] INFO     germline.py --max_mismatch = [3]
[2023-10-17 19:20:15,607] INFO     germline.py --nthreads = [8]
[2023-10-17 19:20:15,614] DEBUG    Monopogen.py PreProcessing sample all_cells
[2023-10-17 19:20:15,809] INFO     germline.py The contig chr5 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:15,814] INFO     germline.py The contig chr1 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:15,818] INFO     germline.py The contig chr2 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:15,820] INFO     germline.py The contig chr4 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:15,821] INFO     germline.py The contig chr6 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:15,821] INFO     germline.py The contig chr3 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:15,823] INFO     germline.py The contig chr8 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:15,823] INFO     germline.py The contig chr7 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:15,921] INFO     germline.py The contig chr9 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:15,929] INFO     germline.py The contig chr10 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:15,949] INFO     germline.py The contig chr11 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:15,954] INFO     germline.py The contig chr12 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:15,954] INFO     germline.py The contig chr13 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:15,956] INFO     germline.py The contig chr15 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:15,958] INFO     germline.py The contig chr14 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:15,959] INFO     germline.py The contig chr16 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:16,026] INFO     germline.py The contig chr17 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:16,033] INFO     germline.py The contig chr18 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:16,055] INFO     germline.py The contig chr19 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:16,061] INFO     germline.py The contig chr20 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:16,063] INFO     germline.py The contig chr21 does not contain the prefix 'chr' and we will add 'chr' on it 
[2023-10-17 19:20:16,065] INFO     germline.py The contig chr22 does not contain the prefix 'chr' and we will add 'chr' on it 
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/zheng/anaconda3/envs/monopogen/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/zheng/anaconda3/envs/monopogen/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/big/zheng/Monopogen/src/germline.py", line 200, in BamFilter
    for s in infile.fetch(search_chr):
  File "pysam/libcalignmentfile.pyx", line 1089, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 683, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig `5`
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/big/zheng/Monopogen/src/Monopogen.py", line 435, in <module>
    main()
  File "/home/big/zheng/Monopogen/src/Monopogen.py", line 428, in main
    args.func(args)
  File "/home/big/zheng/Monopogen/src/Monopogen.py", line 313, in preProcess
    result = pool.map(BamFilter, para_lst)
  File "/home/zheng/anaconda3/envs/monopogen/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/zheng/anaconda3/envs/monopogen/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
ValueError: invalid contig `5`

the output of samtools view -h sublibrary1_chr_sorted.bam | head -n 25 is

@HD     VN:1.4  SO:coordinate
@SQ     SN:chr1 LN:248956422
@SQ     SN:chr10        LN:133797422
@SQ     SN:chr11        LN:135086622
@SQ     SN:chr12        LN:133275309
@SQ     SN:chr13        LN:114364328
@SQ     SN:chr14        LN:107043718
@SQ     SN:chr15        LN:101991189
@SQ     SN:chr16        LN:90338345
@SQ     SN:chr17        LN:83257441
@SQ     SN:chr18        LN:80373285
@SQ     SN:chr19        LN:58617616
@SQ     SN:chr2 LN:242193529
@SQ     SN:chr20        LN:64444167
@SQ     SN:chr21        LN:46709983
@SQ     SN:chr22        LN:50818468
@SQ     SN:chr3 LN:198295559
@SQ     SN:chr4 LN:190214555
@SQ     SN:chr5 LN:181538259
@SQ     SN:chr6 LN:170805979
@SQ     SN:chr7 LN:159345973
@SQ     SN:chr8 LN:145138636
@SQ     SN:chr9 LN:138394717
@SQ     SN:chrMT        LN:16569
@SQ     SN:chrX LN:156040895

and the output of samtools view sublibrary1_chr_sorted.bam | head -n 10 is

63_76_14__R__159_76_14__ACGGACTC_AGATGTAC_AACCGAGA__TCCGGCTAAA__230914Xm_CAGATC 0       chr1    10002   255     108M42S                                                                                                                     *0       0       AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCACTAGATTCCGTCCACAGTCTCAAGCACGTGGATGTACAGCTA                                                                      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFF::::F,:,FFFFFFF,:,,,F,FFFFF,,,,,,:,F,F,F,F                                                                                       NH:i:1   HI:i:1  AS:i:106        nM:i:0  GX:Z:   GN:Z:   pN:Z:TCCGGCTAAA CR:Z:ACGGACTC_AGATGTAC_AACCGAGA CB:Z:63_76_14__s1                                                                                                                   pB:Z:159_76_14   pS:Z:MRD016_D30 RE:A:N
30_91_44__T__30_91_44__ACTTTACC_CTAAGGTC_CTGAGCCA__ATCCAGAATG__230914Xm_CAGATC  16      chr1    10005   1       10S97M77N43M                                                                                                                *0       0       TAAGCCTATTCCTAACAGTATCAATATCACTAACCCGTACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC                                                                      ::F,FFFF,,F,,F,,,,F,,F,,,,,,,F,F,FF,,,F,FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFF                                                                                       NH:i:3   HI:i:1  AS:i:120        nM:i:9  GX:Z:   GN:Z:   pN:Z:ATCCAGAATG CR:Z:ACTTTACC_CTAAGGTC_CTGAGCCA CB:Z:30_91_44__s1                                                                                                                   pB:Z:30_91_44    pS:Z:MRD007_Transplant  RE:A:N
04_26_30__R__100_26_30__GCTTATAG_AGCAGGAA_CAACCACA__TATGAAGATT__230914Xm_CAGATC 16      chr1    10534   3       96M2D27M1S                                                                                                                  *0       0       AGTACCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGCAGAGAGC                                                                                                FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF                                                                                                                 NH:i:2   HI:i:1  AS:i:111        nM:i:2  GX:Z:   GN:Z:   pN:Z:TATGAAGATT CR:Z:GCTTATAG_AGCAGGAA_CAACCACA CB:Z:04_26_30__s1                                                                                                                   pB:Z:100_26_30   pS:Z:MRD002_D30 RE:A:N
04_26_30__R__100_26_30__GCTTATAG_AGCAGGAA_CAACCACA__TATGAAGATT__230914Xm_CAGATC 16      chr1    10534   3       96M2D27M1S                                                                                                                  *0       0       AGTACCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGCAGAGAGC                                                                                                FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF                                                                                                                 NH:i:2   HI:i:1  AS:i:111        nM:i:2  GX:Z:   GN:Z:   pN:Z:TATGAAGATT CR:Z:GCTTATAG_AGCAGGAA_CAACCACA CB:Z:04_26_30__s1                                                                                                                   pB:Z:100_26_30   pS:Z:MRD002_D30 RE:A:N
04_26_30__R__100_26_30__GCTTATAG_AGCAGGAA_CAACCACA__TATGAAGATT__230914Xm_CAGATC 16      chr1    10534   3       96M2D27M1S                                                                                                                  *0       0       AGTACCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGCAGAGAGC                                                                                                FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF                                                                                                                 NH:i:2   HI:i:1  AS:i:111        nM:i:2  GX:Z:   GN:Z:   pN:Z:TATGAAGATT CR:Z:GCTTATAG_AGCAGGAA_CAACCACA CB:Z:04_26_30__s1                                                                                                                   pB:Z:100_26_30   pS:Z:MRD002_D30 RE:A:N
13_81_76__R__109_81_76__TATGTGTC_ATCATTCC_AGATGTAC__GCTTCATTTT__230914Xm_CAGATC 16      chr1    10535   3       95M2D25M                                                                                                                    *0       0       GTACCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGCAGAG                                                                                                    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF                                                                                                                     NH:i:2   HI:i:1  AS:i:108        nM:i:2  GX:Z:   GN:Z:   pN:Z:GCTTCATTTT CR:Z:TATGTGTC_ATCATTCC_AGATGTAC CB:Z:13_81_76__s1                                                                                                                   pB:Z:109_81_76   pS:Z:MRD004_Transplant  RE:A:N
04_26_30__R__100_26_30__GCTTATAG_AGCAGGAA_CAACCACA__TATGAAGATT__230914Xm_CAGATC 16      chr1    10538   3       92M2D27M1S                                                                                                                  *0       0       CCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGCAGAGAGC                                                                                                    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF                                                                                                                     NH:i:2   HI:i:1  AS:i:107        nM:i:2  GX:Z:   GN:Z:   pN:Z:TATGAAGATT CR:Z:GCTTATAG_AGCAGGAA_CAACCACA CB:Z:04_26_30__s1                                                                                                                   pB:Z:100_26_30   pS:Z:MRD002_D30 RE:A:N
14_32_14__R__110_32_14__CAATTCTC_CAATGGAA_AACCGAGA__GAGGGGCGCG__230914Xm_CAGATC 16      chr1    10538   3       92M2D30M                                                                                                                    *0       0       CCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGCAGAGAGGCG                                                                                                  FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF                                                                                                                   NH:i:2   HI:i:1  AS:i:110        nM:i:2  GX:Z:   GN:Z:   pN:Z:GAGGGGCGCG CR:Z:CAATTCTC_CAATGGAA_AACCGAGA CB:Z:14_32_14__s1                                                                                                                   pB:Z:110_32_14   pS:Z:MRD004_Transplant  RE:A:N
14_32_14__R__110_32_14__CAATTCTC_CAATGGAA_AACCGAGA__GAGGGGCGCG__230914Xm_CAGATC 16      chr1    10538   3       92M2D30M                                                                                                                    *0       0       CCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGCAGAGAGGCG                                                                                                  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF                                                                                                                   NH:i:2   HI:i:1  AS:i:110        nM:i:2  GX:Z:   GN:Z:   pN:Z:GAGGGGCGCG CR:Z:CAATTCTC_CAATGGAA_AACCGAGA CB:Z:14_32_14__s1                                                                                                                   pB:Z:110_32_14   pS:Z:MRD004_Transplant  RE:A:N
14_32_14__R__110_32_14__CAATTCTC_CAATGGAA_AACCGAGA__GAGGGGCGCG__230914Xm_CAGATC 16      chr1    10540   3       90M2D30M                                                                                                                    *0       0       ACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGCAGAGAGGCG                                                                                                    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF                                                                                                                     NH:i:2   HI:i:1  AS:i:108        nM:i:2  GX:Z:   GN:Z:   pN:Z:GAGGGGCGCG CR:Z:CAATTCTC_CAATGGAA_AACCGAGA CB:Z:14_32_14__s1                                                                                                                   pB:Z:110_32_14   pS:Z:MRD004_Transplant  RE:A:N
rafaella-buzatu commented 10 months ago

Have you figured this out? I am having the same issue

jinzhuangdou commented 8 months ago

Could you examine your input BAM files to see whether the contig names carry the chr prefix? If they do (i.e., sublibrary1_chr_sorted.bam), it is weird that Monopogen re-adds it, as the log suggests.
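
One way to run that check on every BAM listed in bam.lst is to scan the header text for @SQ contig names that lack the prefix. The following is only a minimal sketch, not part of Monopogen: the helper name contigs_missing_chr is hypothetical, and it assumes the header text comes from something like samtools view -H.

```python
def contigs_missing_chr(header_text, prefix="chr"):
    """Return the @SQ contig names in a SAM header that lack the given prefix.

    Hypothetical helper for illustration; it is not part of Monopogen.
    """
    missing = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are TAG:VALUE pairs separated by whitespace (tabs in SAM).
        for field in line.split()[1:]:
            if field.startswith("SN:"):
                name = field[3:]
                if not name.startswith(prefix):
                    missing.append(name)
    return missing


# A few header lines copied from the report above.
example_header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@SQ\tSN:chr5\tLN:181538259\n"
)
print(contigs_missing_chr(example_header))
```

An empty list means every contig already has the prefix. A non-empty list for any file in bam.lst would point to a BAM whose contigs are named differently from the one shown in the report.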