cortes-ciriano-lab / SComatic

A tool for detecting somatic variants in single cell data
Other
173 stars 28 forks source link

ValueError: invalid coordinates: start > stop #28

Closed amdqiao1 closed 1 year ago

amdqiao1 commented 1 year ago

Hi developers,

I got a pysam ValueError: invalid coordinates: start (248849282) > stop (138479) when running SitesPerCell.py analysis. Is it an expected behavior for 10x multiome data? I do have sequences aligned to the reverse strand (FLAG=16) in my BAM file. However, the start and stop don't seem to belong to the same entry for 1) it would be too long a sequence to be captured and 2) there is no match after running samtools view bamfile | grep 248849282\t138479. Can someone help me out on this? Are there any possible solutions? My environment is as following. Many thanks!!

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
_r-mutex                  1.0.1               anacondar_1    conda-forge
about-time                3.1.1                    pypi_0    pypi
bedtools                  2.31.0               hf5e1c6e_2    bioconda
binutils_impl_linux-64    2.36.1               h193b22a_2    conda-forge
binutils_linux-64         2.36                hf3e587d_33    conda-forge
bwidget                   1.9.14               ha770c72_1    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
ca-certificates           2023.05.30           h06a4308_0  
cairo                     1.16.0            h18b612c_1001    conda-forge
curl                      7.68.0               hf8cf82a_0    conda-forge
datamash                  1.1.0                         0    bioconda
expat                     2.5.0                h27087fc_0    conda-forge
fontconfig                2.14.1               hc2a2eb6_0    conda-forge
freetype                  2.12.1               hca18f0e_0    conda-forge
fribidi                   1.0.10               h36c2ea0_0    conda-forge
gcc_impl_linux-64         7.5.0               hda68d29_13    conda-forge
gcc_linux-64              7.5.0               h47867f9_33    conda-forge
gettext                   0.21.1               h27087fc_0    conda-forge
gfortran_impl_linux-64    7.5.0               h56cb351_20    conda-forge
gfortran_linux-64         7.5.0               h78c8a43_33    conda-forge
glib                      2.74.1               h6239696_1    conda-forge
glib-tools                2.74.1               h6239696_1    conda-forge
graphite2                 1.3.14               h295c915_1  
gsl                       2.5                  h294904e_1    conda-forge
gxx_impl_linux-64         7.5.0               h64c220c_13    conda-forge
gxx_linux-64              7.5.0               h555fc39_33    conda-forge
harfbuzz                  2.8.1                h6f93f22_0  
htslib                    1.10.2               h78d89cc_0    bioconda
icu                       58.2              hf484d3e_1000    conda-forge
jinja2                    3.1.2                    pypi_0    pypi
jpeg                      9e                   h166bdaf_2    conda-forge
kernel-headers_linux-64   2.6.32              he073ed8_15    conda-forge
krb5                      1.16.4               h2fd8d38_0    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libblas                   3.9.0                8_openblas    conda-forge
libcblas                  3.9.0                8_openblas    conda-forge
libcurl                   7.68.0               hda55be3_0    conda-forge
libdeflate                1.14                 h166bdaf_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgfortran-ng            7.5.0               h14aa051_20    conda-forge
libgfortran4              7.5.0               h14aa051_20    conda-forge
libglib                   2.74.1               h606061b_1    conda-forge
libgomp                   12.2.0              h65d4601_19    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
liblapack                 3.9.0                8_openblas    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.12          pthreads_hb3c22a3_1    conda-forge
libpng                    1.6.38               h753d276_0    conda-forge
libssh2                   1.10.0               haa6b8db_3    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
libtiff                   4.4.0                h55922b4_4    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libwebp-base              1.2.4                h166bdaf_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxml2                   2.9.9                h13577e0_2    conda-forge
libzlib                   1.2.13               h166bdaf_4    conda-forge
make                      4.3                  hd18ef5c_1    conda-forge
markupsafe                2.1.3                    pypi_0    pypi
ncurses                   6.2                  h58526e2_4    conda-forge
numpy                     1.21.6                   pypi_0    pypi
numpy-groupies            0.9.14                   pypi_0    pypi
openssl                   1.1.1v               h7f8727e_0  
pandas                    1.3.5                    pypi_0    pypi
pango                     1.40.14           he7ab937_1005    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pcre2                     10.40                hc3806b6_0    conda-forge
pip                       22.3.1             pyhd8ed1ab_0    conda-forge
pixman                    0.38.0            h516909a_1003    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pybedtools                0.8.1                    pypi_0    pypi
pysam                     0.16.0.1                 pypi_0    pypi
python                    3.7.12          hb7a2778_100_cpython    conda-forge
python-dateutil           2.8.2                    pypi_0    pypi
pytz                      2023.3                   pypi_0    pypi
r-base                    3.6.1                h8900bf8_2    conda-forge
r-cpp11                   0.2.7             r36hc72bb7e_0    conda-forge
r-ragg                    0.4.0             r36h71971c5_0    conda-forge
r-systemfonts             1.0.2             r36hef9c87a_0    conda-forge
r-textshaping             0.1.2             r36ha19333d_0    conda-forge
r-vgam                    1.1_1             r36ha65eedd_0    r
readline                  8.1                  h46c0cb4_0    conda-forge
rpy2                      2.9.4                    pypi_0    pypi
samtools                  1.10                 h2e538c0_3    bioconda
scipy                     1.7.3                    pypi_0    pypi
sed                       4.8                  he412f7d_0    conda-forge
setuptools                65.5.1             pyhd8ed1ab_0    conda-forge
six                       1.16.0                   pypi_0    pypi
sqlite                    3.37.0               h9cd32fc_0    conda-forge
sysroot_linux-64          2.12                he073ed8_15    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
tktable                   2.10                 hb7b940f_3    conda-forge
wheel                     0.38.4             pyhd8ed1ab_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.7.2                h7f98852_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h7f98852_1    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zlib                      1.2.13               h166bdaf_4    conda-forge
zstd                      1.5.2                h6239696_4    conda-forge
Francesc-Muyas commented 1 year ago

Dear user, Thanks for bringing up this issue. It is the first time in my life that I have seen such an error. Could you please check a couple of things?

  1. Do you find the coordinates individually samtools view bamfile | grep 248849282 and samtools view bamfile | grep 138479
  2. When working with pysam, we usually work with 0-based coordinates. So could you look for the coordinates 248849283 or 138480 ?

Thanks, Fran

amdqiao1 commented 1 year ago
  1. Do you find the coordinates individually samtools view bamfile | grep 248849282 and samtools view bamfile | grep 138479

Yes! I found 248849282, but not 138479. 138479 only appeared as a substring of some greater coordinate.

VH00411:140:AAAYLMTHV:1:2406:24556:16903    1024    chr1    248849282   255 90M *   0   0   CCAGGGCAGATCAAGGGGCCTCTCAGAACCATGTTCCCCAGCCAGGTGAGGACCATTTTCACTGGGACCCAGGCCAAAACCATGTGGGTG  CCCCCCCC;CCCCCCCCCCCCCCCCCCC-CCCCC-CCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCC  NH:i:1  HI:i:1  AS:i:88 nM:i:0  RG:Z:results:0:1:AAAYLMTHV:1    TX:Z:ENST00000306562,+2809,90M  GX:Z:ENSG00000171161    GN:Z:ZNF672 fx:Z:ENSG00000171161    RE:A:E  xf:i:17 CR:Z:TTGCAAGGTCATGCCC   CY:Z:CCCCCCCCCC-CCCCC   CB:Z:TTGCAAGGTCATGCCC-1UR:Z:ATGCATTCTATA    UY:Z:CCCCCCCCCCCC   UB:Z:ATGCATTCTATA
VH00411:140:AAAYLMTHV:2:1507:10562:34983    1024    chr1    248849282   255 90M *   0   0   CCAGGGCAGATCAAGGGGCCTCTCAGAACCATGTTCCCCAGCCAGGTGAGGACCATTTTCACTGGGACCCAGGCCAAAACCATGTGGGTG  CCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC  NH:i:1  HI:i:1  AS:i:88 nM:i:0  RG:Z:results:0:1:AAAYLMTHV:2    TX:Z:ENST00000306562,+2809,90M  GX:Z:ENSG00000171161    GN:Z:ZNF672 fx:Z:ENSG00000171161    RE:A:E  xf:i:17 CR:Z:GTAAAGCCAATTTGGT   CY:Z:CCCCCCCCCCCCCCCC   CB:Z:GTAAAGCCAATTTGGT-1UR:Z:TGCAAATAAACA    UY:Z:CCCCCCCCC;CC   UB:Z:TGCAAATAAACA
  1. When working with pysam, we usually work with 0-based coordinates. So could you look for the coordinates 248849283 or 138480?

Yes for 248849283, but not 138480. Similarly, 138480 only appeared as a substring.

VH00411:140:AAAYLMTHV:1:1603:33058:30666    1024    chr1    248849283   255 90M *   0   0   CAGGGCAGATCAAGGGGCCTCTCAGAACCATGTTCCCCAGCCAGGTGATGACCATTTTCACTGGGACCCAGGCCAAAACCATGTGGGTGC  CCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCC  NH:i:1  HI:i:1  AS:i:86 nM:i:1  RG:Z:results:0:1:AAAYLMTHV:1    TX:Z:ENST00000306562,+2810,90M  GX:Z:ENSG00000171161    GN:Z:ZNF672 fx:Z:ENSG00000171161    RE:A:E  xf:i:17 CR:Z:ATGTAACGTCGTTATC   CY:Z:;CCCC;CCCCCCC;CC   CB:Z:ATGTAACGTCGTTATC-1UR:Z:ACCACTTTTTTG    UY:Z:CCCCCCCCCCCC   UB:Z:ACCACTTTTTTG
Francesc-Muyas commented 1 year ago

Could you please provide some more info about the next points?

  1. Could you send the command you used to run SitesPerCell.py? As well as the complete error that you get.
  2. Could you check if the problematic coordinates are found in the input file you passed to the parameter --infile ?
amdqiao1 commented 1 year ago
  1. Could you send the command you used to run SitesPerCell.py? As well as the complete error that you get.

The command is as suggested in the tutorial:

python $SCOMATIC/scripts/SitesPerCell/SitesPerCell.py --bam $bam \
       --infile $infile/${sample}.calling.step1.tsv \
       --ref $ref/genome.fa \
       --out_folder $outfile --tmp_dir $temp --nprocs 1

The complete error:

Traceback (most recent call last):
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 369, in <module>
    main()
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 358, in main
    collect_result(run_interval(row, DICT_sites[row], BAM, FASTA, MIN_COV, MIN_CC, tmp_dir, MIN_BQ, MIN_MQ))
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 131, in run_interval
    i = bam.pileup(CHROM, START, END, min_base_quality = BQ, min_mapping_quality = MQ, ignore_overlaps = False)
  File "pysam/libcalignmentfile.pyx", line 1326, in pysam.libcalignmentfile.AlignmentFile.pileup
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (248849282) > stop (138479)
Traceback (most recent call last):
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 369, in <module>
    main()
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 358, in main
    collect_result(run_interval(row, DICT_sites[row], BAM, FASTA, MIN_COV, MIN_CC, tmp_dir, MIN_BQ, MIN_MQ))
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 131, in run_interval
    i = bam.pileup(CHROM, START, END, min_base_quality = BQ, min_mapping_quality = MQ, ignore_overlaps = False)
  File "pysam/libcalignmentfile.pyx", line 1326, in pysam.libcalignmentfile.AlignmentFile.pileup
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (248849282) > stop (138479)
Traceback (most recent call last):
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 369, in <module>
    main()
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 358, in main
    collect_result(run_interval(row, DICT_sites[row], BAM, FASTA, MIN_COV, MIN_CC, tmp_dir, MIN_BQ, MIN_MQ))
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 131, in run_interval
    i = bam.pileup(CHROM, START, END, min_base_quality = BQ, min_mapping_quality = MQ, ignore_overlaps = False)
  File "pysam/libcalignmentfile.pyx", line 1326, in pysam.libcalignmentfile.AlignmentFile.pileup
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (248849282) > stop (138479)
Traceback (most recent call last):
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 369, in <module>
    main()
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 358, in main
    collect_result(run_interval(row, DICT_sites[row], BAM, FASTA, MIN_COV, MIN_CC, tmp_dir, MIN_BQ, MIN_MQ))
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 131, in run_interval
    i = bam.pileup(CHROM, START, END, min_base_quality = BQ, min_mapping_quality = MQ, ignore_overlaps = False)
  File "pysam/libcalignmentfile.pyx", line 1326, in pysam.libcalignmentfile.AlignmentFile.pileup
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (248849282) > stop (138479)
Traceback (most recent call last):
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 369, in <module>
    main()
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 358, in main
    collect_result(run_interval(row, DICT_sites[row], BAM, FASTA, MIN_COV, MIN_CC, tmp_dir, MIN_BQ, MIN_MQ))
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 131, in run_interval
    i = bam.pileup(CHROM, START, END, min_base_quality = BQ, min_mapping_quality = MQ, ignore_overlaps = False)
  File "pysam/libcalignmentfile.pyx", line 1326, in pysam.libcalignmentfile.AlignmentFile.pileup
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (248849282) > stop (138479)
Traceback (most recent call last):
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 369, in <module>
    main()
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 358, in main
    collect_result(run_interval(row, DICT_sites[row], BAM, FASTA, MIN_COV, MIN_CC, tmp_dir, MIN_BQ, MIN_MQ))
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 131, in run_interval
    i = bam.pileup(CHROM, START, END, min_base_quality = BQ, min_mapping_quality = MQ, ignore_overlaps = False)
  File "pysam/libcalignmentfile.pyx", line 1326, in pysam.libcalignmentfile.AlignmentFile.pileup
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (248849282) > stop (138479)
Traceback (most recent call last):
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 369, in <module>
    main()
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 358, in main
    collect_result(run_interval(row, DICT_sites[row], BAM, FASTA, MIN_COV, MIN_CC, tmp_dir, MIN_BQ, MIN_MQ))
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 131, in run_interval
    i = bam.pileup(CHROM, START, END, min_base_quality = BQ, min_mapping_quality = MQ, ignore_overlaps = False)
  File "pysam/libcalignmentfile.pyx", line 1326, in pysam.libcalignmentfile.AlignmentFile.pileup
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (248849282) > stop (138479)
Traceback (most recent call last):
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 369, in <module>
    main()
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 358, in main
    collect_result(run_interval(row, DICT_sites[row], BAM, FASTA, MIN_COV, MIN_CC, tmp_dir, MIN_BQ, MIN_MQ))
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 131, in run_interval
    i = bam.pileup(CHROM, START, END, min_base_quality = BQ, min_mapping_quality = MQ, ignore_overlaps = False)
  File "pysam/libcalignmentfile.pyx", line 1326, in pysam.libcalignmentfile.AlignmentFile.pileup
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (248849282) > stop (138479)
Traceback (most recent call last):
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 369, in <module>
    main()
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 358, in main
    collect_result(run_interval(row, DICT_sites[row], BAM, FASTA, MIN_COV, MIN_CC, tmp_dir, MIN_BQ, MIN_MQ))
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 131, in run_interval
    i = bam.pileup(CHROM, START, END, min_base_quality = BQ, min_mapping_quality = MQ, ignore_overlaps = False)
  File "pysam/libcalignmentfile.pyx", line 1326, in pysam.libcalignmentfile.AlignmentFile.pileup
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (248849282) > stop (138479)
Traceback (most recent call last):
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 369, in <module>
    main()
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 358, in main
    collect_result(run_interval(row, DICT_sites[row], BAM, FASTA, MIN_COV, MIN_CC, tmp_dir, MIN_BQ, MIN_MQ))
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 131, in run_interval
    i = bam.pileup(CHROM, START, END, min_base_quality = BQ, min_mapping_quality = MQ, ignore_overlaps = False)
  File "pysam/libcalignmentfile.pyx", line 1326, in pysam.libcalignmentfile.AlignmentFile.pileup
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (248849282) > stop (138479)
Traceback (most recent call last):
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 369, in <module>
    main()
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 358, in main
    collect_result(run_interval(row, DICT_sites[row], BAM, FASTA, MIN_COV, MIN_CC, tmp_dir, MIN_BQ, MIN_MQ))
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 131, in run_interval
    i = bam.pileup(CHROM, START, END, min_base_quality = BQ, min_mapping_quality = MQ, ignore_overlaps = False)
  File "pysam/libcalignmentfile.pyx", line 1326, in pysam.libcalignmentfile.AlignmentFile.pileup
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (248849282) > stop (138479)
Traceback (most recent call last):
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 369, in <module>
    main()
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 358, in main
    collect_result(run_interval(row, DICT_sites[row], BAM, FASTA, MIN_COV, MIN_CC, tmp_dir, MIN_BQ, MIN_MQ))
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 131, in run_interval
    i = bam.pileup(CHROM, START, END, min_base_quality = BQ, min_mapping_quality = MQ, ignore_overlaps = False)
  File "pysam/libcalignmentfile.pyx", line 1326, in pysam.libcalignmentfile.AlignmentFile.pileup
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (248849282) > stop (138479)
Traceback (most recent call last):
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 369, in <module>
    main()
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 358, in main
    collect_result(run_interval(row, DICT_sites[row], BAM, FASTA, MIN_COV, MIN_CC, tmp_dir, MIN_BQ, MIN_MQ))
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 131, in run_interval
    i = bam.pileup(CHROM, START, END, min_base_quality = BQ, min_mapping_quality = MQ, ignore_overlaps = False)
  File "pysam/libcalignmentfile.pyx", line 1326, in pysam.libcalignmentfile.AlignmentFile.pileup
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (248849282) > stop (138479)
Traceback (most recent call last):
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 369, in <module>
    main()
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 358, in main
    collect_result(run_interval(row, DICT_sites[row], BAM, FASTA, MIN_COV, MIN_CC, tmp_dir, MIN_BQ, MIN_MQ))
  File "/PHShome/yq038/SComatic-main/scripts/SitesPerCell/SitesPerCell.py", line 131, in run_interval
    i = bam.pileup(CHROM, START, END, min_base_quality = BQ, min_mapping_quality = MQ, ignore_overlaps = False)
  File "pysam/libcalignmentfile.pyx", line 1326, in pysam.libcalignmentfile.AlignmentFile.pileup
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (248849282) > stop (138479)
  1. Could you check if the problematic coordinates are found in the input file you passed to the parameter --infile ?

Here are the matching results for problematic start and stop coordinates in ${sample}.calling.step1.tsv. There are other matches of 138479 as substrings in this file.

chr1    248849282   248849282   C   .   .   .   TGTCC   CAGGG   .   .   .   0;39;1  0;29;1  .   .   DP|NC|CC|BC|BQ|BCf|BCr  NA  39|29|0:29:0:0:0:0|0:39:0:0:0:0|0:1326:0:0:0:0|0:28:0:0:0:0|0:11:0:0:0:0    NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
KI270733.1  138479  138479  C   .   .   .   CCTCC   TCCCT   .   .   .   .   0;7;1   0;5;1   .   .   DP|NC|CC|BC|BQ|BCf|BCr  NA  7|5|0:5:0:0:0:0|0:7:0:0:0:0|0:238:0:0:0:0|0:1:0:0:0:0|0:6:0:0:0:0   NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
RinconFer commented 1 year ago

I have also faced this issue, strangely enough, I have been able to avoid this by increasing the number of cell types in my metadata.

In my first approach I had only 3 groups, immune, stroma and tumor cells, now I have subdivided the barcodes metadata to the actual cell types in the immune and stroma population as well as subdivide the tumor cells by custom-defined transcriptional programs and the script seems to work perfectly fine. In some of the cell types (extremely small number of cells) it doesn't give results (no barcode or sites) but I suppose I can work without those cell types.

Francesc-Muyas commented 1 year ago

It's interesting that it only fails with a low number of cell types. Actually, it is a complaint from the pysam package. I will take a deeper look at this error.

Very low-represented cell types are expected not to provide results with SComatic. That occurs cause they need to reach the minimum required coverage per site to be considered as callable (5 cells per site by default).

Thanks, Fran