epifluidlab / FinaleToolkit

FinaleToolkit is a package and standalone program to extract fragmentation features of cell-free DNA from paired-end sequencing data.
https://epifluidlab.github.io/FinaleToolkit/
MIT License

wps: error in processing the input BED files #107

Open raunakms opened 2 weeks ago

raunakms commented 2 weeks ago

Thanks for creating this awesome tool. I have encountered a potential bug in the WPS score calculation: the wps command fails on specific genomic intervals when the input BED file contains multiple intervals. The same intervals are processed correctly when the multi-region BED file is split into separate BED files with a single interval each. I am using the latest available version of the tool, FinaleToolkit 0.7.5.
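The splitting described above can be sketched as a small helper. This is only an illustration of the workaround, not part of FinaleToolkit: the function name split_bed and the output naming scheme are assumptions, and each resulting file would still have to be passed to finaletoolkit wps separately.

```python
from pathlib import Path


def split_bed(bed_path, out_dir):
    """Write each interval of a BED file to its own single-interval BED file.

    Hypothetical helper for the workaround in this issue; the name and the
    "<stem>_<n>.bed" naming scheme are illustrative assumptions.
    """
    bed_path = Path(bed_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Keep only interval records: skip blank lines and BED header lines.
    with bed_path.open() as handle:
        records = [
            line for line in handle
            if line.strip() and not line.startswith(("#", "track", "browser"))
        ]

    written = []
    for index, line in enumerate(records, start=1):
        out_path = out_dir / f"{bed_path.stem}_{index}.bed"
        out_path.write_text(line if line.endswith("\n") else line + "\n")
        written.append(out_path)
    return written
```

For example, split_bed("type_a_12.bed", "split") would produce split/type_a_12_1.bed and split/type_a_12_2.bed, one interval per file.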

Input file download URLs:
BAM: https://figshare.com/ndownloader/files/50203797
BED: https://figshare.com/ndownloader/files/50203812

This is the content of the input BED file:

$ cat type_a_12.bed
chr22   17627819        17631818        ENSG00000131100.12      4000    -
chr22   17625855        17629854        ENSG00000099968.17      4000    +

The command I used:

file_input_bam=GSM1833238_chr22.bam
file_interval_bed=type_a_12.bed
file_output_bigwig=output_wps.bw

finaletoolkit wps \
        ${file_input_bam} \
        ${file_interval_bed} \
        --output_file ${file_output_bigwig} \
        --interval_size 4000 \
        --window_size 100 \
        --fraction_low 50 \
        --fraction_high 1000 \
        --quality_threshold 30 \
        --workers 10 \
        --verbose

The error message I received:

            Calculating aggregate WPS
            input_file: GSM1833238_chr22.bam
            site_bed: type_a_12.bed
            output_file: output_wps.bw
            window_size: 100
            interval_size: 4000
            quality_threshold: 30
            workers: 10
            verbose: 1

            Reading intervals from bed
Zipping inputs
Calculating wps...
Output file output_wps.bw specified. Opening...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/usr/local/lib/python3.10/site-packages/finaletoolkit/frag/multi_wps.py", line 17, in _wps_star
    return wps(*args)
  File "/usr/local/lib/python3.10/site-packages/finaletoolkit/frag/wps.py", line 122, in wps
    window_starts = np.zeros(stop-start)
ValueError: negative dimensions are not allowed
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/finaletoolkit", line 8, in <module>
    sys.exit(main_cli())
  File "/usr/local/lib/python3.10/site-packages/finaletoolkit/cli/main_cli.py", line 793, in main_cli
    function(**funcargs)
  File "/usr/local/lib/python3.10/site-packages/finaletoolkit/frag/multi_wps.py", line 199, in multi_wps
    for interval_score in interval_scores:
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 423, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 873, in next
    raise value
ValueError: negative dimensions are not allowed

Additionally, I encountered a similar error with a second BED file of genomic regions: https://figshare.com/ndownloader/files/50203821

$ cat type_b_12.bed
chr22   46573012        46577011        ENSG00000075240.16      4000    +
chr22   39396741        39400740        ENSG00000100324.13      4000    +
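A quick sanity check of the two BED files shown above may help narrow things down. The sketch below is a hypothetical diagnostic (the function name check_bed_rows and the reported fields are assumptions, not FinaleToolkit code). It reports each row's length as stop minus start, whether that equals a given interval size, and whether the row starts before or overlaps the previous row on the same chromosome. Whether any of these properties is related to the failure is not established here.

```python
def check_bed_rows(bed_text, interval_size=None):
    """Summarise basic properties of BED rows given as text.

    Hypothetical diagnostic helper; it only inspects the first three
    whitespace-separated columns (chrom, start, stop).
    """
    rows = []
    for line in bed_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or line.startswith("#"):
            continue
        rows.append((fields[0], int(fields[1]), int(fields[2])))

    report = []
    for i, (chrom, start, stop) in enumerate(rows):
        prev = rows[i - 1] if i > 0 else None
        same_chrom = prev is not None and prev[0] == chrom
        report.append({
            "chrom": chrom,
            "start": start,
            "stop": stop,
            "length": stop - start,
            # True when no interval_size is given or the length equals it.
            "matches_interval_size": interval_size is None
            or stop - start == interval_size,
            # True when this row begins before the previous row begins.
            "starts_before_previous": same_chrom and start < prev[1],
            # True when this row shares at least one base with the previous row.
            "overlaps_previous": same_chrom
            and start < prev[2] and prev[1] < stop,
        })
    return report


type_a = """chr22 17627819 17631818 ENSG00000131100.12 4000 -
chr22 17625855 17629854 ENSG00000099968.17 4000 +"""

for row in check_bed_rows(type_a, interval_size=4000):
    print(row)
```

For both type_a_12.bed and type_b_12.bed, every row has stop minus start equal to 3999 rather than 4000, and the second row starts before the first, so neither file is sorted by start coordinate. The two rows of type_a_12.bed also overlap, while those of type_b_12.bed do not.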

The error message I received:

            Calculating aggregate WPS
            input_file: GSM1833238_chr22.bam
            site_bed: type_b_12.bed
            output_file: output_wps.bw
            window_size: 100
            interval_size: 4000
            quality_threshold: 30
            workers: 10
            verbose: 1

            Reading intervals from bed
Zipping inputs
Calculating wps...
Output file output_wps.bw specified. Opening...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/usr/local/lib/python3.10/site-packages/finaletoolkit/frag/multi_wps.py", line 17, in _wps_star
    return wps(*args)
  File "/usr/local/lib/python3.10/site-packages/finaletoolkit/frag/wps.py", line 95, in wps
    frag_ends = frag_array(input_file,
  File "/usr/local/lib/python3.10/site-packages/finaletoolkit/utils/utils.py", line 414, in frag_array
    frag_list = [
  File "/usr/local/lib/python3.10/site-packages/finaletoolkit/utils/utils.py", line 414, in <listcomp>
    frag_list = [
  File "/usr/local/lib/python3.10/site-packages/finaletoolkit/utils/utils.py", line 296, in frag_generator
    for read in sam_file.fetch(contig, start, stop):
  File "pysam/libcalignmentfile.pyx", line 1091, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (46572011) > stop (39397740)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/finaletoolkit", line 8, in <module>
    sys.exit(main_cli())
  File "/usr/local/lib/python3.10/site-packages/finaletoolkit/cli/main_cli.py", line 793, in main_cli
    function(**funcargs)
  File "/usr/local/lib/python3.10/site-packages/finaletoolkit/frag/multi_wps.py", line 199, in multi_wps
    for interval_score in interval_scores:
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 423, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 873, in next
    raise value
ValueError: invalid coordinates: start (46572011) > stop (39397740)
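One way to read the coordinates in this error message is to compare them with the rows of type_b_12.bed. The sketch below uses a hypothetical helper (nearest_row is not a FinaleToolkit function) that finds which BED row each reported coordinate lies closest to. It is only a diagnostic aid under the assumption that the reported values are derived from the BED coordinates plus some padding; it does not identify the cause.

```python
def nearest_row(coordinate, rows):
    """Return (row_index, distance) for the row whose start or stop is closest.

    rows is a list of (start, stop) tuples taken from the BED file.
    """
    best = None
    for index, (start, stop) in enumerate(rows):
        distance = min(abs(coordinate - start), abs(coordinate - stop))
        if best is None or distance < best[1]:
            best = (index, distance)
    return best


# Rows of type_b_12.bed and the two values from the ValueError above.
type_b_rows = [(46573012, 46577011), (39396741, 39400740)]
print(nearest_row(46572011, type_b_rows))  # (0, 1001)
print(nearest_row(39397740, type_b_rows))  # (1, 999)
```

The reported start lies about 1 kb from the first row and the reported stop about 1 kb from the second row, which suggests that coordinates from two different BED rows ended up in the same fetch call.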

Could you please have a look at this issue? Thank you.

jamesli124 commented 1 week ago

Hi,

Thank you for bringing this to our attention. The error trace and included files are incredibly helpful and greatly appreciated. I will work with the other authors to find a solution.

Best, James