epifluidlab / FinaleToolkit

FinaleToolkit is a package and standalone program to extract fragmentation features of cell-free DNA from paired-end sequencing data.
https://epifluidlab.github.io/FinaleToolkit/
MIT License

Issues with latest version of NumPy and gap-bed tool not working #93

Closed paddyharker1005 closed 3 months ago

paddyharker1005 commented 3 months ago

Hi! Thanks very much for making this tool. Really nice to be able to extract these features without much hassle!

I've noticed a couple of issues with the latest version of the toolkit (v0.7.0):

  1. Whilst running finaletoolkit delfi, I noticed the tool was silently failing at the multiprocessing step that calls _delfi_single_window (the one marked with the comment "# pool process to find window frag coverages, gc content"). After a bit of investigation, I realised this was caused by the code using np.NaN, an alias that NumPy 2.0 removed, which produces the error:

    ERROR:Error in multiprocessing pool: `np.NaN` was removed in the NumPy 2.0 release. Use `np.nan` instead.

    Changing these to np.nan, as NumPy 2.0 requires, fixes the issue.

  2. I haven't been able to run finaletoolkit gap-bed as you mentioned in your most recent reply to #89. This is the error I get when running finaletoolkit gap-bed hg38 tcmeres.bed:

    Traceback (most recent call last):
      File "/home/pharker/.local/bin/finaletoolkit", line 8, in <module>
        sys.exit(main_cli())
                 ^^^^^^^^^^
      File "/home/pharker/.local/lib/python3.12/site-packages/finaletoolkit/cli/main_cli.py", line 201, in main_cli
        function(**funcargs)
      File "/home/pharker/.local/lib/python3.12/site-packages/finaletoolkit/genome/gaps.py", line 482, in _cli_gap_bed
        ucsc_hg38_gap_bed(output_file)
      File "/home/pharker/.local/lib/python3.12/site-packages/finaletoolkit/genome/gaps.py", line 463, in ucsc_hg38_gap_bed
        raise NotImplementedError()
    NotImplementedError

And this is the error I get when running finaletoolkit gap-bed hg19 tcmeres.bed:

Traceback (most recent call last):
  File "/home/pharker/.local/bin/finaletoolkit", line 8, in <module>
    sys.exit(main_cli())
             ^^^^^^^^^^
  File "/home/pharker/.local/lib/python3.12/site-packages/finaletoolkit/cli/main_cli.py", line 201, in main_cli
    function(**funcargs)
  File "/home/pharker/.local/lib/python3.12/site-packages/finaletoolkit/genome/gaps.py", line 477, in _cli_gap_bed
    ucsc_hg19_gap_bed(output_file)
  File "/home/pharker/.local/lib/python3.12/site-packages/finaletoolkit/genome/gaps.py", line 436, in ucsc_hg19_gap_bed
    return GenomeGaps.ucsc_hg19().to_bed(output_file)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pharker/.local/lib/python3.12/site-packages/finaletoolkit/genome/gaps.py", line 66, in ucsc_hg19
    gaps = np.genfromtxt(
           ^^^^^^^^^^^^^^
  File "/home/pharker/.local/lib/python3.12/site-packages/numpy/lib/_npyio_impl.py", line 1989, in genfromtxt
    fid = np.lib._datasource.open(fname, 'rt', encoding=encoding)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pharker/.local/lib/python3.12/site-packages/numpy/lib/_datasource.py", line 192, in open
    return ds.open(path, mode, encoding=encoding, newline=newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pharker/.local/lib/python3.12/site-packages/numpy/lib/_datasource.py", line 532, in open
    raise FileNotFoundError(f"{path} not found.")
FileNotFoundError: /home/pharker/.local/lib/python3.12/site-packages/finaletoolkit/genome/data/hg19.gap.txt.gz not found.
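The FileNotFoundError above points at a data file that should ship inside the installed package. As a rough sketch (not part of FinaleToolkit itself), the standard library can confirm whether such a file is present in an installation. The helper name package_file_exists is hypothetical; the package name and relative path are taken from the traceback, and Python 3.9 or newer is assumed.

```python
from importlib.resources import files


def package_file_exists(package, relative_path):
    """Hypothetical helper: report whether a data file ships with an installed package."""
    try:
        # files() imports the package and returns a path-like handle to its directory.
        return files(package).joinpath(relative_path).is_file()
    except ImportError:
        # The package is not installed (or cannot be imported) in this environment.
        return False


# Package and path taken from the traceback above; the result depends on the
# FinaleToolkit version installed locally.
print(package_file_exists("finaletoolkit.genome", "data/hg19.gap.txt.gz"))
```

If this prints False for an installed release, the file was left out of the distribution rather than lost locally.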

jamesli124 commented 3 months ago

Hi there! Thank you for taking a look at our tools. A new version, 0.7.1, has been released that addresses these issues.

  1. Thank you for taking the time to look into this bug. I changed all instances of np.NaN to np.nan, and, out of an abundance of caution, I made FinaleToolkit require numpy<2.
  2. This issue was caused by certain files not being included with releases. A MANIFEST.in file is included in the latest version that ensures all files are packaged properly. finaletoolkit gap-bed should work properly now.
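For anyone hitting the first issue on an older release, here is a minimal sketch of the spelling change described in item 1. The helper empty_window_stats is hypothetical and is not FinaleToolkit code; it only illustrates that the lowercase np.nan works on both NumPy 1.x and 2.x, while the np.NaN alias is gone in 2.0.

```python
import numpy as np


def empty_window_stats(n_fields):
    """Hypothetical helper: a row of missing values, e.g. for a window with no fragments."""
    # np.nan (lowercase) exists in both NumPy 1.x and 2.x.
    return np.full(n_fields, np.nan)


row = empty_window_stats(3)
print(np.isnan(row).all())

# The uppercase alias was removed in NumPy 2.0, so this check is
# True on NumPy 1.x and False on NumPy 2.x.
print(hasattr(np, "NaN"))
```

Using the lowercase spelling everywhere keeps the code working whether or not the numpy<2 pin is in place.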