Microbial-Ecology-Group / AMRplusplus

AMR++ is a bioinformatic pipeline meant to aid in the analysis of raw sequencing reads to characterize the profile of antimicrobial resistance genes, or resistome.
https://www.meglab.org/
GNU General Public License v3.0

Latest NumPy version breaks kraken2_long_to_wide.py #28

Closed passdan closed 10 months ago

passdan commented 11 months ago

kraken2_long_to_wide.py uses the outdated alias np.float (now just the builtin float) on lines 56 & 61. I'll open a pull request once my environment is fixed; in the meantime, here is the corrected function:

def dict_to_matrix(D):
    ncol = len(D.keys())
    unique_nodes = []
    samples = []
    for sample, tdict in D.items():
        for taxon in tdict.keys():
            if taxon not in unique_nodes:
                unique_nodes.append(taxon)
    nrow = len(unique_nodes)
    return_values = np.zeros((nrow, ncol), dtype=float)  # changed: was dtype=np.float
    for j, (sample, tdict) in enumerate(D.items()):
        samples.append(sample)
        for i, taxon in enumerate(unique_nodes):
            if taxon in tdict:
                return_values[i, j] = float(tdict[taxon])  # changed: builtin float instead of np.float
    return return_values, unique_nodes, samples
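
As a quick sanity check of the change, here is a minimal, self-contained sketch that runs a condensed version of the function on a toy input. The sample and taxon names are made up, and the assumption that counts arrive as strings is mine (the real script builds this dict from Kraken2 report files, which may differ). It only illustrates that the builtin float is accepted as a dtype and yields a float64 matrix, so behaviour should be unchanged from the old alias.

```python
import numpy as np


def dict_to_matrix(D):
    """Convert {sample: {taxon: count}} into a taxa-by-samples float matrix."""
    unique_nodes = []
    for tdict in D.values():
        for taxon in tdict:
            if taxon not in unique_nodes:
                unique_nodes.append(taxon)
    samples = list(D)
    # The builtin float replaces the removed np.float alias; NumPy maps it to float64.
    matrix = np.zeros((len(unique_nodes), len(samples)), dtype=float)
    for j, sample in enumerate(samples):
        for i, taxon in enumerate(unique_nodes):
            if taxon in D[sample]:
                matrix[i, j] = float(D[sample][taxon])
    return matrix, unique_nodes, samples


# Hypothetical toy input (names and string-typed counts are assumptions).
toy = {
    "sampleA": {"Escherichia": "10", "Salmonella": "3"},
    "sampleB": {"Salmonella": "7"},
}
matrix, taxa, samples = dict_to_matrix(toy)
print(matrix.dtype)
print(taxa, samples)
print(matrix)
```

Taxa missing from a sample stay at 0.0, since the matrix is initialised with zeros.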

Full error:

Error executing process > 'FASTQ_KRAKEN_WF:krakenresults (null)'

Caused by:
  Process `FASTQ_KRAKEN_WF:krakenresults (null)` terminated with an error exit status (1)

Command executed:

  python3 /mnt/data/GROUP-smbpk/sbidp3/AMRplusplus/bin/kraken2_long_to_wide.py -i ABW_May2023_S26_R.kraken.report -o kraken_analytic_matrix.csv

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "/mnt/data/GROUP-smbpk/sbidp3/AMRplusplus/bin/kraken2_long_to_wide.py", line 134, in <module>
      kraken2_load_analytic_data(opts.input_files)
    File "/mnt/data/GROUP-smbpk/sbidp3/AMRplusplus/bin/kraken2_long_to_wide.py", line 112, in kraken2_load_analytic_data
      return dict_to_matrix(return_values), unclassifieds
    File "/mnt/data/GROUP-smbpk/sbidp3/AMRplusplus/bin/kraken2_long_to_wide.py", line 56, in dict_to_matrix
      return_values = np.zeros((nrow, ncol), dtype=np.float)
    File "/opt/conda/lib/python3.9/site-packages/numpy/__init__.py", line 313, in __getattr__
      raise AttributeError(__former_attrs__[attr])
  AttributeError: module 'numpy' has no attribute 'float'.
  `np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
  The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
      https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

passdan commented 10 months ago

Coming back to this: updating the alias alone doesn't fix the issue I was investigating. The underlying cause is that, although the .yaml files specify a numpy version number, the Singularity and Docker files don't. Selecting -profile singularity is therefore broken for Kraken, because the container uses different versions of various tools from the conda install.

This is the conda install line inside the Singularity and Docker files, although they require a local build. Updating the image in your Docker repo would fix this for all users.

    # install bulk of bioinformatic tools using conda
    conda create -n AmrPlusPlus_env python=3.9 trimmomatic=0.39 bwa samtools=1.15.1 bedtools kraken=2.1.2 biopython matplotlib=3.5.3 numpy=1.23.1 pysam=0.19.1 pandas=1.4.3 fastqc=0.11.8 multiqc
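
To confirm whether a given environment or container actually matches these pins, something like the following sketch could be run inside it. It is only an illustration: find_mismatches and PINS are names I made up, the pins are copied from the Python packages in the line above, and it relies on the standard-library importlib.metadata (Python 3.8+). As far as I know np.float was removed in NumPy 1.24, so a numpy newer than the pin is the likely culprit.

```python
from importlib import metadata

# Pins copied from the conda create line (Python packages only).
PINS = {
    "numpy": "1.23.1",
    "pandas": "1.4.3",
    "pysam": "0.19.1",
    "matplotlib": "3.5.3",
}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every package that is off-pin.

    `found` is None when the package is not installed at all.
    `get_version` is injectable so the check can be tested without the packages.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

No output means every pinned Python package matches; the non-Python tools (kraken, samtools, etc.) would need to be checked separately, for example with conda list.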

Thanks!

passdan commented 10 months ago

For anyone else coming across these issues in the meantime: I made a new Docker image whose only difference is that the version numbers are hard-coded, and it works for running amr++: passdan/amrplusplus-update