ctmrbio / stag-mwc

StaG Metagenomic Workflow Collaboration
https://stag-mwc.readthedocs.org
MIT License

preprocessing_summary error #163

Closed by AroArz 3 years ago

AroArz commented 3 years ago

branch

The 158-biobakery-update branch was used. However, the affected sections were unmodified on that branch, so the problem is likely present in the master branch as well.

config

#########################
# Pipeline steps included
#########################
qc_reads: True
host_removal: True
naive:
    assess_depth: False
    sketch_compare: False
taxonomic_profile:
    kaiju: False
    kraken2: False
    metaphlan: True
strain_level_profiling:
    strainphlan: False        # Will also run metaphlan. Please make sure you've added bt2_db_dir and bt2_index under metaphlan settings.
functional_profile:
    humann: True
antibiotic_resistance:
    groot: False
    amrplusplus: False
mappers:
    bbmap: False
    bowtie2: False
assembly: False
binning: False
multiqc_report: True

shell

Traceback (most recent call last):
  File "scripts/preprocessing_summary.py", line 84, in <module>
    .plot(kind="line", style=".-", ax=ax)
  File "/crex/proj/snic2020-6-233/projects/02_bifido/evolve/.snakemake/conda/c113d7bf/lib/python3.7/site-packages/pandas/plotting/_core.py", line 794, in __call__
    return plot_backend.plot(data, kind=kind, **kwargs)
  File "/crex/proj/snic2020-6-233/projects/02_bifido/evolve/.snakemake/conda/c113d7bf/lib/python3.7/site-packages/pandas/plotting/_matplotlib/__init__.py", line 62, in plot
    plot_obj.generate()
  File "/crex/proj/snic2020-6-233/projects/02_bifido/evolve/.snakemake/conda/c113d7bf/lib/python3.7/site-packages/pandas/plotting/_matplotlib/core.py", line 279, in generate
    self._compute_plot_data()
  File "/crex/proj/snic2020-6-233/projects/02_bifido/evolve/.snakemake/conda/c113d7bf/lib/python3.7/site-packages/pandas/plotting/_matplotlib/core.py", line 404, in _compute_plot_data
    include=[np.number, "datetime", "datetimetz", "timedelta"]
  File "/crex/proj/snic2020-6-233/projects/02_bifido/evolve/.snakemake/conda/c113d7bf/lib/python3.7/site-packages/pandas/core/frame.py", line 3427, in select_dtypes
    include_these = Series(not bool(include), index=self.columns)
  File "/crex/proj/snic2020-6-233/projects/02_bifido/evolve/.snakemake/conda/c113d7bf/lib/python3.7/site-packages/pandas/core/series.py", line 311, in __init__
    data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
  File "/crex/proj/snic2020-6-233/projects/02_bifido/evolve/.snakemake/conda/c113d7bf/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 712, in sanitize_array
    subarr = construct_1d_arraylike_from_scalar(value, len(index), dtype)
  File "/crex/proj/snic2020-6-233/projects/02_bifido/evolve/.snakemake/conda/c113d7bf/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 1233, in construct_1d_arraylike_from_scalar
    subarr = np.empty(length, dtype=dtype)
TypeError: Cannot interpret '<attribute 'dtype' of 'numpy.generic' objects>' as a data type

rules

The error above was produced by the preprocessing_summary rule. The plot_proportion_host rule produced a similar error.

log

The log files were empty.

command line

Executed with both --use-conda and --use-singularity. Correct image was pulled.

fix

Updated pandas to 1.2.1
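
This kind of TypeError is commonly reported when an older pandas release runs against a newer NumPy release. That is an assumption here, not something verified in this thread. A minimal sketch for recording which versions the environment actually resolved to, before and after the update, could look like this. The helper name `installed_versions` is hypothetical and not part of stag-mwc.

```python
import importlib


def installed_versions(packages=("pandas", "numpy", "matplotlib")):
    """Map each package name to its __version__ string, or None if unavailable."""
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Package is not installed in the active environment.
            versions[name] = None
        else:
            # Not every module defines __version__, so fall back to None.
            versions[name] = getattr(module, "__version__", None)
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not available'}")
```

Running this inside the conda environment or container that Snakemake created shows the versions the failing script would see, which may differ from those in the shell used to launch the workflow.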

AroArz commented 3 years ago

Related to this, I noticed that scripts/preprocessing_summary.py defines after_host_removal as the number of classified reads in the kraken2 log file. Unless I'm mistaken, it should use the number of unclassified reads instead, since those are the reads that remain after host removal?

def parse_kraken2_logs(logfiles):
    for logfile in logfiles:
        with open(logfile) as f:
            sample_name = Path(logfile).stem.split(".")[0]
            for line in f:
                if " classified" in line:
                    yield {"Sample": sample_name, "after_host_removal": int(line.strip().split()[0])}

example of log file

Loading database information... done.
50376689 sequences (12703.60 Mbp) processed in 371.331s (8139.9 Kseq/m, 2052.66 Mbp/m).
  21664417 sequences classified (43.00%)
  28712272 sequences unclassified (57.00%)
output_dir/host_removal/sample_host_1.fq to output_dir/host_removal/sample_host_1.fq.gz 
output_dir/host_removal/sample_host_2.fq to output_dir/host_removal/sample_host_2.fq.gz 
output_dir/host_removal/sample_1.fq to output_dir/host_removal/sample_1.fq.gz 
output_dir/host_removal/sample_2.fq to output_dir/host_removal/sample_2.fq.gz 
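
For comparison, here is a minimal sketch of a parser that takes the count from the "unclassified" line instead, assuming the log layout shown above. The function name `parse_kraken2_unclassified` and the file name `sample.kraken2.log` are hypothetical and only used for illustration. This is not the fix that was applied in stag-mwc.

```python
import tempfile
from pathlib import Path


def parse_kraken2_unclassified(logfiles):
    """Yield one record per log with the number of unclassified sequences."""
    for logfile in logfiles:
        # Sample name is everything before the first dot of the file name.
        sample_name = Path(logfile).stem.split(".")[0]
        with open(logfile) as f:
            for line in f:
                if " sequences unclassified" in line:
                    yield {
                        "Sample": sample_name,
                        "after_host_removal": int(line.split()[0]),
                    }


# Example log content copied from the excerpt above (first four lines).
EXAMPLE_LOG = """Loading database information... done.
50376689 sequences (12703.60 Mbp) processed in 371.331s (8139.9 Kseq/m, 2052.66 Mbp/m).
  21664417 sequences classified (43.00%)
  28712272 sequences unclassified (57.00%)
"""

with tempfile.TemporaryDirectory() as tmpdir:
    log_path = Path(tmpdir) / "sample.kraken2.log"  # hypothetical file name
    log_path.write_text(EXAMPLE_LOG)
    records = list(parse_kraken2_unclassified([log_path]))

print(records)
```

With the example log, this reports 28712272 for after_host_removal, where the current " classified" match would report 21664417.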

boulund commented 3 years ago

fix

Updated pandas to 1.2.1

So the issue first reported in this thread was solved entirely by updating pandas? Odd!

boulund commented 3 years ago

Is this issue now resolved by your pandas update? If so, please close it, @AroArz.