a-slide / pycoQC

pycoQC computes metrics and generates interactive QC plots from the sequencing summary report generated by Oxford Nanopore Technologies basecallers (Albacore/Guppy)
https://a-slide.github.io/pycoQC/
GNU General Public License v3.0

bam test data #141

Open bernt-matthias opened 1 year ago

bernt-matthias commented 1 year ago

Hi

I'm trying to fix the Galaxy tool for pycoQC for BAM input (https://github.com/galaxyproject/tools-iuc/pull/5201). Do you have input data that is known to work?

With the input data I'm using I get:

Job in error state.. tool_id: pycoqc, exit_code: 1, stderr: Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
    Discarding lines containing NA values
        0 reads discarded
    Filtering out zero length reads
        0 reads discarded
    Sorting run IDs by decreasing throughput
        Run-id order ['2bf3a5a5424e9267975cff54d2d8d1731fde919f']
    Reordering runids
        Processing reads with Run_ID 2bf3a5a5424e9267975cff54d2d8d1731fde919f / time offset: 0
    Cast value to appropriate type
    Reindexing dataframe by read_ids
        9 Final valid reads
WARNING: Low number of reads found. This is likely to lead to errors when trying to generate plots
Loading plotting interface
    Found 9 total reads
    Found 9 pass reads (qual >= 7.0 and length >= 0)
Generating HTML report
    Parsing html config file
    Running method run_summary
        Computing plot
    Running method basecall_summary
        Computing plot
    Running method alignment_summary
/usr/local/lib/python3.7/site-packages/numpy/core/fromnumeric.py:3257: RuntimeWarning:

Mean of empty slice.

/usr/local/lib/python3.7/site-packages/numpy/core/_methods.py:161: RuntimeWarning:

invalid value encountered in double_scalars

        Computing plot
    Running method read_len_1D
        Computing plot
    Running method align_len_1D
        Computing plot
Traceback (most recent call last):
  File "/usr/local/bin/pycoQC", line 10, in <module>
    sys.exit(main_pycoQC())
  File "/usr/local/lib/python3.7/site-packages/pycoQC/__main__.py", line 132, in main_pycoQC
    quiet = args.quiet)
  File "/usr/local/lib/python3.7/site-packages/pycoQC/pycoQC.py", line 160, in pycoQC
    skip_coverage_plot=skip_coverage_plot)
  File "/usr/local/lib/python3.7/site-packages/pycoQC/pycoQC_report.py", line 89, in html_report
    fig = method(**method_args)
  File "/usr/local/lib/python3.7/site-packages/pycoQC/pycoQC_plot.py", line 489, in align_len_1D
    height=height)
  File "/usr/local/lib/python3.7/site-packages/pycoQC/pycoQC_plot.py", line 535, in __1D_density_plot
    lab1, dd1, ld1 = self.__1D_density_data ("all", field_name, x_scale, nbins, smooth_sigma)
  File "/usr/local/lib/python3.7/site-packages/pycoQC/pycoQC_plot.py", line 582, in __1D_density_data
    min = np.nanmin(data)
  File "<__array_function__ internals>", line 6, in nanmin
  File "/usr/local/lib/python3.7/site-packages/numpy/lib/nanfunctions.py", line 320, in nanmin
    res = np.fmin.reduce(a, axis=axis, out=out, **kwargs)
ValueError: zero-size array to reduction operation fmin which has no identity

MustafaElshani commented 1 year ago

I'm getting the very same error. Has anyone managed to fix it?

fo40225 commented 5 months ago

If sequencing_summary.txt doesn't contain any pass reads, it will cause this error.
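
One way to check this before running pycoQC is to count the pass reads in the summary file yourself. The following is a minimal sketch, not pycoQC's own code: it assumes a tab-separated summary with `mean_qscore_template` and `sequence_length_template` columns, and the helper name `count_pass_reads` and the example rows are made up for illustration. The default thresholds mirror the "qual >= 7.0 and length >= 0" line in the first log above.

```python
import csv
import io


def count_pass_reads(summary_lines, min_qual=7.0, min_len=0):
    """Return (total, passed) read counts for a sequencing summary.

    Assumption: the summary is tab-separated and has the columns
    'mean_qscore_template' and 'sequence_length_template'.
    """
    reader = csv.DictReader(summary_lines, delimiter="\t")
    total = 0
    passed = 0
    for row in reader:
        total += 1
        qual = float(row["mean_qscore_template"])
        length = int(row["sequence_length_template"])
        if qual >= min_qual and length >= min_len:
            passed += 1
    return total, passed


# Made-up example rows, for illustration only.
EXAMPLE_SUMMARY = (
    "read_id\tsequence_length_template\tmean_qscore_template\n"
    "read_a\t1200\t9.5\n"
    "read_b\t300\t5.1\n"
    "read_c\t0\t11.0\n"
)

total, passed = count_pass_reads(io.StringIO(EXAMPLE_SUMMARY))
print(f"{total} total reads, {passed} pass reads")
```

For a real file, pass an open file handle instead of the `io.StringIO` object, for example `open("sequencing_summary.txt", newline="")`. If `passed` is 0, this explanation probably applies.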

SalvadorGJ commented 5 months ago

Hello,

I have the same issue; however, it seems that there are pass reads. Here is my log:

Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
    Discarding lines containing NA values
        0 reads discarded
    Filtering out zero length reads
        0 reads discarded
    Sorting run IDs by decreasing throughput
        Run-id order ['4254b708acc48a73459cb6ebfb4d97078ac316a7']
    Reordering runids
        Processing reads with Run_ID 4254b708acc48a73459cb6ebfb4d97078ac316a7 / time offset: 0
    Cleaning up low frequency barcodes
        0 reads with low frequency barcode unset
    Cast value to appropriate type
    Reindexing dataframe by read_ids
        3,329,137 Final valid reads
Loading plotting interface
    Found 3,329,137 total reads
    Found 3,327,073 pass reads (qual >= 10.0 and length >= 50)
Generating HTML report
    Parsing html config file
    Running method run_summary
        Computing plot
    Running method basecall_summary
        Computing plot
    Running method alignment_summary
/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py:3464: RuntimeWarning:

Mean of empty slice.

/usr/local/lib/python3.8/dist-packages/numpy/core/_methods.py:192: RuntimeWarning:

invalid value encountered in scalar divide

        Computing plot
    Running method read_len_1D
        Computing plot
    Running method align_len_1D
        Computing plot
Traceback (most recent call last):
  File "/usr/local/bin/pycoQC", line 8, in <module>
    sys.exit(main_pycoQC())
  File "/usr/local/lib/python3.8/dist-packages/pycoQC/__main__.py", line 115, in main_pycoQC
    pycoQC (
  File "/usr/local/lib/python3.8/dist-packages/pycoQC/pycoQC.py", line 155, in pycoQC
    reporter.html_report(
  File "/usr/local/lib/python3.8/dist-packages/pycoQC/pycoQC_report.py", line 89, in html_report
    fig = method(**method_args)
  File "/usr/local/lib/python3.8/dist-packages/pycoQC/pycoQC_plot.py", line 480, in align_len_1D
    fig = self.__1D_density_plot (
  File "/usr/local/lib/python3.8/dist-packages/pycoQC/pycoQC_plot.py", line 535, in __1D_density_plot
    lab1, dd1, ld1 = self.__1D_density_data ("all", field_name, x_scale, nbins, smooth_sigma)
  File "/usr/local/lib/python3.8/dist-packages/pycoQC/pycoQC_plot.py", line 582, in __1D_density_data
    min = np.nanmin(data)
  File "<__array_function__ internals>", line 200, in nanmin
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/nanfunctions.py", line 343, in nanmin
    res = np.fmin.reduce(a, axis=axis, out=out, **kwargs)
ValueError: zero-size array to reduction operation fmin which has no identity 

Any suggestions?

SalvadorGJ commented 5 months ago

I found what was wrong: some of the reads have different names in the basecaller summary and in the BAM file. The difference was introduced when the reads were processed with pychopper.
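
For anyone who wants to check for the same mismatch, here is a rough sketch of how the read names could be compared. It is standard-library Python only and makes a few assumptions: the summary is tab-separated with a `read_id` column, and the alignments are available as SAM text (for example the output of `samtools view`). The helper names and the example read names are hypothetical, and the `prefix|read_a` name is only a stand-in for a renamed read, not the actual naming scheme pychopper uses.

```python
import csv
import io


def summary_read_ids(summary_lines):
    """Collect read IDs from a tab-separated summary with a 'read_id' column."""
    reader = csv.DictReader(summary_lines, delimiter="\t")
    return {row["read_id"] for row in reader}


def sam_read_ids(sam_lines):
    """Collect read names (first field) from SAM text, skipping '@' header lines."""
    ids = set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        ids.add(line.split("\t", 1)[0].strip())
    return ids


def compare_read_ids(summary_ids, bam_ids):
    """Count the read IDs shared by both inputs and those unique to each."""
    return {
        "shared": len(summary_ids & bam_ids),
        "summary_only": len(summary_ids - bam_ids),
        "bam_only": len(bam_ids - summary_ids),
    }


# Made-up example data, for illustration only.
EXAMPLE_SUMMARY = "read_id\tsequence_length_template\nread_a\t1200\nread_b\t300\n"
EXAMPLE_SAM = (
    "@HD\tVN:1.6\n"
    "prefix|read_a\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "read_b\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
)

report = compare_read_ids(
    summary_read_ids(io.StringIO(EXAMPLE_SUMMARY)),
    sam_read_ids(io.StringIO(EXAMPLE_SAM)),
)
print(report)
```

A `shared` count of zero, or one far below the number of reads in the summary, would suggest that the read names no longer match between the two files.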