GenomiqueENS / toulligQC

A post sequencing QC tool for Oxford Nanopore sequencers

toulligqc for duplex #20

Closed jodjo86 closed 1 year ago

jodjo86 commented 1 year ago

Hi, I am trying to create a report for a duplex analysis (guppy_basecaller_duplex) and I got this error message.

toulligqc --report-name QC_duplex \                                                                          
         --barcoding \
         --telemetry-source duplex/sequencing_telemetry.js \
         --sequencing-summary-source duplex/sequencing_summary.txt \
         --html-report-path duplex/QC_duplex.html \
         --barcodes barcode01
duplex/QC_duplex.html
ToulligQC version 2.2.1
* Initialize extractors
* Start Toulligqc info extractor
* End of Toulligqc info extractor (done in 0m0.00s)
* Start Sequencing telemetry extractor
* End of Sequencing telemetry extractor (done in 0m0.00s)
* Start Basecaller sequencing summary extractor
  - Load sequencing summary file (0.04 MB used) in 0m0.06s
Traceback (most recent call last):
  File "/home/minion/miniconda3/envs/nano/bin/toulligqc", line 10, in <module>
    sys.exit(main())
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/toulligqc/toulligqc.py", line 343, in main
    extractor.extract(result_dict)
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/toulligqc/sequencing_summary_extractor.py", line 234, in extract
    extract_barcode_info(self, result_dict,
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/toulligqc/sequencing_summary_common.py", line 144, in extract_barcode_info
    dataframe_dict["read.fail.barcoded"] = _barcode_frequency(extractor, barcode_selection, result_dict,
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/toulligqc/sequencing_summary_common.py", line 273, in _barcode_frequency
    set_result_value(extractor, result_dict, entry + '.count', sum(count_sorted.drop("unclassified")))
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/pandas/core/series.py", line 4771, in drop
    return super().drop(
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/pandas/core/generic.py", line 4267, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/pandas/core/generic.py", line 4311, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6644, in drop
    raise KeyError(f"{list(labels[mask])} not found in axis")
KeyError: "['unclassified'] not found in axis"

I can send you the two guppy basecaller output files if necessary.

PS: I also have a request. Would it be possible to specify a barcode range for the --barcodes argument? I use a lot of barcodes and the command quickly becomes very long: --barcodes barcode01,barcode02, ... barcode48

jodjo86 commented 1 year ago

I updated to 2.4 and I get a similar error.

toulligqc --report-name QC_duplex \                                             
         --barcoding \
         --telemetry-source duplex/sequencing_telemetry.js \
         --sequencing-summary-source duplex/sequencing_summary.txt \
         --html-report-path duplex/QC_duplex.html \
         --barcodes barcode01
duplex/QC_duplex.html
ToulligQC version 2.4
* Initialize extractors
* Start Toulligqc info extractor
* End of Toulligqc info extractor (done in 0m0.00s)
* Start Sequencing telemetry extractor
* End of Sequencing telemetry extractor (done in 0m0.00s)
* Start Basecaller sequencing summary extractor
  - Load sequencing summary file (0.03 MB used) in 0m0.05s
Traceback (most recent call last):
  File "/home/minion/miniconda3/envs/nano/bin/toulligqc", line 10, in <module>
    sys.exit(main())
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/toulligqc/toulligqc.py", line 348, in main
    extractor.extract(result_dict)
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/toulligqc/sequencing_summary_extractor.py", line 234, in extract
    extract_barcode_info(self, result_dict,
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/toulligqc/sequencing_summary_common.py", line 156, in extract_barcode_info
    dataframe_dict["base.fail.barcoded"] = _barcode_bases(extractor, barcode_selection, result_dict,
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/toulligqc/sequencing_summary_common.py", line 347, in _barcode_bases
    set_result_value(extractor, result_dict, entry + '.count', sum(count_sorted.drop("unclassified")))
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/pandas/core/series.py", line 4771, in drop
    return super().drop(
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/pandas/core/generic.py", line 4267, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/pandas/core/generic.py", line 4311, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6644, in drop
    raise KeyError(f"{list(labels[mask])} not found in axis")
KeyError: "['unclassified'] not found in axis"

alihamraoui commented 1 year ago

Dear @jodjo86,

Thank you for using ToulligQC and for reporting this issue. ToulligQC does not yet support duplex analysis. We plan to address this in the upcoming months!

However, it seems that this issue is related to the barcode section. Could you please provide me with your guppy output files so that I can reproduce the bug?
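For readers following along, the KeyError in the tracebacks above can be reproduced with plain pandas: `Series.drop` raises when the label is missing from the index, which is presumably what happens when no read is labelled "unclassified". The sketch below is only an illustration, not ToulligQC's code or the fix that was eventually applied; the function name `count_classified` and the sample counts are hypothetical.

```python
import pandas as pd


def count_classified(counts: pd.Series) -> int:
    """Sum per-barcode counts, leaving out the 'unclassified' row if present.

    Hypothetical helper for illustration only, not part of ToulligQC.
    """
    # errors="ignore" makes drop() skip labels that are absent from the
    # index instead of raising KeyError.
    return int(counts.drop("unclassified", errors="ignore").sum())


# Made-up counts for a run where every read was assigned to a barcode.
only_barcoded = pd.Series({"barcode01": 120})

try:
    only_barcoded.drop("unclassified")
except KeyError as err:
    print("plain drop() fails:", err)

print(count_classified(only_barcoded))
```

Dropping with `errors="ignore"` keeps the result identical whenever the "unclassified" row does exist, so the guard only changes behaviour in the failing case.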

Regarding your request, I will add the option to specify a barcode range in the next version. Thank you for the suggestion!

Best regards, Ali

jodjo86 commented 1 year ago

Hi @alihamraoui,

Thank you for the quick reply and for considering my request. Here are the guppy output files for my duplex test run with one barcode. guppy_log.zip

I hope this will be useful to you. thanks,

Joel

alihamraoui commented 1 year ago

Hi @jodjo86,

I think I've fixed this issue. Could you please clone the repository again and try it with your real data?

I have also added the option to specify a range of barcodes. You can use: --barcodes barcode01:barcode48
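As an illustration of what such a range could expand to, here is a minimal sketch of a parser for the `barcode01:barcode48` form. It is not ToulligQC's implementation: the function name `expand_barcodes` is hypothetical, and mixing comma-separated names with ranges in one value is an assumption made for the example.

```python
import re


def expand_barcodes(spec: str) -> list[str]:
    """Expand a value such as 'barcode01:barcode48' into individual names.

    Hypothetical helper for illustration only, not ToulligQC's parser.
    """
    names = []
    for item in spec.split(","):
        item = item.strip()
        if ":" not in item:
            # A single barcode name is kept unchanged.
            names.append(item)
            continue
        start, end = item.split(":", 1)
        m_start = re.fullmatch(r"(\D*)(\d+)", start)
        m_end = re.fullmatch(r"(\D*)(\d+)", end)
        if not m_start or not m_end or m_start.group(1) != m_end.group(1):
            raise ValueError(f"invalid barcode range: {item!r}")
        first, last = int(m_start.group(2)), int(m_end.group(2))
        if first > last:
            raise ValueError(f"descending barcode range: {item!r}")
        # Keep the zero padding of the first bound (e.g. '01' gives width 2).
        prefix, width = m_start.group(1), len(m_start.group(2))
        names.extend(f"{prefix}{n:0{width}d}" for n in range(first, last + 1))
    return names


print(expand_barcodes("barcode01:barcode03"))
```

Taking the padding width from the first bound means the generated names match the `barcodeNN` labels that appear in the sequencing summary.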

Hope this works!

best, Ali

jodjo86 commented 1 year ago

Hi @alihamraoui ,

Thanks for adding the feature to specify a range of barcodes. It seems to work well. If there is a problem I will open a separate issue to simplify follow-up.

I still have an error message for the duplex report :(

thanks, Joel

toulligqc --report-name QC_duplex \                                                                   
         --barcoding \
         --telemetry-source duplex/sequencing_telemetry.js \
         --sequencing-summary-source duplex/sequencing_summary.txt \
         --html-report-path duplex/QC_duplex.html \
         --barcodes barcode01
duplex/QC_duplex.html
ToulligQC version 2.4
* Initialize extractors
* Start Toulligqc info extractor
* End of Toulligqc info extractor (done in 0m0.00s)
* Start Sequencing telemetry extractor
* End of Sequencing telemetry extractor (done in 0m0.00s)
* Start Basecaller sequencing summary extractor
  - Load sequencing summary file (0.03 MB used) in 0m0.01s
  - Extract info from sequencing summary file in 0m0.05s
  - Creation of image "Read count histogram" in 0m0.13s
  - Creation of image "Distribution of read lengths" in 0m0.10s
Traceback (most recent call last):
  File "/home/minion/miniconda3/envs/trim/bin/toulligqc", line 33, in <module>
    sys.exit(load_entry_point('toulligqc==2.4', 'console_scripts', 'toulligqc')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/minion/miniconda3/envs/trim/lib/python3.11/site-packages/toulligqc-2.4-py3.11.egg/toulligqc/toulligqc.py", line 388, in main
    graphs.extend(extractor.graph_generation(result_dict))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/minion/miniconda3/envs/trim/lib/python3.11/site-packages/toulligqc-2.4-py3.11.egg/toulligqc/sequencing_summary_extractor.py", line 279, in graph_generation
    add_image_to_result(self.quiet, images, time.time(), pgg.yield_plot(self.dataframe_1d, self.images_directory))
                                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/minion/miniconda3/envs/trim/lib/python3.11/site-packages/toulligqc-2.4-py3.11.egg/toulligqc/plotly_graph_generator.py", line 204, in yield_plot
    count_x, count_y, cum_count_y = _smooth_data(npoints=npoints, sigma=sigma,
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/minion/miniconda3/envs/trim/lib/python3.11/site-packages/toulligqc-2.4-py3.11.egg/toulligqc/plotly_graph_common.py", line 205, in _smooth_data
    min_arg = np.nanmin(data)
              ^^^^^^^^^^^^^^^
  File "<__array_function__ internals>", line 180, in nanmin
  File "/home/minion/miniconda3/envs/trim/lib/python3.11/site-packages/numpy/lib/nanfunctions.py", line 350, in nanmin
    res = np.amin(a, axis=axis, out=out, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<__array_function__ internals>", line 180, in amin
  File "/home/minion/miniconda3/envs/trim/lib/python3.11/site-packages/numpy/core/fromnumeric.py", line 2918, in amin
    return _wrapreduction(a, np.minimum, 'min', axis, None, out,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/minion/miniconda3/envs/trim/lib/python3.11/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: zero-size array to reduction operation minimum which has no identity

alihamraoui commented 1 year ago

Hi @jodjo86,

You're getting this error because this small sequencing summary example contains only passed-filter reads (I should fix this as well).

For now, you also need to provide the sequencing summary of the failed reads.

--sequencing-summary-source duplex/sequencing_summary_pass.txt \
--sequencing-summary-source duplex/sequencing_summary_fail.txt

I assume that this example is just a subset of your entire sequencing summary?

It will work if you provide the sequencing summary for your entire dataset.
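The ValueError in the traceback above is what NumPy raises when a minimum is taken over an empty array, which is presumably what the set of failed reads becomes when the summary holds only passed-filter reads. The sketch below shows the failure and one possible guard; it is not ToulligQC's code, and the helper name `safe_range` is hypothetical.

```python
import numpy as np


def safe_range(values):
    """Return (min, max) ignoring NaN values, or None when there is no data.

    Hypothetical helper for illustration only, not part of ToulligQC.
    """
    data = np.asarray(values, dtype=float)
    if data.size == 0:
        # Nothing to reduce: report "no data" instead of raising.
        return None
    return float(np.nanmin(data)), float(np.nanmax(data))


try:
    # Reducing an empty array has no identity value, so NumPy raises.
    np.nanmin(np.array([]))
except ValueError as err:
    print("nanmin on an empty array fails:", err)

print(safe_range([]))
print(safe_range([1500.0, 320.0, np.nan]))
```

A caller could skip the corresponding plot when `safe_range` returns None instead of aborting the whole report.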

Best regards, Ali

jodjo86 commented 1 year ago

Everything works well. THANKS

alihamraoui commented 1 year ago

Cool.

I'm glad this helped. I will make sure that future versions give you the option to use only passed-filter reads.

best, Ali