Illumina / interop

C++ Library to parse Illumina InterOp files
http://illumina.github.io/interop/index.html
GNU General Public License v3.0

Update examples with new wrapper #269

Closed matthdsm closed 3 years ago

matthdsm commented 3 years ago

Hi,

Would it be possible to update the example iPython notebooks with examples using the new wrapper?

Thanks Matthias

ezralanglois commented 3 years ago

The problem with the notebooks is that we don't have a good testing infrastructure built around them. We switched to doctest-based documentation to ensure everything is correct; it can be found here: http://illumina.github.io/interop/namespacecore.html

I think we need to come up with a better way to render the page, but there are many examples there.

matthdsm commented 3 years ago

Thanks, I'm looking to recreate the SAV plots as mentioned in the tutorials, but I'm having trouble finding the correct data. Could you direct me to the right functions (using the new API) to get the following plots?

  • qscore histogram
  • qscore heatmap
  • intensity/channel/cycle

Thanks again. Matthias

matthdsm commented 3 years ago

Also,

Under core.create_valid_to_load it says: "List of validate metric_names can be gotten using `list_interop_files`". But the list_interop_files method is nowhere to be found.

M

matthdsm commented 3 years ago

So I've been digging some more, but I keep getting stuck. I've been trying the different methods to extract metrics and run info, but I seem to be unable to get any "sensible" output (e.g. a numpy array); instead, everything keeps returning a SWIG type object.

This is probably mostly my fault, but I was wondering if you could direct me to the function I need to get actual data. So far I've got

run_dir = "../data/NextSeq"
run_metrics = interop.read(run=run_dir, valid_to_load=interop.load_to_string_list(interop.load_imaging_metrics()))

q_metrics = run_metrics.q_metric_set()
extraction_metrics = run_metrics.extraction_metric_set()

When following Tutorial3, we can see how to extract data for one tile in one cycle, but to generate a plot I'd need a dataframe with all the data. The same goes for the q_metrics.

Which method do I need to extract the metrics I need?

Thanks M

ezralanglois commented 3 years ago

> Thanks, I'm looking to recreate the SAV plots as mentioned in the tutorials, but I'm having trouble finding the correct data. Could you direct me to the right functions (using the new API) to get the following plots?
>
>   • qscore histogram
>   • qscore heatmap
>   • intensity/channel/cycle
>
> Thanks again. Matthias

The new API does not cover these functions. That is still a work in progress. We only have what is listed in the documentation, which includes the following tables: imaging, summary, indexing.

ezralanglois commented 3 years ago

> Also,
>
> Under core.create_valid_to_load it says: "List of validate metric_names can be gotten using `list_interop_files`". But the list_interop_files method is nowhere to be found.
>
> M

It looks like this function was missed when we released this. It is in an experimental branch.

This line shows an example: https://github.com/Illumina/interop/blob/c98d2689941cd557e6dad43884ff12b55b3e327b/src/ext/python/core.py#L944

The function can be created like so:

def list_interop_files():
    valid_to_load = interop_run.uchar_vector(interop_run.MetricCount, 1)
    return load_to_string_list(valid_to_load)

Note that this does not list the files on disk, just potential files that may be available.

ezralanglois commented 3 years ago

> So I've been digging some more, but I keep getting stuck. I've been trying the different methods to extract metrics and run info, but I seem to be unable to get any "sensible" output (e.g. a numpy array); instead, everything keeps returning a SWIG type object.
>
> This is probably mostly my fault, but I was wondering if you could direct me to the function I need to get actual data. So far I've got
>
> run_dir = "../data/NextSeq"
> run_metrics = interop.read(run=run_dir, valid_to_load=interop.load_to_string_list(interop.load_imaging_metrics()))
>
> q_metrics = run_metrics.q_metric_set()
> extraction_metrics = run_metrics.extraction_metric_set()
>
> When following Tutorial3, we can see how to extract data for one tile in one cycle, but to generate a plot I'd need a dataframe with all the data. The same goes for the q_metrics.
>
> Which method do I need to extract the metrics I need?
>
> Thanks M

The new API does not cover this information yet. It is focused on the tables. I think most of the metrics that you want can be obtained using the imaging table, which lists all metrics for all tiles and cycles. You can convert this numpy array to a pandas DataFrame pretty easily.
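As a rough illustration of that conversion, here is a minimal sketch. It assumes the imaging table comes back as a NumPy structured array (one named field per column), as the comment above suggests. The array below is a hand-made stand-in, and its field names and values are illustrative only, not real run data.

```python
import numpy as np
import pandas as pd

# Stand-in for the imaging table: a structured array with named fields.
# Field names and values here are assumptions made for the example.
imaging_like = np.array(
    [(1, 1101, 1, 91.5), (1, 1101, 2, 90.25), (2, 1101, 1, 88.0)],
    dtype=[("Lane", "u2"), ("Tile", "u4"), ("Cycle", "u2"), ("% >= Q30", "f4")],
)

# pandas turns each named field into a DataFrame column.
df = pd.DataFrame(imaging_like)

# Example aggregation: mean % >= Q30 per cycle across all tiles and lanes.
per_cycle = df.groupby("Cycle")["% >= Q30"].mean()
```

From there the usual pandas grouping and plotting tools apply to whichever columns the real table exposes.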

matthdsm commented 3 years ago

Hi,

Thanks for the info! Is there any kind of timeline for the extra API functions?

As for the imaging table, I already use this one to plot %occ vs %pf, which works great, but the table doesn't seem to include intensities or qscores (as far as I can see). Are those metrics under other keys?

Thanks again M

ezralanglois commented 3 years ago

No timeline at the moment. We found that exposing the tables had the most value internally.

Are you looking for % >= Q30 and P90?

matthdsm commented 3 years ago

Well I'll be... That IS what I'm looking for!

Thanks a lot! M

matthdsm commented 3 years ago

Actually no,

To plot the qscore heatmap, I'd need the Qscore value per cycle, not just the % >= Q30. The same goes for the histogram.

Is there a way to extract that data? Or simply the plot data that's used in the plotting executables?

Thanks M

matthdsm commented 3 years ago

Alternatively, could you add an example of how to use the py_interop_plot.plot_qscore_heatmap wrapper?

Thanks M

ezralanglois commented 3 years ago

Ah, I see. Ya, for the 3-bin q-table you could get everything from the imaging table. You would have to derive the q10 bin from what is left over from q20 and q30.

But for more bins, this is no longer an option from the imaging table.
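The 3-bin derivation described above is simple arithmetic. The sketch below assumes the imaging table exposes cumulative percentages for Q20 and Q30 (written here as `pct_ge_q20` and `pct_ge_q30`, which are hypothetical variable names), and treats the lowest bin as whatever is left below Q20.

```python
def three_bin_percentages(pct_ge_q20, pct_ge_q30):
    """Split cumulative Q20/Q30 percentages into three disjoint bins."""
    low = 100.0 - pct_ge_q20         # remainder below Q20 (the "q10" bin)
    mid = pct_ge_q20 - pct_ge_q30    # at least Q20 but below Q30
    high = pct_ge_q30                # at least Q30
    return {"<Q20": low, "Q20-Q29": mid, ">=Q30": high}


# Made-up numbers for one tile and cycle.
bins = three_bin_percentages(95.0, 90.0)
```

With more than three bins the cumulative columns no longer carry enough information, which is why the plot wrapper is needed in that case.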

This is how it could be done with the old interface.


import numpy
import interop.core
from interop import py_interop_plot

run = interop.core.read('run/folder')
options = py_interop_plot.filter_options(run.run_info().flowcell().naming_method())
# set options here
rows = py_interop_plot.count_rows_for_heatmap(run)
cols = py_interop_plot.count_columns_for_heatmap(run)
# Buffer that plot_qscore_heatmap fills in place
dataBuffer = numpy.zeros((rows, cols), dtype=numpy.float32)
data = py_interop_plot.heatmap_data()
try:
    py_interop_plot.plot_qscore_heatmap(run, options, data, dataBuffer.ravel())
except py_interop_plot.invalid_filter_option:
    pass

matthdsm commented 3 years ago

Thanks!

How do I access the heatmap data? I'm not entirely clear on that.

Thanks again. M

ezralanglois commented 3 years ago

You could plot it with matplotlib like so

import matplotlib.pyplot as plt

fig, axes = plt.subplots()
# dataBuffer is the array filled by plot_qscore_heatmap
axes.imshow(dataBuffer.T, cmap="cool")
axes.set_aspect('auto')

matthdsm commented 3 years ago

Thanks,

Sorry for being a bit high maintenance, but I'm trying to get the plots into MultiQC. Is there a way to get the values so I can parse them in the correct format? https://multiqc.info/docs/#heatmaps

M
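One possible way to reshape the buffer for a heatmap that expects rows of values plus axis labels is sketched below. It assumes the buffer is laid out as cycles by Q-score columns (which the `.T` in the matplotlib snippet hints at), and that the target wants a list of rows with separate x and y category lists, as the linked MultiQC heatmap docs describe. The function name, label format, and starting offsets are all hypothetical.

```python
def to_heatmap_inputs(buffer_rows, first_cycle=1, first_qscore=1):
    """Transpose a cycles-by-qscores grid into heatmap rows plus axis labels."""
    n_cycles = len(buffer_rows)
    n_qscores = len(buffer_rows[0]) if n_cycles else 0
    # One output row per Q-score and one column per cycle (same as buffer.T).
    data = [
        [float(buffer_rows[c][q]) for c in range(n_cycles)]
        for q in range(n_qscores)
    ]
    xcats = [str(first_cycle + c) for c in range(n_cycles)]
    ycats = ["Q{}".format(first_qscore + q) for q in range(n_qscores)]
    return data, xcats, ycats


# Tiny made-up grid: 2 cycles x 3 Q-score columns.
# A real NumPy buffer could be passed as dataBuffer.tolist().
demo = [[0.0, 1.0, 5.0], [0.5, 2.0, 4.0]]
data, xcats, ycats = to_heatmap_inputs(demo)
```

The three return values would then be handed to the plotting call in place of the NumPy array.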

matthdsm commented 3 years ago

Fixed (albeit in a hacky way) https://github.com/CenterForMedicalGeneticsGhent/MultiQC_SAV/blob/main/multiqc_sav/modules/sav.py#L526

Thanks for everything! M

Nitin123-4 commented 4 months ago

MultiQC_SAV gives issues for NovaSeqXplus

Please copy this log and report it at https://github.com/MultiQC/MultiQC/issues
Please attach a file that triggers the error. The last file found was: RunParameters.xml.zip 20240412_LH00557_0015_A227M7HLT4/RunParameters.xml

Traceback (most recent call last):
  File "/MG/SHARED/APPS/ANACONDA_DIR/anaconda/envs/multiqc_sav/lib/python3.11/site-packages/multiqc/multiqc.py", line 751, in run
    output = mod()
             ^^^^^
  File "/MG/SHARED/APPS/ANACONDA_DIR/anaconda/envs/multiqc_sav/lib/python3.11/site-packages/multiqc_sav/modules/sav.py", line 273, in __init__
    self.imaging_qc()
  File "/MG/SHARED/APPS/ANACONDA_DIR/anaconda/envs/multiqc_sav/lib/python3.11/site-packages/multiqc_sav/modules/sav.py", line 635, in imaging_qc
    plot_data = self.parse_imaging_table(imaging)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/MG/SHARED/APPS/ANACONDA_DIR/anaconda/envs/multiqc_sav/lib/python3.11/site-packages/multiqc_sav/modules/sav.py", line 706, in parse_imaging_table
    occ_pf[f"Lane {lane}"].append({"x": occ, "y": pf, "color": colors[lane]})
                                                               ~~^^^^^^
IndexError: list index out of range
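The traceback only shows that `colors[lane]` was indexed past the end of the list. One plausible reading, not confirmed in this thread, is that the run has more lanes than the palette has entries. A generic guard for that situation is to wrap around the palette, as in the sketch below. The helper name and the palette are hypothetical and are not taken from MultiQC_SAV.

```python
def color_for_lane(colors, lane):
    """Return a colour for a 1-based lane number, wrapping around the palette."""
    if not colors:
        raise ValueError("palette must contain at least one colour")
    return colors[(lane - 1) % len(colors)]


# Made-up 4-colour palette used with 8 lanes.
palette = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]
lane_colors = [color_for_lane(palette, lane) for lane in range(1, 9)]
```

Wrapping reuses colours once the palette runs out, so lanes stay plottable even if they are no longer uniquely coloured.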