matthdsm closed this issue 3 years ago.
The problem with the notebooks is that we don't have good testing infrastructure built around them. We switched to doctest documentation to ensure everything is correct; it can be found here: http://illumina.github.io/interop/namespacecore.html
I think we need to come up with a better way to render the page, but there are many examples there.
Thanks, I'm looking to recreate the SAV plots as mentioned in the tutorials, but I'm having trouble finding the correct data. Could you direct me to the right functions (using the new API) to get the following plots?
- qscore histogram
- qscore heatmap
- intensity/channel/cycle
Thanks again. Matthias
The new API does not cover these functions. That is still a work in progress. We only have what is listed in the documentation, which includes the following tables: imaging, summary, indexing.
Also,
under `core.create_valid_to_load` it says:
List of validate metric_names can be gotten using `list_interop_files`
But the `list_interop_files` method is nowhere to be found.
M
It looks like this function was missed in the release; it currently lives in an experimental branch.
This line shows an example: https://github.com/Illumina/interop/blob/c98d2689941cd557e6dad43884ff12b55b3e327b/src/ext/python/core.py#L944
The function can be created like so:
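from interop import py_interop_run as interop_run
from interop import load_to_string_list
def list_interop_files():
    # Flag every metric type for loading (1 = load), then convert the flags to metric names
    valid_to_load = interop_run.uchar_vector(interop_run.MetricCount, 1)
    return load_to_string_list(valid_to_load)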
Note that this does not list the files on disk, just potential files that may be available.
So I've been digging some more, but I keep getting stuck. I've been trying the different methods to extract metrics and run info, but I seem to be unable to get any "sensible" output (e.g. a numpy array); instead, everything keeps returning a SWIG type object. This is probably mostly my fault, but I was wondering if you could direct me to the function I need to get actual data. So far I've got:
import interop
run_dir = "../data/NextSeq"
run_metrics = interop.read(run=run_dir, valid_to_load=interop.load_to_string_list(interop.load_imaging_metrics()))
q_metrics = run_metrics.q_metric_set()
extraction_metrics = run_metrics.extraction_metric_set()
When following Tutorial3, we can see how to extract data for one tile in one cycle, but to generate a plot, I'd need a dataframe with all data. The same goes for the q_metrics. Which method do I need to extract the metrics I need?
Thanks M
The new API does not cover this information yet; it is focused on the tables. I think most of the metrics you want can be obtained from the imaging table, which lists all metrics for all tiles and cycles. You can convert this numpy array to a pandas DataFrame pretty easily.
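A rough sketch of that conversion follows. It assumes the imaging table arrives as a numpy structured array (as the reply describes), for example from `interop.imaging(run_folder)`, and that it has `Lane` and `Cycle` columns; check `df.columns` on your own run. `imaging_to_frame` and `mean_per_cycle` are hypothetical helper names, not part of InterOp.

```python
import numpy as np
import pandas as pd


def imaging_to_frame(ar: np.ndarray) -> pd.DataFrame:
    """Turn the imaging-table structured array into a DataFrame.

    Each field of the structured dtype becomes one column, in dtype order.
    """
    return pd.DataFrame(ar)


def mean_per_cycle(df: pd.DataFrame, metric: str) -> pd.DataFrame:
    """Average one metric over all tiles, giving one row per (Lane, Cycle).

    The "Lane" and "Cycle" column names are assumptions about the table.
    """
    return df.groupby(["Lane", "Cycle"], as_index=False)[metric].mean()


# Usage (requires the interop package and a real run folder):
#   import interop
#   df = imaging_to_frame(interop.imaging("../data/NextSeq"))
#   print(df.columns.tolist())
```

Averaging over tiles is one choice for a per-cycle line plot; for per-tile plots, use the DataFrame directly.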
Hi,
Thanks for the info! Is there any kind of timeline for the extra API functions?
As for the imaging table, I already use it to plot % Occupied vs % PF, which works great, but the table doesn't seem to include intensities or Q-scores (as far as I can see). Are those metrics under other keys?
Thanks again M
No timeline at the moment. Internally, we found the most value in exposing the tables.
Are you looking for % >= Q30 and P90?
Well I'll be... That IS what I'm looking for!
Thanks a lot! M
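Both `% >= Q30` and P90 come from the imaging table, so one way to pull them out for plotting is to filter the DataFrame's columns by name. This is a sketch: the exact headers (in particular how the per-channel P90 columns are named) are an assumption and may differ between instruments, which is why the helper matches on substrings. `quality_columns` is a hypothetical name.

```python
import pandas as pd


def quality_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the identifying columns plus any %>=Q30 and P90 columns.

    P90 is reported per channel, so columns are matched by substring
    rather than by exact name; the names used here are assumptions.
    """
    keys = [c for c in ("Lane", "Tile", "Cycle") if c in df.columns]
    metrics = [c for c in df.columns if "Q30" in c or "P90" in c]
    return df[keys + metrics]
```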
Actually no,
To plot the Q-score heatmap, I'd need the Q-score values per cycle, not just % >= Q30; the same goes for the histogram.
Is there a way to extract that data, or just the plot data that's used in the plotting executables?
Thanks M
Alternatively, could you add an example of how to use the `py_interop_plot.plot_qscore_heatmap` wrapper?
Thanks M
Ah, I see. Yes, for the 3-bin Q-table you could get everything from the imaging table. You would have to derive the Q10 bin from what is left over after Q20 and Q30.
But for more bins, this is no longer an option from the imaging table.
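For the 3-bin case the arithmetic works from the two cumulative percentages: the lowest bin is 100 minus % >= Q20, the middle bin is % >= Q20 minus % >= Q30, and the top bin is % >= Q30. A minimal sketch, assuming the imaging table provides both % >= Q20 and % >= Q30 for each tile and cycle (`three_bin_percentages` is a hypothetical name):

```python
def three_bin_percentages(pct_ge_q20: float, pct_ge_q30: float) -> tuple:
    """Split bases into three Q-score bins from two cumulative percentages.

    Assumes every base falls into exactly one of: below Q20 (the lowest,
    "Q10" bin), Q20 up to Q30, and Q30 or higher.
    """
    if not 0.0 <= pct_ge_q30 <= pct_ge_q20 <= 100.0:
        raise ValueError("expected 0 <= %>=Q30 <= %>=Q20 <= 100")
    low_bin = 100.0 - pct_ge_q20        # what is left over below Q20
    mid_bin = pct_ge_q20 - pct_ge_q30   # at least Q20 but below Q30
    high_bin = pct_ge_q30               # Q30 and above
    return low_bin, mid_bin, high_bin
```

For example, a tile with 95% >= Q20 and 90% >= Q30 splits into 5% / 5% / 90%.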
This is how it could be done with the old interface.
import numpy
import interop.core
from interop import py_interop_plot

run = interop.core.read('run/folder')
options = py_interop_plot.filter_options(run.run_info().flowcell().naming_method())
# set filter options here
rows = py_interop_plot.count_rows_for_heatmap(run)
cols = py_interop_plot.count_columns_for_heatmap(run)
dataBuffer = numpy.zeros((rows, cols), dtype=numpy.float32)
data = py_interop_plot.heatmap_data()
try:
    # ravel() returns a view of the contiguous buffer, so the heatmap values land in dataBuffer
    py_interop_plot.plot_qscore_heatmap(run, options, data, dataBuffer.ravel())
except py_interop_plot.invalid_filter_option:
    pass
Thanks!
How do I access the heatmap data? I'm not entirely clear on that.
Thanks again. M
You could plot it with matplotlib like so
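import matplotlib.pyplot as plt
fig, axes = plt.subplots()
axes.cla()
axes.imshow(dataBuffer.T, cmap="cool")  # dataBuffer is the array filled by plot_qscore_heatmap
axes.set_aspect('auto')
plt.show()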
Thanks,
Sorry for being a bit high maintenance, but I'm trying to get the plots into MultiQC. Is there a way to get the values so I can parse them in the correct format? https://multiqc.info/docs/#heatmaps
M
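A minimal sketch of that conversion, under two assumptions: the buffer filled by `plot_qscore_heatmap` has one row per cycle and one column per Q-score (the transposed `imshow` call earlier in the thread suggests this orientation), and MultiQC's heatmap takes a 2D list plus x and y category labels, as in the heatmap section linked above. `heatmap_for_multiqc` and the label formats are hypothetical.

```python
import numpy as np


def heatmap_for_multiqc(buffer: np.ndarray):
    """Convert a (cycles x Q-scores) heatmap buffer into MultiQC-style inputs.

    Returns (data, xcats, ycats) where data[i][j] is the value for
    Q-score ycats[i] at cycle xcats[j].
    """
    grid = np.asarray(buffer, dtype=float).T  # transpose so rows are Q-scores
    n_q, n_cycles = grid.shape
    xcats = [f"Cycle {c + 1}" for c in range(n_cycles)]
    # Assumes column index q corresponds to Q-score q; adjust if offset.
    ycats = [f"Q{q}" for q in range(n_q)]
    return grid.tolist(), xcats, ycats
```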
Fixed (albeit in a hacky way) https://github.com/CenterForMedicalGeneticsGhent/MultiQC_SAV/blob/main/multiqc_sav/modules/sav.py#L526
Thanks for everything! M
MultiQC_SAV fails on a NovaSeq X Plus run:
Please copy this log and report it at https://github.com/MultiQC/MultiQC/issues
Please attach a file that triggers the error. The last file found was:
20240412_LH00557_0015_A227M7HLT4/RunParameters.xml
Attachment: RunParameters.xml.zip
Traceback (most recent call last):
  File "/MG/SHARED/APPS/ANACONDA_DIR/anaconda/envs/multiqc_sav/lib/python3.11/site-packages/multiqc/multiqc.py", line 751, in run
    output = mod()
  File "/MG/SHARED/APPS/ANACONDA_DIR/anaconda/envs/multiqc_sav/lib/python3.11/site-packages/multiqc_sav/modules/sav.py", line 273, in __init__
    self.imaging_qc()
  File "/MG/SHARED/APPS/ANACONDA_DIR/anaconda/envs/multiqc_sav/lib/python3.11/site-packages/multiqc_sav/modules/sav.py", line 635, in imaging_qc
    plot_data = self.parse_imaging_table(imaging)
  File "/MG/SHARED/APPS/ANACONDA_DIR/anaconda/envs/multiqc_sav/lib/python3.11/site-packages/multiqc_sav/modules/sav.py", line 706, in parse_imaging_table
    occ_pf[f"Lane {lane}"].append({"x": occ, "y": pf, "color": colors[lane]})
IndexError: list index out of range
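The traceback ends with an `IndexError` on the line that evaluates `colors[lane]`. A dict lookup on `occ_pf` would raise `KeyError` rather than `IndexError`, so this suggests the color list has fewer entries than the lane numbers this run produces. That diagnosis is a guess from the traceback alone. One defensive pattern is to wrap the index around the palette; `lane_color` is a hypothetical helper, not part of MultiQC_SAV.

```python
def lane_color(colors: list, lane: int) -> str:
    """Pick a color for a lane without running off the end of the palette.

    Wrapping with modulo means a palette shorter than the number of lanes
    (or 1-based lane numbers) reuses colors instead of raising IndexError.
    """
    if not colors:
        raise ValueError("color palette is empty")
    return colors[lane % len(colors)]
```

Inside `parse_imaging_table`, the equivalent change would be `colors[lane % len(colors)]`; extending the palette would avoid repeated colors.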
Hi,
Would it be possible to update the example IPython notebooks with examples using the new wrapper?
Thanks Matthias