Illumina / interop

C++ Library to parse Illumina InterOp files
http://illumina.github.io/interop/index.html
GNU General Public License v3.0

How to calculate cluster density (K/mm2) from the interop module #336

Closed kpatankar closed 7 months ago

kpatankar commented 7 months ago

I have the following code to generate %>Q30 and Cluster PF %:

    from interop import py_interop_run, py_interop_run_metrics, py_interop_summary

    # Load only the InterOp files needed for the summary metrics.
    # inputDirectory is the path to the run folder.
    run_metrics = py_interop_run_metrics.run_metrics()
    valid_to_load = py_interop_run.uchar_vector(py_interop_run.MetricCount, 0)
    py_interop_run_metrics.list_summary_metrics_to_load(valid_to_load)
    run_folder = run_metrics.read(inputDirectory, valid_to_load)

    # Summarize the run, then read the run-level totals.
    summary = py_interop_summary.run_summary()
    py_interop_summary.summarize_run_metrics(run_metrics, summary)
    total = summary.total_summary()
    q_30_score = round(float(total.percent_gt_q30()), 2)
    perc_clusters_pf = round(float(total.cluster_count_pf() / total.cluster_count() * 100), 2)

This code gives me the correct values, 88.92 for %>Q30 and 93.56 for Cluster PF %, for the MiSeq demo data: https://github.com/Illumina/interop/releases/download/v1.0.6/MiSeqDemo.zip

However, I cannot generate the Cluster Density (K/mm2) value, which should be 1362. How can I generate this value? I see that the tile_metric class has a cluster_density_k attribute: https://illumina.github.io/interop/group__tile__metric.html

Can you please let me know how to use the interop module to generate this metric? Thanks!

ezralanglois commented 7 months ago

Density is only reported at the read/lane level.

It does not change between reads, so it is sufficient to look only at the first read:

    read_idx = 0
    lane_idx = 0
    summary.at(read_idx).at(lane_idx).density()
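
To extend the snippet above to every lane, a minimal sketch could look like the following. It assumes `summary` is an already populated `run_summary`. The name `lane_densities` is a hypothetical helper, and `lane_count()` and `lane()` are accessors that I believe the Python bindings expose but that do not appear in this thread, so treat them as assumptions and check them against your installed version.

```python
def lane_densities(summary, read_idx=0):
    """Return {lane_number: mean raw cluster density} for one read.

    `summary` is expected to behave like a populated run_summary:
    summary.at(read).at(lane).density().mean() is the call chain used
    in this thread; lane_count() and lane() are assumed accessors.
    """
    read = summary.at(read_idx)
    densities = {}
    for lane_idx in range(summary.lane_count()):
        lane = read.at(lane_idx)
        # density() returns a statistic object; mean() is the per-lane average
        densities[lane.lane()] = lane.density().mean()
    return densities
```

With a real run this would be called as `lane_densities(summary)` after `summarize_run_metrics` has filled `summary`. The values are raw densities, not yet divided by 1000.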

kpatankar commented 7 months ago

Yes, I am using the following commands to calculate the cluster density (K/mm2):

    density = round(float(summary.at(1).at(0).density().mean()), 2)
    cluster_density = round(density / 1000, 2)

From the definition: Density is measured as 1000 (K) clusters per square millimeter (mm²). Raw cluster density indicates how many clusters are on the flow cell, regardless of whether they passed filter.
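
As a small worked example of that unit conversion, the sketch below divides a raw density (clusters per mm²) by 1000 to get K/mm². The helper name `to_k_per_mm2` is hypothetical, and the input value is made up to match the 1362 K/mm2 figure quoted earlier in this thread; it is not read from the demo data.

```python
def to_k_per_mm2(raw_density, ndigits=2):
    """Convert a raw density in clusters/mm^2 to thousands of clusters/mm^2."""
    return round(raw_density / 1000.0, ndigits)


# Hypothetical raw value chosen so the result matches the expected 1362 K/mm2
print(to_k_per_mm2(1_362_000))  # 1362.0
```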

What is the difference between summary.at(0).at(0) and summary.at(1).at(0)? Both give the same density value. Would the same command work for NextSeq and NovaSeq instruments as well?

Thank you for the solution.

ezralanglois commented 7 months ago

Density, density PF, cluster count, and cluster count PF do not differ between reads.
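
If you want to confirm this on your own run, a quick check along these lines should work. It is only a sketch: `density_is_read_independent` is a hypothetical helper name, and `summary.size()` is assumed to return the number of reads (it is not used elsewhere in this thread).

```python
import math


def density_is_read_independent(summary, lane_idx=0, rel_tol=1e-9):
    """Return True if the mean density of one lane is the same for every read.

    Assumes summary.size() gives the number of reads and that
    summary.at(read).at(lane).density().mean() works as shown in this thread.
    """
    values = [
        summary.at(read_idx).at(lane_idx).density().mean()
        for read_idx in range(summary.size())
    ]
    # Compare every read against the first one, allowing for float rounding
    return all(math.isclose(v, values[0], rel_tol=rel_tol) for v in values)
```

The same pattern could be applied to the other per-lane values by swapping out the `density()` call.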

kpatankar commented 7 months ago

Thank you !