Closed yelekley closed 4 years ago
Just checking, what sequencer is this for?
Also, InterOp supports C++, Python and C#. Which language do you want to use to get the metric?
NovaSeq, MiSeq and NextSeq. I prefer Python. Thanks
Q30 is defined on the basecalls, not the clusters. A metric like PF would be on the clusters.
So, I assume you want the total number of base calls (e.g. cluster_count*cycle_count)
Here is an example that will allow you to load the summary metrics (follow it up until In [8]):
https://github.com/Illumina/interop/blob/044fbc86ec32b3c079af37ad57f5ceae273d1c5c/docs/src/Tutorial_01_Intro.ipynb
for read_index in range(summary.size()):
    for lane_index in range(summary.lane_count()):
        fraction_gt_q30 = summary.at(read_index).at(lane_index).percent_gt_q30().mean()/100
        yield_g = summary.at(read_index).at(lane_index).yield_g().mean()
        bases_gt_q30 = fraction_gt_q30*yield_g*1e9
        lane_number = summary.at(read_index).at(lane_index).lane()
bases_gt_q30 gives you the total number of called bases that are >= Q30.
lane_number gives you the corresponding lane.
Thank you so much for your help. Now I just need to get the ChipResultsSummary yield that is listed in Bustard. I found an example of how to parse the Tile Metric binary file and got the total clusterCountPF and clusterCountRaw for the run. I just need to figure out how to get the yield metric so that I can avoid parsing the BustardSummary.xml file. Is it accessible via the yield_g function? Thanks
Yes
It returns an error: no attribute 'mean'.
fraction_gt_q30 = summary_lane.at(read_index).at(lane_index).percent_gt_q30().mean()/100
AttributeError: 'float' object has no attribute 'mean'
thanks
Another question, if you don't mind. When I calculate the total yield and the projected yield for the run, the projected yield is very close to what's in BustardSummary under the chip results summary, while the sum of yield_g*1e9 is always less than what's in BustardSummary. So is it the projected yield that is in Bustard, and not the actual yield? Do you know? Thanks. Here is the code:
tyield = 0
pyield = 0
for read_index in range(summary.size()):
    for lane_index in range(summary.lane_count()):
        yield_g = summary.at(read_index).at(lane_index).yield_g()
        yieldtotal = yield_g*1e9
        tyield += yieldtotal
        yield_p = summary.at(read_index).at(lane_index).projected_yield_g()
        projyield = yield_p*1e9
        pyield += projyield
print(tyield)
print(pyield)
@yelekley %Q30 is stored as the aggregate across all tiles in the lane directly (in other words, we already calculate a weighted average across tiles when you call .percent_gt_q30()), so it should work to just remove the .mean() from your expression above.
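For reference, here is a minimal sketch of the per-lane calculation with the .mean() calls removed. It assumes that percent_gt_q30(), yield_g() and lane() each return a plain number, as in the snippets in this thread, and it sums the result over reads so that each lane gets one total. The Fake* classes are hypothetical stand-ins so the example runs without InterOp files; with real data you would pass the run summary loaded as in the linked tutorial. Treat the result as an estimate, since it multiplies an aggregate percentage by a yield reported in gigabases.

```python
from collections import defaultdict


def bases_gt_q30_per_lane(summary):
    """Estimate the number of bases >= Q30 per lane, summed over all reads.

    `summary` is assumed to expose size(), lane_count() and
    at(read_index).at(lane_index), as used elsewhere in this thread.
    """
    totals = defaultdict(float)
    for read_index in range(summary.size()):
        for lane_index in range(summary.lane_count()):
            lane_summary = summary.at(read_index).at(lane_index)
            # percent_gt_q30() is already aggregated across tiles: no .mean()
            fraction_gt_q30 = lane_summary.percent_gt_q30() / 100.0
            # yield_g() is in gigabases, so scale to bases
            totals[lane_summary.lane()] += fraction_gt_q30 * lane_summary.yield_g() * 1e9
    return dict(totals)


# Hypothetical stand-ins for the InterOp summary objects (illustration only).
class FakeLane:
    def __init__(self, lane, pct_q30, yield_g):
        self._lane, self._pct_q30, self._yield_g = lane, pct_q30, yield_g

    def lane(self):
        return self._lane

    def percent_gt_q30(self):
        return self._pct_q30

    def yield_g(self):
        return self._yield_g


class FakeRead:
    def __init__(self, lanes):
        self.lanes = lanes

    def at(self, lane_index):
        return self.lanes[lane_index]


class FakeSummary:
    def __init__(self, reads):
        self.reads = reads

    def size(self):
        return len(self.reads)

    def lane_count(self):
        return len(self.reads[0].lanes)

    def at(self, read_index):
        return self.reads[read_index]


demo_summary = FakeSummary([
    FakeRead([FakeLane(1, 90.0, 10.0), FakeLane(2, 80.0, 10.0)]),  # read 1
    FakeRead([FakeLane(1, 80.0, 5.0), FakeLane(2, 50.0, 4.0)]),    # read 2
])
q30_bases = bases_gt_q30_per_lane(demo_summary)
for lane_number in sorted(q30_bases):
    print(lane_number, q30_bases[lane_number])
```

Keying the totals by lane() rather than by lane_index keeps the output tied to the lane number that the summary itself reports.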
As far as yield vs. projected yield, yield represents the estimated amount of non-N bases that have been processed so far, while projected yield represents the expected amount of non-N bases by the end of the run. If a run is successfully completed, both should converge to the same value. If the run is still in progress, yield will be less than projected yield.
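To check which of the two values a report is showing, it can help to compute both totals and their ratio. The sketch below assumes yield_g() and projected_yield_g() return plain numbers in gigabases, as in the code earlier in this thread. The helper name yield_totals and the Stub* classes are hypothetical and exist only so the example runs without a run folder. A ratio close to 1.0 would suggest a completed run, while a smaller ratio would suggest the metrics were read while the run was still in progress.

```python
def yield_totals(summary):
    """Return (total_yield, projected_yield, fraction_complete) in bases.

    fraction_complete is None when the projected yield is zero.
    """
    total = 0.0
    projected = 0.0
    for read_index in range(summary.size()):
        for lane_index in range(summary.lane_count()):
            lane_summary = summary.at(read_index).at(lane_index)
            total += lane_summary.yield_g() * 1e9                # bases so far
            projected += lane_summary.projected_yield_g() * 1e9  # expected at end of run
    fraction_complete = total / projected if projected > 0 else None
    return total, projected, fraction_complete


# Hypothetical stand-ins for the InterOp summary objects (illustration only).
class StubLane:
    def __init__(self, yield_g, projected_yield_g):
        self._yield_g = yield_g
        self._projected_yield_g = projected_yield_g

    def yield_g(self):
        return self._yield_g

    def projected_yield_g(self):
        return self._projected_yield_g


class StubRead:
    def __init__(self, lanes):
        self.lanes = lanes

    def at(self, lane_index):
        return self.lanes[lane_index]


class StubSummary:
    def __init__(self, reads):
        self.reads = reads

    def size(self):
        return len(self.reads)

    def lane_count(self):
        return len(self.reads[0].lanes)

    def at(self, read_index):
        return self.reads[read_index]


# One read, two lanes, sampled halfway through a run.
in_progress = StubSummary([StubRead([StubLane(2.0, 4.0), StubLane(1.0, 2.0)])])
total, projected, fraction_complete = yield_totals(in_progress)
print(total, projected, fraction_complete)
```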
Hello, please suggest how to get the >=Q30 cluster count per lane. Thank you