Illumina / interop

C++ Library to parse Illumina InterOp files
http://illumina.github.io/interop/index.html
GNU General Public License v3.0

pf std from interop #300

Closed: katiemharding closed this issue 2 years ago

katiemharding commented 2 years ago

I know that I can get the mean percentage of clusters passing filter (%PF) with:

percent_pf = summary.at(read_index).at(lane_index).at(surface_index).percent_pf().mean()

but trying to get the standard deviation of the same metric doesn't work:

percent_pf_std = summary.at(read_index).at(lane_index).at(surface_index).percent_pf().std()

BaseSpace shows something like: Lane 1 Read 1 Cluster PF(%) 81.26 -/+ 1.64. I am looking for how to get the 1.64 part.

ezralanglois commented 2 years ago

First, BaseSpace is showing you values per lane, per read, whereas you are accessing the data per lane, per read, per surface.

Also, what does "doesn't work" mean? Are you getting NaN?

katiemharding commented 2 years ago

percent_pf = summary.at(read_index).at(lane_index).at(surface_index).percent_pf().mean() returns 74.40805053710938

percent_pf_std = summary.at(read_index).at(lane_index).at(surface_index).percent_pf().std() returns:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [19], in <cell line: 1>()
----> 1 test.std()

File ~/.virtualenvs/reagent-qc-xvEGZSY3-py3.9/lib/python3.9/site-packages/interop/py_interop_summary.py:409, in metric_stat.<lambda>(self, name)
    407 __setattr__ = lambda self, name, value: _swig_setattr(self, metric_stat, name, value)
    408 __swig_getmethods__ = {}
--> 409 __getattr__ = lambda self, name: _swig_getattr(self, metric_stat, name)
    410 __repr__ = _swig_repr
    412 def __init__(self, *args):

File ~/.virtualenvs/reagent-qc-xvEGZSY3-py3.9/lib/python3.9/site-packages/interop/py_interop_summary.py:80, in _swig_getattr(self, class_type, name)
     78 if method:
     79     return method(self)
---> 80 raise AttributeError("'%s' object has no attribute '%s'" % (class_type.__name__, name))

AttributeError: 'metric_stat' object has no attribute 'std'

I would like it to return a value.
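When a SWIG-wrapped InterOp object raises an AttributeError like this, one generic way to find the right method name is to list the object's public attributes with Python's built-in dir(). A minimal sketch, using a hypothetical helper named public_methods; for a metric_stat it should reveal names such as mean and stddev, though the exact list depends on the InterOp version installed.

```python
def public_methods(obj):
    """Return the sorted names of an object's public, callable attributes.

    Works on any Python object, including SWIG proxy classes such as the
    metric_stat objects returned by percent_pf().
    """
    return sorted(
        name
        for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name, None))
    )
```

For example, public_methods(summary.at(read_index).at(lane_index).at(surface_index).percent_pf()) prints the method names available on that statistic, so a misspelled name like std can be spotted before calling it.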

katiemharding commented 2 years ago

For read 1, percent_pf = summary.at(read_index).at(lane_index).at(surface_index).percent_pf().mean() returns 82.2605, 80.2609, 82.2605, 80.2609, 82.2605, 80.2609, 82.2605, 80.2609, with a mean of 81.2607, which matches BaseSpace. I want the standard deviation, which BaseSpace reports as 1.65, and I can't calculate it from the values that percent_pf().mean() returns.
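The values listed above can be checked quickly with Python's standard statistics module. This minimal sketch shows that the spread of these eight means is about 1.0 (population) or about 1.07 (sample, n - 1), not 1.65. That is consistent with the BaseSpace figure being computed over finer-grained values (presumably per tile) rather than over these means, although the thread does not state this explicitly.

```python
import statistics

# The eight %PF means for read 1 quoted in the comment above.
surface_means = [82.2605, 80.2609] * 4

mean_of_means = statistics.mean(surface_means)  # matches the 81.2607 quoted above
pop_sd = statistics.pstdev(surface_means)       # population standard deviation
sample_sd = statistics.stdev(surface_means)     # sample (n - 1) standard deviation

print(f"mean of means:   {mean_of_means:.4f}")
print(f"pstdev of means: {pop_sd:.4f}")
print(f"stdev of means:  {sample_sd:.4f}")
```

Neither standard deviation is close to 1.65, so the BaseSpace value cannot be recovered from these means alone.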

ezralanglois commented 2 years ago

This error is saying that std is not the name of the method you need:

AttributeError: 'metric_stat' object has no attribute 'std'

You should replace it with stddev():

https://github.com/Illumina/interop/blob/c98d2689941cd557e6dad43884ff12b55b3e327b/interop/model/summary/metric_stat.h#L93
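A minimal sketch of the corrected call, wrapped in a hypothetical helper named surface_pf_stats that collects the surface-level mean and standard deviation for every surface of one lane. It relies only on the at(), percent_pf(), mean() and stddev() calls shown in this thread; the surface_count argument is an assumption supplied by the caller, not an InterOp API.

```python
def surface_pf_stats(summary, read_index, lane_index, surface_count):
    """Return a list of (mean, stddev) %PF pairs, one per surface of a lane.

    `summary` is expected to be an InterOp run summary. `surface_count` is a
    hypothetical parameter: the caller must know how many surfaces the lane has.
    """
    stats = []
    for surface_index in range(surface_count):
        pf = summary.at(read_index).at(lane_index).at(surface_index).percent_pf()
        # stddev(), not std(), is the metric_stat method name.
        stats.append((pf.mean(), pf.stddev()))
    return stats
```

On a flow cell with two surfaces, this returns two (mean, stddev) pairs per lane, which is what produces multiple standard deviation values at the surface level.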

katiemharding commented 2 years ago

Thank you!
This is great, and a lot closer to what is displayed on the BaseSpace page. BaseSpace shows 81.26 +/- 1.64. Using .percent_pf().mean() I get 81.62, but using .percent_pf().stddev() I get 1.29 (it actually returns two values, 1.169 and 1.419).

Can I use this to get the BaseSpace %PF mean and standard deviation?

ezralanglois commented 2 years ago

This is not what BaseSpace uses:

percent_pf = summary.at(read_index).at(lane_index).at(surface_index).percent_pf().mean()
percent_pf_std = summary.at(read_index).at(lane_index).at(surface_index).percent_pf().stddev()

This is what BaseSpace uses when it prints something like: Lane 1 Read 1 Cluster PF(%) 81.26 -/+ 1.64.

percent_pf = summary.at(read_index).at(lane_index).percent_pf().mean()
percent_pf_std = summary.at(read_index).at(lane_index).percent_pf().stddev()
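A minimal sketch that wraps the lane-level calls above in a hypothetical helper named format_lane_pf and formats the result like the BaseSpace line quoted in this thread. Printing lane and read numbers as index + 1 is an assumption of this sketch, not something the thread confirms.

```python
def format_lane_pf(summary, read_index, lane_index):
    """Format lane-level %PF mean and stddev like the BaseSpace summary line.

    Uses the lane-level statistic (no surface index). Lane and read numbers
    are printed as index + 1, which is an assumption of this sketch.
    """
    pf = summary.at(read_index).at(lane_index).percent_pf()
    return (
        f"Lane {lane_index + 1} Read {read_index + 1} "
        f"Cluster PF(%) {pf.mean():.2f} +/- {pf.stddev():.2f}"
    )
```

For example, format_lane_pf(summary, 0, 0) would produce a line of the form "Lane 1 Read 1 Cluster PF(%) <mean> +/- <stddev>" for the first read and lane.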

Depending on the flow cell type, these can give you the same or different answers, because the first reports at the surface level and the second at the lane level.
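A lane's standard deviation is not simply the average of its surfaces' standard deviations: averaging the two surface values reported earlier, (1.169 + 1.419) / 2 ≈ 1.29, does not give BaseSpace's 1.64. If the lane statistic is computed over all values from both surfaces, its spread also includes the difference between the surface means. A minimal sketch of how per-group statistics combine, assuming population standard deviations and known per-group counts (for example, tiles per surface); the thread does not say which convention InterOp uses, and the tile values below are made up for illustration.

```python
import math
import statistics


def pooled_stats(groups):
    """Combine per-group (count, mean, population stddev) tuples into the
    overall (mean, population stddev) of all values together.

    Overall variance = size-weighted average of the within-group variances
    plus the size-weighted squared distance of each group mean from the
    overall mean.
    """
    total = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total
    variance = sum(n * (sd * sd + (m - mean) ** 2) for n, m, sd in groups) / total
    return mean, math.sqrt(variance)


# Made-up per-tile %PF values for two surfaces of one lane (illustration only).
surface_a = [80.0, 82.0, 84.0]
surface_b = [76.0, 78.0, 80.0]

groups = [
    (len(s), statistics.mean(s), statistics.pstdev(s)) for s in (surface_a, surface_b)
]
lane_mean, lane_sd = pooled_stats(groups)
avg_surface_sd = statistics.mean(sd for _, _, sd in groups)

print(f"average of surface stddevs: {avg_surface_sd:.3f}")
print(f"lane stddev (pooled):       {lane_sd:.3f}")
print(f"lane stddev (raw values):   {statistics.pstdev(surface_a + surface_b):.3f}")
```

Here the lane-level value (about 2.58) is larger than the average of the surface values (about 1.63) because the two surface means differ; on a lane where the surfaces agree closely, the two views give similar numbers.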

katiemharding commented 2 years ago

Awesome! That works, and the data matches. Thank you.