Illumina / interop

C++ Library to parse Illumina InterOp files
http://illumina.github.io/interop/index.html
GNU General Public License v3.0

Q: Python: summary(run_metrics, 'Lane'/'Read') -> 'IsIndex' returns 78/89 #264

Closed: sklages closed this issue 3 years ago

sklages commented 3 years ago

Just a question: summary(run_metrics, 'Lane'/'Read') returns 78 (read) and 89 (index read) for IsIndex.

Is that correct? And if so, what do these numbers mean? I was expecting a bool or 0/1 or so .. (the struct format here is 'u1' but should read '<u1'?)

array([(1, 78, 0.06379907, 1486.5305, 1.7360141, 94.607285, 93.48533, 119.25953 , 119.25953 ),
       (2, 89,        nan, 1595.1562, 0.       , 90.52646 , 93.48533,  29.814854,  29.814854),
       (3, 78, 0.27108514, 1190.537 , 1.7067204, 93.23006 , 93.48533, 387.58676 , 387.58676 )],
      dtype=[('ReadNumber', '<u2'), ('IsIndex', 'u1'), ('Error Rate', '<f4'), <...>])

ezralanglois commented 3 years ago

We store those as char internally. I think we would need to convert it to a str for it to print properly in numpy.

78 = 'N' (No, not an index read)
89 = 'Y' (Yes, an index read)

ezralanglois commented 3 years ago

With pandas this is easy to fix

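import pandas as pd

# rm: run metrics object loaded beforehand
ar = summary(rm, 'Read')
df = pd.DataFrame(ar)
# IsIndex holds character codes (78/89); chr turns them into 'N'/'Y'
df['IsIndex'] = df['IsIndex'].map(chr)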

You can also compare in numpy using

ord('Y') and ord('N')
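
A minimal sketch of that numpy comparison, assuming a small stand-in array in place of the real summary(rm, 'Read') result. The array `ar` below is hypothetical and only reproduces the ReadNumber and IsIndex fields from the output posted above; `is_index` and `labels` are illustrative names.

```python
import numpy as np

# Hypothetical stand-in for the array returned by summary(rm, 'Read');
# only the first two fields of the posted output are reproduced here.
ar = np.array(
    [(1, 78), (2, 89), (3, 78)],
    dtype=[('ReadNumber', '<u2'), ('IsIndex', 'u1')],
)

# Boolean mask: True where IsIndex holds the code for 'Y' (89).
is_index = ar['IsIndex'] == ord('Y')

# Plain-Python alternative: decode each code into its character.
labels = [chr(code) for code in ar['IsIndex']]

print(is_index)
print(labels)
```

The mask can then be used to select rows, for example ar[~is_index] for the non-index reads.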

sklages commented 3 years ago

Ah, okay. Got it. Thanks.