Illumina / interop

C++ Library to parse Illumina InterOp files
http://illumina.github.io/interop/index.html
GNU General Public License v3.0

Q: Plotting %Occ vs %PF #256

sklages closed this issue 3 years ago

sklages commented 3 years ago

I wanted to parse the imaging_table output in Python and plot % Occupied vs % PF per lane, to get an image similar to the one in SAV (for NovaSeq data).
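A minimal sketch of that parsing step in plain Python follows. It assumes the imaging_table output was saved as CSV text whose metadata lines start with '#' and whose single header row contains the Lane, % Occupied and % Pass Filter columns named in this thread. Check the header of your own file, since the exact column text may differ between versions. The function name read_occupancy is hypothetical:

```python
import csv
import math

# Header names as used in this thread; the exact header text in a given
# imaging_table version is an assumption, so check your own file.
LANE = "Lane"
OCCUPIED = "% Occupied"
PASS_FILTER = "% Pass Filter"


def read_occupancy(text):
    """Return (lane, % Occupied, % Pass Filter) tuples from imaging_table CSV text.

    Assumes metadata lines start with '#' and precede a single header row.
    Rows with a missing or non-numeric value (including NaN) are skipped.
    """
    data_lines = [
        line for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    points = []
    for record in csv.DictReader(data_lines, skipinitialspace=True):
        try:
            lane = int(record[LANE])
            occupied = float(record[OCCUPIED])
            pass_filter = float(record[PASS_FILTER])
        except (KeyError, TypeError, ValueError):
            # Column absent, cell empty, or text that is not a number
            continue
        if math.isnan(occupied) or math.isnan(pass_filter):
            continue
        points.append((lane, occupied, pass_filter))
    return points
```

For a saved file (the name imaging_table.csv is only an example), read_occupancy(open("imaging_table.csv").read()) returns one tuple per data row. Dropping rows with missing values is a choice made in this sketch.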

Well, the plots do not look the same, so I assume you are not simply taking all values of both columns and plotting them as a scatter plot? It looks like SAV shows fewer data points...

Alternatively, how can I accomplish this using the Python InterOp bindings? The docs are hard to read for me as a Python beginner.

Thank you.

ezralanglois commented 3 years ago

Could you provide a screenshot of the plot you are looking for? Is it the per-lane box plot at the bottom center of the Analysis tab?

sklages commented 3 years ago

Checking for overclustered flow cells on NovaSeq: [SAV screenshot]

When I simply plot the imaging_table output (x='% Occupied', y='% Pass Filter'), I get a similar but not identical image (both series sorted): [screenshot]

This has not yet been separated by lane.

ezralanglois commented 3 years ago

Ah, I see. Thanks for the screenshot; I definitely misunderstood what you were asking.

The InterOp library provides most of the plots in SAV, except the ones from the imaging table. Those are built with a legacy plotting library outside of InterOp.

We have not developed any Python code to do what you are trying to do, so getting the plots to match exactly is probably not worth the effort.

The biggest difference I see between the two plots is that the SAV code filters the data by the Lane column and plots one series per lane. Another change would be to reduce the size of the markers you are using.
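Both suggestions can be sketched with matplotlib, which is swapped in here and is not what SAV uses. The sketch groups the (lane, % Occupied, % Pass Filter) points by lane and then draws one scatter series per lane with a small marker area. The helper names split_by_lane and plot_per_lane are hypothetical, and the input is assumed to be a list of such tuples:

```python
from collections import defaultdict


def split_by_lane(points):
    """Group (lane, x, y) tuples into {lane: ([x, ...], [y, ...])}, ordered by lane."""
    series = defaultdict(lambda: ([], []))
    for lane, x, y in points:
        xs, ys = series[lane]
        xs.append(x)
        ys.append(y)
    return dict(sorted(series.items()))


def plot_per_lane(points, path, marker_area=4):
    """Save a scatter plot with one series (and legend entry) per lane.

    marker_area is matplotlib's `s` argument (marker area in points squared);
    a small value keeps dense tile clouds readable.
    """
    # Imported here so split_by_lane can be used without matplotlib installed
    from matplotlib.figure import Figure

    fig = Figure()
    ax = fig.subplots()
    for lane, (xs, ys) in split_by_lane(points).items():
        ax.scatter(xs, ys, s=marker_area, label=f"Lane {lane}")
    ax.set_xlabel("% Occupied")
    ax.set_ylabel("% Pass Filter")
    ax.legend()
    fig.savefig(path)
```

For example, plot_per_lane(points, "occupied_vs_pf.png", marker_area=2) writes one PNG with a legend entry per lane. Building a Figure directly rather than going through pyplot avoids depending on an interactive display backend.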

I personally like Plotly Express. If you work from a pandas DataFrame, you can do it in one line: plotly.express.scatter(df, '% Occupied', '% Pass Filter', color='Lane')
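One detail is worth checking with that one-liner. If the Lane column is numeric, Plotly Express colors the points along a continuous color scale rather than giving each lane its own color. The minimal sketch below stores Lane as text first, which makes Plotly treat it as a category. It assumes the data are (lane, % Occupied, % Pass Filter) tuples. The helper names are hypothetical. It relies on plotly.express.scatter also accepting a plain dict of columns:

```python
def to_plotly_columns(points):
    """Turn (lane, % Occupied, % Pass Filter) tuples into a dict of columns.

    Lane is stored as a string so Plotly Express treats it as a category
    (one discrete color per lane) instead of a continuous color scale.
    """
    return {
        "Lane": [str(lane) for lane, _, _ in points],
        "% Occupied": [occupied for _, occupied, _ in points],
        "% Pass Filter": [pass_filter for _, _, pass_filter in points],
    }


def occupancy_figure(points):
    """Build the scatter plot; requires plotly to be installed."""
    # Imported lazily so the column helper works without plotly
    import plotly.express as px

    columns = to_plotly_columns(points)
    return px.scatter(columns, x="% Occupied", y="% Pass Filter", color="Lane")
```

With a pandas DataFrame, the equivalent step is df['Lane'] = df['Lane'].astype(str) before calling plotly.express.scatter.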

sklages commented 3 years ago

Thanks for your advice!

Yes, I still need to separate the data by lane. I guess I also need to reduce the number of data points.
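One plausible way to cut the number of points is to keep a single row per (lane, tile). This assumes that % Occupied and % Pass Filter are per-tile values repeated on every cycle row of the imaging table, and that the table has a Tile column. It is only a guess at why SAV shows fewer points and does not describe what SAV actually does. The function name is hypothetical:

```python
def one_point_per_tile(rows):
    """Keep the first (lane, tile, % Occupied, % Pass Filter) row for each (lane, tile).

    Assumes the two metrics are per-tile values repeated on every cycle row,
    so later rows for the same tile are treated as duplicates and dropped.
    Returns (lane, % Occupied, % Pass Filter) tuples in input order.
    """
    seen = set()
    points = []
    for lane, tile, occupied, pass_filter in rows:
        key = (lane, tile)
        if key in seen:
            continue
        seen.add(key)
        points.append((lane, occupied, pass_filter))
    return points
```

In pandas the same idea is df.drop_duplicates(subset=['Lane', 'Tile']).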

For custom plots outside of standard SAV, it is probably best to stick with the imaging_table output.

Plotly Express looks good; I will have a look. Thanks for this :-)