AlexsLemonade / refinebio

Refine.bio harmonizes petabytes of publicly available biological data into ready-to-use datasets for cancer researchers and AI/ML scientists.
https://www.refine.bio/

Perform Principal Component Analysis on Agilent Two Color Dataset #211

Open Miserlou opened 6 years ago

Miserlou commented 6 years ago

Context

We have the data!

Now we need to know: is the data good? Specifically, can we clearly separate channel one from channel two? Even better, can we automatically classify an experiment as a reference design or a loop design?

Problem or idea

[Screenshots: 2018-04-19, 1:32:00 pm and 1:32:30 pm]

Solution or next step

Rich relearns how to use Pandas, scikit-learn and Jupyter.
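
A minimal sketch of what that first check could look like with Pandas and scikit-learn. It is not the notebook from this issue: the data is synthetic, and the column naming (`array0_ch1`, `array0_ch2`, ...) and the size of the channel effect are assumptions made up for illustration. It runs PCA with one row per (array, channel) and asks whether the two channels fall on opposite sides of PC1.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-in for two-color data (assumption): log intensities,
# one column per (array, channel), one row per gene.
rng = np.random.default_rng(0)
n_genes, n_arrays = 500, 6
base = rng.normal(8.0, 1.0, size=n_genes)            # shared per-gene signal
channel_shift = rng.normal(0.0, 1.0, size=n_genes)   # hypothetical channel effect

columns = {}
for i in range(n_arrays):
    columns[f"array{i}_ch1"] = base + rng.normal(0.0, 0.3, size=n_genes)
    columns[f"array{i}_ch2"] = base + channel_shift + rng.normal(0.0, 0.3, size=n_genes)
expr = pd.DataFrame(columns)  # genes x (array, channel)

# scikit-learn's PCA expects samples as rows, so transpose.
pca = PCA(n_components=2)
scores = pd.DataFrame(
    pca.fit_transform(expr.T.values),
    index=expr.columns,
    columns=["PC1", "PC2"],
)
scores["channel"] = [name.rsplit("_", 1)[1] for name in scores.index]

# The channels "separate" here if their PC1 score ranges do not overlap.
ch1 = scores.loc[scores["channel"] == "ch1", "PC1"]
ch2 = scores.loc[scores["channel"] == "ch2", "PC1"]
separated = bool(ch1.max() < ch2.min() or ch2.max() < ch1.min())

print(scores)
print("explained variance ratio:", pca.explained_variance_ratio_)
print("channels separate on PC1:", separated)
```

On real data the separating component may not be PC1, so plotting the first few components colored by channel is probably a better first look than a single yes/no check.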

jaclyn-taroni commented 6 years ago

Related: #87 (where the original questions were posted)

Miserlou commented 6 years ago

The GSEs processed in that file are:

GSE35576
GSE22900
GSE15109
GSE23669
GSE38242
GSE28748
GSE21367
GSE39477
GSE30181
GSE68081
GSE45403
GSE58295
GSE42401
GSE38241
GSE26129
GSE25346
GSE19324
GSE29917
GSE55668
GSE51081
[35576, 22900, 15109, 23669, 38242, 28748, 21367, 39477, 30181, 68081, 45403, 58295, 42401, 38241, 26129, 25346, 19324, 29917, 55668, 51081]
Miserlou commented 6 years ago

[Screenshot: 2018-04-19, 3:33:23 pm]

I did a thing

Miserlou commented 6 years ago

Attached images:

gse21367_cor_matrix, gse21367_cor_pca
gse26129_cor_matrix, gse26129_cor_pca
gse28748_cor_matrix
gse29917_cor_matrix, gse29917_cor_pca
gse30181_cor_matrix, gse30181_cor_pca
gse35576_cor_matrix, gse35576_cor_pca
gse38241_cor_matrix, gse38241_cor_pca
gse38242_cor_matrix, gse38242_cor_pca
gse39477_cor_matrix, gse39477_cor_pca
gse42401_cor_matrix, gse42401_cor_pca
gse51081_cor_matrix, gse51081_cor_pca
gse55668_cor_matrix
gse58295_cor_matrix, gse58295_cor_pca
gse68081_cor_matrix, gse68081_cor_pca
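
The image names suggest a per-series correlation matrix plus a PCA derived from it. How the originals were produced isn't shown in this thread, so the following is only a sketch under that assumption: synthetic data, made-up column names (`s0_ch1`, `s0_ch2`, ...), a Pearson correlation matrix between samples, and PCA done as an eigendecomposition of that matrix.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one series (assumption): genes x (sample, channel).
rng = np.random.default_rng(1)
n_genes = 300
base = rng.normal(8.0, 1.0, size=n_genes)   # shared per-gene signal
dye = rng.normal(0.0, 0.8, size=n_genes)    # hypothetical channel-two effect

data = {}
for i in range(4):
    data[f"s{i}_ch1"] = base + rng.normal(0.0, 0.3, size=n_genes)
    data[f"s{i}_ch2"] = base + dye + rng.normal(0.0, 0.3, size=n_genes)
expr = pd.DataFrame(data)

# Sample-by-sample Pearson correlation matrix (the "cor_matrix" step).
cor = expr.corr()

# PCA of the correlation matrix: eigendecompose the symmetric matrix
# and sort components by decreasing eigenvalue (the "cor_pca" step).
eigvals, eigvecs = np.linalg.eigh(cor.values)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
explained = eigvals / eigvals.sum()

loadings = pd.DataFrame(eigvecs[:, :2], index=cor.index, columns=["PC1", "PC2"])
loadings["channel"] = [name.rsplit("_", 1)[1] for name in loadings.index]

print(cor.round(2))
print(loadings)
print("explained variance (first two):", explained[:2].round(3))
```

In this toy setup every sample is positively correlated with every other, so PC1 mostly reflects the shared signal and the channel split shows up as opposite signs on PC2. Real series may look different, which is presumably the point of eyeballing each one.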

jaclyn-taroni commented 6 years ago

@Miserlou can we put the notebook you used for this over in https://github.com/AlexsLemonade/agilent-two-color (via PR)? I think that's a better spot for review, and it will be helpful if we 1) have multiple folks working on this (you & me most likely) and 2) take this further, i.e., automatic detection of different designs.

Miserlou commented 6 years ago

Yeah for now it's been here: https://github.com/Miserlou/Science

jaclyn-taroni commented 6 years ago

Whatta name!