AlexsLemonade / refinebio

Refine.bio harmonizes petabytes of publicly available biological data into ready-to-use datasets for cancer researchers and AI/ML scientists.
https://www.refine.bio/

Detect Agilent one color vs. two color experiments #89

Open jaclyn-taroni opened 6 years ago

jaclyn-taroni commented 6 years ago

One-color and two-color experiments will share the same "platform" in GEO or ArrayExpress, so there should be some metadata field that indicates whether a second channel is present.

jaclyn-taroni commented 6 years ago

I think looking at the protocol metadata is likely to be a good way to proceed. I suggest we randomly select tens of experiments from the GEO platform GPL4133 and take a look at the metadata.

Miserlou commented 6 years ago

By looking at the protocols we can see the presence of both 'Cy3' and 'Cy5' strings: https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-4636/protocols/

However, this information is not present in the API metadata: https://www.ebi.ac.uk/arrayexpress/json/v3/experiments/E-MTAB-4636/

Miserlou commented 6 years ago

A better example (previous one wasn't Agilent): https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-77820

The Cy3/Cy5 criterion still holds.

Miserlou commented 6 years ago

Protocol has an API endpoint so we don't need to scrape: https://www.ebi.ac.uk/arrayexpress/json/v3/experiments/E-MTAB-4636/protocols
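Since the protocols are exposed as JSON, they can be pulled directly instead of scraped. A minimal standard-library sketch; it treats the response as raw text rather than assuming any particular JSON schema. The URL pattern is copied from this thread and the legacy endpoint may no longer respond; the function names are hypothetical.

```python
import urllib.request

# URL pattern taken from the thread (legacy ArrayExpress JSON v3 API).
PROTOCOLS_URL = "https://www.ebi.ac.uk/arrayexpress/json/v3/experiments/{accession}/protocols"


def protocols_url(accession: str) -> str:
    """Build the protocols endpoint URL for an ArrayExpress accession."""
    return PROTOCOLS_URL.format(accession=accession)


def fetch_protocols_text(accession: str, timeout: float = 30.0) -> str:
    """Download the protocols response as raw text (no JSON parsing)."""
    with urllib.request.urlopen(protocols_url(accession), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Keeping the raw text makes "search anywhere in the protocol information" a plain substring check later on.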

Miserlou commented 6 years ago

Verdict is that we should snarf basically the entire /protocols API response because it might be scientifically useful. So, related: #96

kurtwheeler commented 6 years ago

To clarify: if we detect that a sample is Agilent one-color, we should just grab the user-submitted processed data; if it is two-color, we should grab the raw data and process it with SCAN.UPC.

Miserlou commented 6 years ago

GPL4133 has 14078 samples and 756 series.

Here are all of the GSEs for GPL4133:

!Platform_series_id = GSE7701
!Platform_series_id = GSE7702
!Platform_series_id = GSE7900
!Platform_series_id = GSE7902
!Platform_series_id = GSE8353
!Platform_series_id = GSE8993
!Platform_series_id = GSE9067
!Platform_series_id = GSE9077
!Platform_series_id = GSE9187
!Platform_series_id = GSE9561
!Platform_series_id = GSE9869
!Platform_series_id = GSE10057
!Platform_series_id = GSE10107
!Platform_series_id = GSE10164
!Platform_series_id = GSE10455
!Platform_series_id = GSE10541
!Platform_series_id = GSE10570
!Platform_series_id = GSE10613
!Platform_series_id = GSE10667
!Platform_series_id = GSE10863
!Platform_series_id = GSE10864
!Platform_series_id = GSE10868
!Platform_series_id = GSE10956
!Platform_series_id = GSE10959
!Platform_series_id = GSE11132
!Platform_series_id = GSE11173
!Platform_series_id = GSE11205
!Platform_series_id = GSE11233
!Platform_series_id = GSE11242
!Platform_series_id = GSE11682
!Platform_series_id = GSE11946
!Platform_series_id = GSE11968
!Platform_series_id = GSE11985
!Platform_series_id = GSE12075
!Platform_series_id = GSE12114
!Platform_series_id = GSE12307
!Platform_series_id = GSE12384
!Platform_series_id = GSE12385
!Platform_series_id = GSE12405
!Platform_series_id = GSE12553
!Platform_series_id = GSE12928
!Platform_series_id = GSE13216
!Platform_series_id = GSE13286
!Platform_series_id = GSE13334
!Platform_series_id = GSE13365
!Platform_series_id = GSE13407
!Platform_series_id = GSE13470
!Platform_series_id = GSE13566
!Platform_series_id = GSE13834
!Platform_series_id = GSE13886
!Platform_series_id = GSE13919
!Platform_series_id = GSE14028
!Platform_series_id = GSE14048
!Platform_series_id = GSE14097
!Platform_series_id = GSE14261
!Platform_series_id = GSE14312
!Platform_series_id = GSE14409
!Platform_series_id = GSE14476
!Platform_series_id = GSE14490
!Platform_series_id = GSE14560
!Platform_series_id = GSE14617
!Platform_series_id = GSE14681
!Platform_series_id = GSE14839
!Platform_series_id = GSE14853
!Platform_series_id = GSE14910
!Platform_series_id = GSE14972
!Platform_series_id = GSE14982
!Platform_series_id = GSE15075
!Platform_series_id = GSE15076
!Platform_series_id = GSE15109
!Platform_series_id = GSE15112
!Platform_series_id = GSE15212
!Platform_series_id = GSE15359
!Platform_series_id = GSE15549
!Platform_series_id = GSE15576
!Platform_series_id = GSE15812
!Platform_series_id = GSE15948
!Platform_series_id = GSE16026
!Platform_series_id = GSE16053
!Platform_series_id = GSE16065
!Platform_series_id = GSE16113
!Platform_series_id = GSE16123
!Platform_series_id = GSE16358
!Platform_series_id = GSE16532
!Platform_series_id = GSE16641
!Platform_series_id = GSE16727
!Platform_series_id = GSE16872
!Platform_series_id = GSE16945
!Platform_series_id = GSE16957
!Platform_series_id = GSE17018
!Platform_series_id = GSE17311
!Platform_series_id = GSE17403
!Platform_series_id = GSE17594
!Platform_series_id = GSE17623
!Platform_series_id = GSE17630
!Platform_series_id = GSE17632
!Platform_series_id = GSE17753
!Platform_series_id = GSE17766
!Platform_series_id = GSE17839
!Platform_series_id = GSE17842
!Platform_series_id = GSE17843
!Platform_series_id = GSE17860
!Platform_series_id = GSE17924
!Platform_series_id = GSE17992
!Platform_series_id = GSE18102
!Platform_series_id = GSE18109
!Platform_series_id = GSE18138
!Platform_series_id = GSE18316
!Platform_series_id = GSE18390
!Platform_series_id = GSE18438
!Platform_series_id = GSE18439
!Platform_series_id = GSE18457
!Platform_series_id = GSE18612
!Platform_series_id = GSE18689
!Platform_series_id = GSE18693
!Platform_series_id = GSE18817
!Platform_series_id = GSE18844
!Platform_series_id = GSE18849
!Platform_series_id = GSE18874
!Platform_series_id = GSE18875
!Platform_series_id = GSE18966
!Platform_series_id = GSE18971
!Platform_series_id = GSE19324
!Platform_series_id = GSE19362
!Platform_series_id = GSE19494
!Platform_series_id = GSE19541
!Platform_series_id = GSE19712
!Platform_series_id = GSE19716
!Platform_series_id = GSE19717
!Platform_series_id = GSE19718
!Platform_series_id = GSE19853
!Platform_series_id = GSE19939
!Platform_series_id = GSE19992
!Platform_series_id = GSE20028
!Platform_series_id = GSE20127
!Platform_series_id = GSE20147
!Platform_series_id = GSE20171
!Platform_series_id = GSE20298
!Platform_series_id = GSE20506
!Platform_series_id = GSE20680
!Platform_series_id = GSE20681
!Platform_series_id = GSE20686
!Platform_series_id = GSE20690
!Platform_series_id = GSE20750
!Platform_series_id = GSE20842
!Platform_series_id = GSE20906
!Platform_series_id = GSE20936
!Platform_series_id = GSE20937
!Platform_series_id = GSE20941
!Platform_series_id = GSE20945
!Platform_series_id = GSE20988
!Platform_series_id = GSE20993
!Platform_series_id = GSE21201
!Platform_series_id = GSE21202
!Platform_series_id = GSE21209
!Platform_series_id = GSE21280
!Platform_series_id = GSE21284
!Platform_series_id = GSE21328
!Platform_series_id = GSE21367
!Platform_series_id = GSE21501
!Platform_series_id = GSE21565
!Platform_series_id = GSE21586
!Platform_series_id = GSE21792
!Platform_series_id = GSE21886
!Platform_series_id = GSE21959
!Platform_series_id = GSE22030
!Platform_series_id = GSE22032
!Platform_series_id = GSE22085
!Platform_series_id = GSE22226
!Platform_series_id = GSE22265
!Platform_series_id = GSE22323
!Platform_series_id = GSE22384
!Platform_series_id = GSE22430
!Platform_series_id = GSE22586
!Platform_series_id = GSE22775
!Platform_series_id = GSE22778
!Platform_series_id = GSE22866
!Platform_series_id = GSE22891
!Platform_series_id = GSE22900
!Platform_series_id = GSE22901
!Platform_series_id = GSE23019
!Platform_series_id = GSE23074
!Platform_series_id = GSE23113
!Platform_series_id = GSE23131
!Platform_series_id = GSE23169
!Platform_series_id = GSE23171
!Platform_series_id = GSE23209
!Platform_series_id = GSE23363
!Platform_series_id = GSE23536
!Platform_series_id = GSE23669
!Platform_series_id = GSE23688
!Platform_series_id = GSE23689
!Platform_series_id = GSE23773
!Platform_series_id = GSE23803
!Platform_series_id = GSE23804
!Platform_series_id = GSE23807
!Platform_series_id = GSE23901
!Platform_series_id = GSE23903
!Platform_series_id = GSE23922
!Platform_series_id = GSE23989
!Platform_series_id = GSE24020
!Platform_series_id = GSE24100
!Platform_series_id = GSE24171
!Platform_series_id = GSE24231
!Platform_series_id = GSE24240
!Platform_series_id = GSE24268
!Platform_series_id = GSE24370
!Platform_series_id = GSE24432
!Platform_series_id = GSE24731
!Platform_series_id = GSE24732
!Platform_series_id = GSE24782
!Platform_series_id = GSE24876
!Platform_series_id = GSE24883
!Platform_series_id = GSE24908
!Platform_series_id = GSE24951
!Platform_series_id = GSE25167
!Platform_series_id = GSE25193
!Platform_series_id = GSE25200
!Platform_series_id = GSE25289
!Platform_series_id = GSE25346
!Platform_series_id = GSE25453
!Platform_series_id = GSE25623
!Platform_series_id = GSE25624
!Platform_series_id = GSE25844
!Platform_series_id = GSE25935
!Platform_series_id = GSE25936
!Platform_series_id = GSE26088
!Platform_series_id = GSE26089
!Platform_series_id = GSE26106
!Platform_series_id = GSE26129
!Platform_series_id = GSE26259
!Platform_series_id = GSE26321
!Platform_series_id = GSE26322
!Platform_series_id = GSE26411
!Platform_series_id = GSE26692
!Platform_series_id = GSE26721
!Platform_series_id = GSE26812
!Platform_series_id = GSE26855
!Platform_series_id = GSE26856
!Platform_series_id = GSE26857
!Platform_series_id = GSE26979
!Platform_series_id = GSE26993
!Platform_series_id = GSE26996
!Platform_series_id = GSE27173
!Platform_series_id = GSE27183
!Platform_series_id = GSE27254
!Platform_series_id = GSE27335
!Platform_series_id = GSE27503
!Platform_series_id = GSE27616
!Platform_series_id = GSE27619
!Platform_series_id = GSE27842
!Platform_series_id = GSE27900
!Platform_series_id = GSE27915
!Platform_series_id = GSE28000
!Platform_series_id = GSE28038
!Platform_series_id = GSE28045
!Platform_series_id = GSE28073
!Platform_series_id = GSE28230
!Platform_series_id = GSE28253
!Platform_series_id = GSE28300
!Platform_series_id = GSE28400
!Platform_series_id = GSE28401
!Platform_series_id = GSE28456
!Platform_series_id = GSE28478
!Platform_series_id = GSE28501
!Platform_series_id = GSE28522
!Platform_series_id = GSE28615
!Platform_series_id = GSE28623
!Platform_series_id = GSE28628
!Platform_series_id = GSE28650
!Platform_series_id = GSE28658
!Platform_series_id = GSE28748
!Platform_series_id = GSE28813
!Platform_series_id = GSE28818
!Platform_series_id = GSE28877
!Platform_series_id = GSE28907
!Platform_series_id = GSE28912
!Platform_series_id = GSE29000
!Platform_series_id = GSE29090
!Platform_series_id = GSE29141
!Platform_series_id = GSE29270
!Platform_series_id = GSE29288
!Platform_series_id = GSE29405
!Platform_series_id = GSE29507
!Platform_series_id = GSE29606
!Platform_series_id = GSE29608
!Platform_series_id = GSE29746
!Platform_series_id = GSE29760
!Platform_series_id = GSE29801
!Platform_series_id = GSE29861
!Platform_series_id = GSE29869
!Platform_series_id = GSE29886
!Platform_series_id = GSE29917
!Platform_series_id = GSE30023
!Platform_series_id = GSE30105
!Platform_series_id = GSE30107
!Platform_series_id = GSE30114
!Platform_series_id = GSE30131
!Platform_series_id = GSE30132
!Platform_series_id = GSE30171
!Platform_series_id = GSE30181
!Platform_series_id = GSE30432
!Platform_series_id = GSE30475
!Platform_series_id = GSE30592
!Platform_series_id = GSE30664
!Platform_series_id = GSE30904
!Platform_series_id = GSE30961
!Platform_series_id = GSE30994
!Platform_series_id = GSE31003
!Platform_series_id = GSE31093
!Platform_series_id = GSE31095
!Platform_series_id = GSE31147
!Platform_series_id = GSE31195
!Platform_series_id = GSE31277
!Platform_series_id = GSE31286
!Platform_series_id = GSE31322
!Platform_series_id = GSE31360
!Platform_series_id = GSE31425
!Platform_series_id = GSE31426
!Platform_series_id = GSE31427
!Platform_series_id = GSE31589
!Platform_series_id = GSE31728
!Platform_series_id = GSE31802
!Platform_series_id = GSE31904
!Platform_series_id = GSE31965
!Platform_series_id = GSE31981
!Platform_series_id = GSE32026
!Platform_series_id = GSE32143
!Platform_series_id = GSE32144
!Platform_series_id = GSE32150
!Platform_series_id = GSE32220
!Platform_series_id = GSE32221
!Platform_series_id = GSE32371
!Platform_series_id = GSE32388
!Platform_series_id = GSE32413
!Platform_series_id = GSE32441
!Platform_series_id = GSE32456
!Platform_series_id = GSE32645
!Platform_series_id = GSE32709
!Platform_series_id = GSE32915
!Platform_series_id = GSE33012
!Platform_series_id = GSE33093
!Platform_series_id = GSE33142
!Platform_series_id = GSE33224
!Platform_series_id = GSE33264
!Platform_series_id = GSE33267
!Platform_series_id = GSE33271
!Platform_series_id = GSE33272
!Platform_series_id = GSE33273
!Platform_series_id = GSE33277
!Platform_series_id = GSE33290
!Platform_series_id = GSE33526
!Platform_series_id = GSE33615
!Platform_series_id = GSE33673
!Platform_series_id = GSE33723
!Platform_series_id = GSE33731
!Platform_series_id = GSE33755
!Platform_series_id = GSE33812
!Platform_series_id = GSE33824
!Platform_series_id = GSE33910
!Platform_series_id = GSE34007
!Platform_series_id = GSE34077
!Platform_series_id = GSE34131
!Platform_series_id = GSE34153
!Platform_series_id = GSE34228
!Platform_series_id = GSE34252
!Platform_series_id = GSE34291
!Platform_series_id = GSE34303
!Platform_series_id = GSE34396
!Platform_series_id = GSE34429
!Platform_series_id = GSE34487
!Platform_series_id = GSE34499
!Platform_series_id = GSE34527
!Platform_series_id = GSE34792
!Platform_series_id = GSE34881
!Platform_series_id = GSE34940
!Platform_series_id = GSE35002
!Platform_series_id = GSE35133
!Platform_series_id = GSE35141
!Platform_series_id = GSE35142
!Platform_series_id = GSE35163
!Platform_series_id = GSE35168
!Platform_series_id = GSE35311
!Platform_series_id = GSE35313
!Platform_series_id = GSE35454
!Platform_series_id = GSE35477
!Platform_series_id = GSE35494
!Platform_series_id = GSE35500
!Platform_series_id = GSE35576
!Platform_series_id = GSE35733
!Platform_series_id = GSE35749
!Platform_series_id = GSE35753
!Platform_series_id = GSE35756
!Platform_series_id = GSE35757
!Platform_series_id = GSE35800
!Platform_series_id = GSE35814
!Platform_series_id = GSE35982
!Platform_series_id = GSE35994
!Platform_series_id = GSE36082
!Platform_series_id = GSE36207
!Platform_series_id = GSE36267
!Platform_series_id = GSE36549
!Platform_series_id = GSE36654
!Platform_series_id = GSE36758
!Platform_series_id = GSE36854
!Platform_series_id = GSE36931
!Platform_series_id = GSE37087
!Platform_series_id = GSE37110
!Platform_series_id = GSE37116
!Platform_series_id = GSE37117
!Platform_series_id = GSE37170
!Platform_series_id = GSE37257
!Platform_series_id = GSE37277
!Platform_series_id = GSE37326
!Platform_series_id = GSE37575
!Platform_series_id = GSE37738
!Platform_series_id = GSE37888
!Platform_series_id = GSE37957
!Platform_series_id = GSE38227
!Platform_series_id = GSE38241
!Platform_series_id = GSE38242
!Platform_series_id = GSE38330
!Platform_series_id = GSE38544
!Platform_series_id = GSE38581
!Platform_series_id = GSE38959
!Platform_series_id = GSE38974
!Platform_series_id = GSE39199
!Platform_series_id = GSE39200
!Platform_series_id = GSE39202
!Platform_series_id = GSE39400
!Platform_series_id = GSE39477
!Platform_series_id = GSE39493
!Platform_series_id = GSE39745
!Platform_series_id = GSE39764
!Platform_series_id = GSE39768
!Platform_series_id = GSE39847
!Platform_series_id = GSE40047
!Platform_series_id = GSE40185
!Platform_series_id = GSE40206
!Platform_series_id = GSE40315
!Platform_series_id = GSE40383
!Platform_series_id = GSE40384
!Platform_series_id = GSE40385
!Platform_series_id = GSE40386
!Platform_series_id = GSE40682
!Platform_series_id = GSE40808
!Platform_series_id = GSE41034
!Platform_series_id = GSE41110
!Platform_series_id = GSE41255
!Platform_series_id = GSE41436
!Platform_series_id = GSE41483
!Platform_series_id = GSE41502
!Platform_series_id = GSE41617
!Platform_series_id = GSE41651
!Platform_series_id = GSE41653
!Platform_series_id = GSE41744
!Platform_series_id = GSE41752
!Platform_series_id = GSE41781
!Platform_series_id = GSE42099
!Platform_series_id = GSE42256
!Platform_series_id = GSE42357
!Platform_series_id = GSE42401
!Platform_series_id = GSE42402
!Platform_series_id = GSE42520
!Platform_series_id = GSE42619
!Platform_series_id = GSE42643
!Platform_series_id = GSE42667
!Platform_series_id = GSE42668
!Platform_series_id = GSE42879
!Platform_series_id = GSE43049
!Platform_series_id = GSE43219
!Platform_series_id = GSE43467
!Platform_series_id = GSE43611
!Platform_series_id = GSE43674
!Platform_series_id = GSE43962
!Platform_series_id = GSE43973
!Platform_series_id = GSE44066
!Platform_series_id = GSE44133
!Platform_series_id = GSE44135
!Platform_series_id = GSE44290
!Platform_series_id = GSE44426
!Platform_series_id = GSE44729
!Platform_series_id = GSE44941
!Platform_series_id = GSE44987
!Platform_series_id = GSE45158
!Platform_series_id = GSE45245
!Platform_series_id = GSE45251
!Platform_series_id = GSE45340
!Platform_series_id = GSE45357
!Platform_series_id = GSE45371
!Platform_series_id = GSE45403
!Platform_series_id = GSE45404
!Platform_series_id = GSE45422
!Platform_series_id = GSE45531
!Platform_series_id = GSE45596
!Platform_series_id = GSE45763
!Platform_series_id = GSE45960
!Platform_series_id = GSE46021
!Platform_series_id = GSE46314
!Platform_series_id = GSE46408
!Platform_series_id = GSE46471
!Platform_series_id = GSE46477
!Platform_series_id = GSE46670
!Platform_series_id = GSE46973
!Platform_series_id = GSE46974
!Platform_series_id = GSE47147
!Platform_series_id = GSE47435
!Platform_series_id = GSE47511
!Platform_series_id = GSE47513
!Platform_series_id = GSE47830
!Platform_series_id = GSE48080
!Platform_series_id = GSE48132
!Platform_series_id = GSE48133
!Platform_series_id = GSE48211
!Platform_series_id = GSE48265
!Platform_series_id = GSE48384
!Platform_series_id = GSE48399
!Platform_series_id = GSE48838
!Platform_series_id = GSE48847
!Platform_series_id = GSE49175
!Platform_series_id = GSE49288
!Platform_series_id = GSE49578
!Platform_series_id = GSE49594
!Platform_series_id = GSE49657
!Platform_series_id = GSE49900
!Platform_series_id = GSE49969
!Platform_series_id = GSE49974
!Platform_series_id = GSE50395
!Platform_series_id = GSE50494
!Platform_series_id = GSE50619
!Platform_series_id = GSE50784
!Platform_series_id = GSE50911
!Platform_series_id = GSE50939
!Platform_series_id = GSE50988
!Platform_series_id = GSE51029
!Platform_series_id = GSE51059
!Platform_series_id = GSE51060
!Platform_series_id = GSE51081
!Platform_series_id = GSE51086
!Platform_series_id = GSE51087
!Platform_series_id = GSE51433
!Platform_series_id = GSE51561
!Platform_series_id = GSE51617
!Platform_series_id = GSE51622
!Platform_series_id = GSE51624
!Platform_series_id = GSE51748
!Platform_series_id = GSE51999
!Platform_series_id = GSE52061
!Platform_series_id = GSE52100
!Platform_series_id = GSE52211
!Platform_series_id = GSE52212
!Platform_series_id = GSE52292
!Platform_series_id = GSE52602
!Platform_series_id = GSE53014
!Platform_series_id = GSE53104
!Platform_series_id = GSE53175
!Platform_series_id = GSE53180
!Platform_series_id = GSE53181
!Platform_series_id = GSE53236
!Platform_series_id = GSE53270
!Platform_series_id = GSE53791
!Platform_series_id = GSE53792
!Platform_series_id = GSE53872
!Platform_series_id = GSE54033
!Platform_series_id = GSE54083
!Platform_series_id = GSE54171
!Platform_series_id = GSE54258
!Platform_series_id = GSE54635
!Platform_series_id = GSE54712
!Platform_series_id = GSE54872
!Platform_series_id = GSE54898
!Platform_series_id = GSE54981
!Platform_series_id = GSE55015
!Platform_series_id = GSE55024
!Platform_series_id = GSE55063
!Platform_series_id = GSE55064
!Platform_series_id = GSE55065
!Platform_series_id = GSE55288
!Platform_series_id = GSE55563
!Platform_series_id = GSE55565
!Platform_series_id = GSE55668
!Platform_series_id = GSE55669
!Platform_series_id = GSE55723
!Platform_series_id = GSE55787
!Platform_series_id = GSE56103
!Platform_series_id = GSE56116
!Platform_series_id = GSE56363
!Platform_series_id = GSE56519
!Platform_series_id = GSE56573
!Platform_series_id = GSE56618
!Platform_series_id = GSE56946
!Platform_series_id = GSE57259
!Platform_series_id = GSE57273
!Platform_series_id = GSE57341
!Platform_series_id = GSE57343
!Platform_series_id = GSE57473
!Platform_series_id = GSE57474
!Platform_series_id = GSE57571
!Platform_series_id = GSE57756
!Platform_series_id = GSE57825
!Platform_series_id = GSE58118
!Platform_series_id = GSE58295
!Platform_series_id = GSE58324
!Platform_series_id = GSE58397
!Platform_series_id = GSE58473
!Platform_series_id = GSE58542
!Platform_series_id = GSE58574
!Platform_series_id = GSE58791
!Platform_series_id = GSE58903
!Platform_series_id = GSE58940
!Platform_series_id = GSE58975
!Platform_series_id = GSE59140
!Platform_series_id = GSE59414
!Platform_series_id = GSE59660
!Platform_series_id = GSE59697
!Platform_series_id = GSE59938
!Platform_series_id = GSE60079
!Platform_series_id = GSE60128
!Platform_series_id = GSE60525
!Platform_series_id = GSE60919
!Platform_series_id = GSE60956
!Platform_series_id = GSE61124
!Platform_series_id = GSE61196
!Platform_series_id = GSE61805
!Platform_series_id = GSE61956
!Platform_series_id = GSE62105
!Platform_series_id = GSE62117
!Platform_series_id = GSE62191
!Platform_series_id = GSE62192
!Platform_series_id = GSE62224
!Platform_series_id = GSE62524
!Platform_series_id = GSE62747
!Platform_series_id = GSE62849
!Platform_series_id = GSE63029
!Platform_series_id = GSE63289
!Platform_series_id = GSE63524
!Platform_series_id = GSE63667
!Platform_series_id = GSE63859
!Platform_series_id = GSE64012
!Platform_series_id = GSE64014
!Platform_series_id = GSE64161
!Platform_series_id = GSE64163
!Platform_series_id = GSE64224
!Platform_series_id = GSE64237
!Platform_series_id = GSE64424
!Platform_series_id = GSE64586
!Platform_series_id = GSE64657
!Platform_series_id = GSE65034
!Platform_series_id = GSE65286
!Platform_series_id = GSE65954
!Platform_series_id = GSE66314
!Platform_series_id = GSE66434
!Platform_series_id = GSE66626
!Platform_series_id = GSE66649
!Platform_series_id = GSE66770
!Platform_series_id = GSE66886
!Platform_series_id = GSE66887
!Platform_series_id = GSE66888
!Platform_series_id = GSE67536
!Platform_series_id = GSE67636
!Platform_series_id = GSE67638
!Platform_series_id = GSE67887
!Platform_series_id = GSE67899
!Platform_series_id = GSE68081
!Platform_series_id = GSE68089
!Platform_series_id = GSE68215
!Platform_series_id = GSE68497
!Platform_series_id = GSE68531
!Platform_series_id = GSE68532
!Platform_series_id = GSE68809
!Platform_series_id = GSE68852
!Platform_series_id = GSE69534
!Platform_series_id = GSE69712
!Platform_series_id = GSE69980
!Platform_series_id = GSE70403
!Platform_series_id = GSE70905
!Platform_series_id = GSE70951
!Platform_series_id = GSE71769
!Platform_series_id = GSE72035
!Platform_series_id = GSE72585
!Platform_series_id = GSE72916
!Platform_series_id = GSE73089
!Platform_series_id = GSE73521
!Platform_series_id = GSE73556
!Platform_series_id = GSE73577
!Platform_series_id = GSE73953
!Platform_series_id = GSE74634
!Platform_series_id = GSE74635
!Platform_series_id = GSE74711
!Platform_series_id = GSE74752
!Platform_series_id = GSE74786
!Platform_series_id = GSE74895
!Platform_series_id = GSE75650
!Platform_series_id = GSE75678
!Platform_series_id = GSE75685
!Platform_series_id = GSE75766
!Platform_series_id = GSE76392
!Platform_series_id = GSE76809
!Platform_series_id = GSE77752
!Platform_series_id = GSE78250
!Platform_series_id = GSE78714
!Platform_series_id = GSE79292
!Platform_series_id = GSE79330
!Platform_series_id = GSE79478
!Platform_series_id = GSE79482
!Platform_series_id = GSE79579
!Platform_series_id = GSE79627
!Platform_series_id = GSE79629
!Platform_series_id = GSE79689
!Platform_series_id = GSE81058
!Platform_series_id = GSE81371
!Platform_series_id = GSE81589
!Platform_series_id = GSE81665
!Platform_series_id = GSE82233
!Platform_series_id = GSE82278
!Platform_series_id = GSE83519
!Platform_series_id = GSE83878
!Platform_series_id = GSE83879
!Platform_series_id = GSE83880
!Platform_series_id = GSE83881
!Platform_series_id = GSE83883
!Platform_series_id = GSE85698
!Platform_series_id = GSE85907
!Platform_series_id = GSE86062
!Platform_series_id = GSE86099
!Platform_series_id = GSE86115
!Platform_series_id = GSE86265
!Platform_series_id = GSE86266
!Platform_series_id = GSE87000
!Platform_series_id = GSE87674
!Platform_series_id = GSE87778
!Platform_series_id = GSE87910
!Platform_series_id = GSE89287
!Platform_series_id = GSE89422
!Platform_series_id = GSE89915
!Platform_series_id = GSE90132
!Platform_series_id = GSE90605
!Platform_series_id = GSE92915
!Platform_series_id = GSE93899
!Platform_series_id = GSE93900
!Platform_series_id = GSE94610
!Platform_series_id = GSE95000
!Platform_series_id = GSE95084
!Platform_series_id = GSE96671
!Platform_series_id = GSE98021
!Platform_series_id = GSE98737
!Platform_series_id = GSE100533
!Platform_series_id = GSE102265
!Platform_series_id = GSE102267
!Platform_series_id = GSE102641
!Platform_series_id = GSE103236
!Platform_series_id = GSE106206
!Platform_series_id = GSE107200
!Platform_series_id = GSE109009
!Platform_series_id = GSE109848
!Platform_series_id = GSE110905
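The list above is in GEO's SOFT format. A small sketch that extracts the GSE accessions and rewrites them into the E-GEOD-xxxx form that the ArrayExpress protocol URLs later in this thread use. The helper names are hypothetical.

```python
import re

# Matches lines like "!Platform_series_id = GSE7701" from a GEO platform SOFT record.
SERIES_LINE = re.compile(r"^!Platform_series_id\s*=\s*(GSE\d+)$")


def series_ids(soft_lines):
    """Return the GSE accessions listed in a platform SOFT record, in order."""
    ids = []
    for line in soft_lines:
        match = SERIES_LINE.match(line.strip())
        if match:
            ids.append(match.group(1))
    return ids


def to_arrayexpress(gse: str) -> str:
    """Map a GEO series to its ArrayExpress import accession (GSE7701 -> E-GEOD-7701)."""
    if not gse.startswith("GSE"):
        raise ValueError(f"not a GEO series accession: {gse}")
    return "E-GEOD-" + gse[len("GSE"):]


lines = ["!Platform_series_id = GSE7701", "!Platform_series_id = GSE7702"]
print([to_arrayexpress(g) for g in series_ids(lines)])  # ['E-GEOD-7701', 'E-GEOD-7702']
```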

kurtwheeler commented 6 years ago

Ok, so it looks like the next step here is to go through that list of accessions, pull the protocols for each experiment from the protocols endpoint linked above, and determine whether any are Agilent 1-color. I believe the heuristic for distinguishing 1-color from 2-color is that if "Cy5" appears anywhere in the protocol information, it is a 2-color experiment.
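The Cy5 heuristic amounts to a substring test over the raw protocol text. In this sketch the case-insensitive matching is an assumption layered on top of the rule as stated, and the function names are hypothetical.

```python
def is_two_color(protocol_text: str) -> bool:
    """Heuristic from the thread: any mention of Cy5 implies a 2-color experiment.

    Lower-casing (so "CY5" or "cy5" also match) is an assumption.
    """
    return "cy5" in protocol_text.lower()


def split_by_cy5(protocols):
    """Partition {accession: protocol_text} into (two_color, other) accession lists."""
    two_color, other = [], []
    for accession, text in protocols.items():
        (two_color if is_two_color(text) else other).append(accession)
    return two_color, other
```

Anything in the "other" list is only a 1-color candidate; an absent Cy5 mention does not prove a single channel.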

@Miserlou or @jaclyn-taroni can you confirm if all of the above is accurate?

jaclyn-taroni commented 6 years ago

Yep, there should also be some fields that reference "Channel 2" (often ending in _ch2 in GEO) that are non-empty in 2-color experiments.
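For GEO records, the channel-2 check could look like the sketch below. It assumes SOFT sample lines of the form "!Sample_<field>_ch2 = <value>" (for example "!Sample_label_ch2 = Cy5") and treats any non-empty channel-2 value as evidence of a second channel; the function name is hypothetical.

```python
def has_nonempty_ch2(soft_lines) -> bool:
    """True if any "!Sample_..._ch2" field in a GEO SOFT sample record has a value."""
    for line in soft_lines:
        key, sep, value = line.strip().partition("=")
        key = key.strip()
        # Only count "key = value" lines whose key is a channel-2 sample field.
        if sep and key.startswith("!Sample_") and key.endswith("_ch2") and value.strip():
            return True
    return False
```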

dongbohu commented 6 years ago

Here are the statistics for the 756 experiments:

285 IDs have both Cy3 and Cy5:

[7701, 7702, 9067, 9187, 10057, 10107, 10864, 10956, 10959, 11132, 11242, 11968, 12307, 13470, 13566, 14409, 14681, 14910, 14982, 15109, 15112, 16026, 16113, 16641, 16872, 16945, 17403, 17623, 17630, 17632, 17839, 17842, 17843, 17860, 17992, 18316, 18390, 18457, 18693, 19324, 19494, 19939, 19992, 20147, 20171, 20937, 20941, 20988, 20993, 21209, 21367, 21501, 21586, 22226, 22384, 22430, 22586, 22778, 22900, 22901, 23019, 23113, 23171, 23209, 23536, 23669, 23688, 23689, 23922, 24171, 24432, 24732, 24782, 24908, 24951, 25167, 25193, 25289, 25346, 25453, 26088, 26089, 26129, 26322, 26721, 26856, 26979, 27183, 27254, 27842, 27900, 28045, 28073, 28253, 28501, 28522, 28650, 28658, 28748, 28907, 29270, 29606, 29608, 29861, 29886, 29917, 30114, 30131, 30132, 30171, 30181, 30475, 30904, 30961, 31003, 31195, 31322, 31589, 31965, 32143, 32144, 32150, 32371, 32413, 33012, 33224, 33277, 33290, 33526, 33673, 33723, 33731, 33812, 33910, 34077, 34153, 34429, 34499, 34792, 34881, 34940, 35002, 35142, 35163, 35168, 35313, 35494, 35576, 35733, 35749, 35753, 35757, 35800, 35814, 35982, 35994, 36207, 36549, 37110, 37116, 37117, 37277, 37326, 37888, 37957, 38227, 38241, 38242, 39200, 39202, 39477, 39764, 40047, 40682, 41034, 41110, 41255, 41651, 41653, 41744, 41781, 42256, 42357, 42401, 42402, 42667, 42668, 42879, 43049, 43467, 43674, 43962, 43973, 44290, 44729, 45245, 45340, 45403, 45531, 45596, 46471, 46477, 46973, 46974, 47147, 47435, 48132, 48133, 49175, 49288, 49594, 49900, 50619, 50939, 50988, 51081, 51086, 51087, 51433, 51748, 52061, 52100, 52292, 52602, 53014, 53175, 53236, 53270, 54258, 54635, 54872, 54898, 55063, 55064, 55065, 55668, 55669, 55787, 56116, 56519, 56618, 57273, 57571, 57756, 57825, 58295, 58397, 58473, 58574, 58791, 59414, 59938, 60525, 60919, 61956, 62105, 62224, 63029, 63289, 63524, 63667, 63859, 64224, 64237, 65954, 66314, 66434, 66770, 68081, 68089, 68215, 68531, 69712, 70403, 71769, 73089, 74634, 74635, 74786, 75678, 75685, 75766, 76809, 79478, 79629]

189 IDs have Cy3 only:

[7900, 8993, 13407, 15075, 15076, 15212, 17753, 17924, 20298, 20680, 20681, 20686, 20750, 20842, 21201, 21202, 21328, 21565, 21792, 21886, 21959, 22030, 22032, 22323, 22866, 22891, 23074, 23131, 23773, 23803, 23804, 23807, 23901, 23903, 24100, 24231, 24240, 24268, 24731, 24876, 25200, 25623, 25624, 25844, 25935, 25936, 26259, 26321, 26855, 26857, 27173, 27335, 27503, 27915, 28038, 28230, 28456, 28478, 28615, 28623, 28628, 28813, 28818, 28877, 28912, 29090, 29141, 29288, 29405, 29507, 29746, 29760, 29869, 30105, 30107, 30432, 30592, 30664, 30994, 31093, 31147, 31286, 31360, 31904, 31981, 32221, 32441, 32645, 32709, 32915, 33142, 33264, 33267, 33271, 33272, 33273, 33615, 33755, 33824, 34007, 34131, 34252, 34487, 34527, 35133, 35141, 35454, 35500, 35756, 36082, 36267, 36654, 36758, 36854, 36931, 37087, 37170, 37257, 38330, 38544, 38581, 38959, 39199, 39493, 39745, 39768, 40383, 40384, 40385, 40386, 40808, 41436, 42099, 42643, 43219, 43611, 44066, 44426, 44987, 45158, 45251, 45422, 46021, 46314, 46408, 46670, 47830, 48080, 48384, 48399, 48838, 49578, 49657, 50395, 51561, 51999, 53104, 53181, 53791, 53792, 53872, 54712, 55015, 55024, 55563, 55565, 56103, 56363, 56946, 57473, 57474, 58118, 58940, 58975, 59140, 59697, 60128, 60956, 61805, 62117, 62747, 64012, 64014, 67887, 68497, 69534, 78714, 79627, 83883]

5 IDs have Cy5 only:

[9077, 19362, 49969, 49974, 50784]

87 IDs have neither Cy3 nor Cy5 (but still include protocol information):

[9869, 10455, 10541, 10613, 11682, 12384, 12385, 13334, 13365, 13919, 14097, 14312, 14490, 15812, 15948, 16957, 17311, 18109, 18438, 18439, 20906, 20945, 21280, 21284, 22265, 22775, 23169, 23363, 24020, 24370, 26411, 26692, 26812, 26993, 26996, 28000, 28300, 28400, 28401, 29000, 30023, 31095, 31425, 31426, 31427, 31802, 32026, 32388, 33093, 34228, 34291, 34396, 35311, 35477, 37738, 38974, 39400, 40185, 41483, 41502, 41617, 41752, 42520, 42619, 45357, 45371, 45960, 47511, 47513, 51059, 51060, 54033, 54981, 55288, 55723, 56573, 57341, 57343, 58542, 62191, 65034, 70905, 70951, 73521, 73556, 74711, 74752]

190 IDs were skipped (no protocol information at all):

[7902, 8353, 9561, 10164, 10570, 10667, 10863, 10868, 11173, 11205, 11233, 11946, 11985, 12075, 12114, 12405, 12553, 12928, 13216, 13286, 13834, 13886, 14028, 14048, 14261, 14476, 14560, 14617, 14839, 14853, 14972, 15359, 15549, 15576, 16053, 16065, 16123, 16358, 16532, 16727, 17018, 17594, 17766, 18102, 18138, 18612, 18689, 18817, 18844, 18849, 18874, 18875, 18966, 18971, 19541, 19712, 19716, 19717, 19718, 19853, 20028, 20127, 20506, 20690, 20936, 22085, 23989, 24883, 26106, 27616, 27619, 29801, 31277, 31728, 32220, 32456, 34303, 37575, 39847, 40206, 40315, 44133, 44135, 44941, 45404, 45763, 48211, 48265, 48847, 50494, 50911, 51029, 51617, 51622, 51624, 52211, 52212, 53180, 54083, 54171, 57259, 58324, 58903, 59660, 60079, 61124, 61196, 62192, 62524, 62849, 64161, 64163, 64424, 64586, 64657, 65286, 66626, 66649, 66886, 66887, 66888, 67536, 67636, 67638, 67899, 68532, 68809, 68852, 69980, 72035, 72585, 72916, 73577, 73953, 74895, 75650, 76392, 77752, 78250, 79292, 79330, 79482, 79579, 79689, 81058, 81371, 81589, 81665, 82233, 82278, 83519, 83878, 83879, 83880, 83881, 85698, 85907, 86062, 86099, 86115, 86265, 86266, 87000, 87674, 87778, 87910, 89287, 89422, 89915, 90132, 90605, 92915, 93899, 93900, 94610, 95000, 95084, 96671, 98021, 98737, 100533, 102265, 102267, 102641, 103236, 106206, 107200, 109009, 109848, 110905]

No experiment's protocol information includes the _ch2 substring.
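The script behind these counts isn't shown. The sketch below is a reconstruction that assumes plain, case-sensitive substring matching on "Cy3" and "Cy5", with accessions that returned no protocol text counted as skipped. Under that assumption, spelled-out forms such as "cyanine 3" would not match. The bucket names are hypothetical.

```python
from typing import Dict, List, Optional


def bucket(protocols: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group accessions into the five categories reported above."""
    buckets: Dict[str, List[str]] = {
        "both": [], "cy3_only": [], "cy5_only": [], "neither": [], "skipped": [],
    }
    for accession, text in protocols.items():
        if not text:  # no protocol information at all
            buckets["skipped"].append(accession)
            continue
        has_cy3, has_cy5 = "Cy3" in text, "Cy5" in text  # case-sensitive
        if has_cy3 and has_cy5:
            buckets["both"].append(accession)
        elif has_cy3:
            buckets["cy3_only"].append(accession)
        elif has_cy5:
            buckets["cy5_only"].append(accession)
        else:
            buckets["neither"].append(accession)
    return buckets
```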

Miserlou commented 6 years ago

Awesome, thanks Dongbo! Cy5 only.. 🤔

jaclyn-taroni commented 6 years ago

This was through ArrayExpress, correct, @dongbohu? If so, I wouldn't have expected _ch2 to be in the protocol info.

dongbohu commented 6 years ago

There are some gray areas, though. For example, 9077 is in the Cy5-only category, but its protocol contains this substring: "... and cyanine 3-labeled CTP" (https://www.ebi.ac.uk/arrayexpress/json/v3/experiments/E-GEOD-9077/protocols). Should we count it as both Cy3 and Cy5 instead?

Another example: 74711 is in neither the Cy3 nor the Cy5 category, but it includes this string: "10.0 mM Cyanine 3- or 5-labeled CTP" (https://www.ebi.ac.uk/arrayexpress/json/v3/experiments/E-GEOD-74711/protocols). Does this mean it should be in both the Cy3 and Cy5 categories?
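One way to handle these spelled-out forms is to normalize dye mentions with a regular expression before bucketing. A minimal sketch, assuming the variants seen so far ("Cy3", "cyanine 3", "Cyanine 3- or 5-") cover most protocols; other spellings would need extra patterns.

```python
import re
from typing import Set

# "Cy3", "Cy 3", "Cy-3", "cyanine 3", "Cyanine3", and the Cy5 equivalents.
_DYE = re.compile(r"\b(?:cy|cyanine)[\s-]?([35])\b", re.IGNORECASE)
# Shared-suffix phrasing such as "Cyanine 3- or 5-labeled CTP".
_OR_FORM = re.compile(r"\b(?:cy|cyanine)[\s-]?3-?\s+or\s+5\b", re.IGNORECASE)


def dyes_mentioned(text: str) -> Set[str]:
    """Return the normalized dye names ("Cy3", "Cy5") mentioned in text."""
    found = {"Cy" + digit for digit in _DYE.findall(text)}
    if _OR_FORM.search(text):
        found |= {"Cy3", "Cy5"}
    return found
```

Mentioning both dyes may still not make an experiment two-color on its own; the protocol text can describe a kit or several platforms at once.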

dongbohu commented 6 years ago

@jaclyn-taroni Yes, all URLs are in this format: https://www.ebi.ac.uk/arrayexpress/json/v3/experiments/E-GEOD-xxxx/protocols
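Given that URL format, the protocol text for each experiment can be fetched and flattened into one searchable string. A standard-library sketch; because the JSON schema isn't spelled out in this thread, it walks every string value instead of assuming field names. The legacy v3 JSON endpoint may no longer be served, so treat the fetch as illustrative.

```python
import json
import urllib.request
from typing import Any, Iterator

BASE = "https://www.ebi.ac.uk/arrayexpress/json/v3/experiments/{}/protocols"


def protocols_url(gse_number: int) -> str:
    """Build the ArrayExpress protocols URL for a GEO-imported (E-GEOD) experiment."""
    return BASE.format(f"E-GEOD-{gse_number}")


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value in a parsed JSON document, whatever its nesting."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def fetch_protocol_text(gse_number: int, timeout: float = 30.0) -> str:
    """Download the protocols JSON and join all of its strings with newlines."""
    with urllib.request.urlopen(protocols_url(gse_number), timeout=timeout) as resp:
        doc = json.load(resp)
    return "\n".join(iter_strings(doc))
```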

jaclyn-taroni commented 6 years ago

Okay, I will go through the two "gray area" examples you posted @dongbohu and a handful of the neither Cy3 nor Cy5 examples (possibly the no protocol examples...) and post some thoughts about how I detect one-color vs. two-color as a human.

jaclyn-taroni commented 6 years ago

Unknown (neither Cy3 nor Cy5), protocol information available

Here, I've chosen to go through protocols. It is quite possible that examining the raw data files (if available) would tell us the answer more definitively. (Although based on the number of different formats accommodated by limma::read.maimages, I think this may be challenging.)

I am also assuming that it would be preferable to glean which processor to use from the protocol info prior to raw data download, but if that is not the case, we can take a different approach.

When in doubt, we can check the sdrf.txt files (this actually may turn out to be the most robust way to go...)
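A sketch of an SDRF check, assuming the standard MAGE-TAB column names "Label" and "Hybridization Name": in a two-color experiment both channels' rows should share one hybridization, so that hybridization carries two distinct labels (e.g. Cy3 and Cy5). The function names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set


def labels_per_hybridization(sdrf_text: str) -> Dict[str, Set[str]]:
    """Map each Hybridization Name to the set of labels (dyes) used on it.

    Assumes MAGE-TAB columns named exactly "Label" and "Hybridization Name".
    If a header repeats, csv.DictReader keeps the value of the last occurrence.
    """
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    labels: Dict[str, Set[str]] = defaultdict(set)
    for row in reader:
        hyb = (row.get("Hybridization Name") or "").strip()
        label = (row.get("Label") or "").strip()
        if hyb and label:
            labels[hyb].add(label)
    return dict(labels)


def looks_two_color(sdrf_text: str) -> bool:
    """True if any hybridization carries two or more distinct labels."""
    return any(len(found) >= 2 for found in labels_per_hybridization(sdrf_text).values())
```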

E-GEOD-28300

Verdict: One-color
Why: One-Color RNA Spike-In RNA and One-Color Low RNA Input Linear Amplification Kit PLUS in P-GSE28300-5; One-Color Microarray-Based Gene Expression Analysis Protocol in P-GSE28300-6

E-GEOD-21280

Verdict: Two-color
Why: Two-Color Microarray-Based Gene Expression Analysis in P-GSE21280-3; log2(Experiment/Control) in P-GSE21280-1; reference to loess normalization in multiple protocols is a hint

E-GEOD-9869

Verdict: Two-color
Why: log(PSA treated cells/control cells), CH1_SIG_MEAN, CH2_SIG_MEAN in P-GSE9869-1; similar info in another protocol

E-GEOD-13334

Verdict: One-color
Why: One-Color Microarray-Based Gene Expression Analysis in multiple protocols

E-GEOD-34228

Verdict: One-color
Why: Cyanine 3-CTP & no mention of Cyanine 5 in P-GSE34228-5

E-GEOD-58542

Verdict: Two-color
Why: Agilent Two-Color Microarray-Based Gene Expression Analysis in multiple protocols; also Normalized log2 ratios (test/reference) and loess normalization in P-GSE58542-1 are strong hints

E-GEOD-23169

Verdict: Two-color
Why: normalized log10 ratio (treated/untreated) in P-GSE23169-1; also, for each unique GSMXXXXX in Source Name in the sample-data table (sdrf.txt), there is a 1 and a 2
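The "1 and a 2" pattern can be checked directly from the Source Name column. A minimal sketch; the assumed source-name shape (a GSM accession, at least one separator, then 1 or 2) is inferred from the description above and may not hold for every experiment. Function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Assumed Source Name shape: a GSM accession, at least one separator, then the channel digit.
_SOURCE = re.compile(r"^(GSM\d+)[\s_-]+([12])$")


def channels_by_accession(source_names: Iterable[str]) -> Dict[str, Set[str]]:
    """Group channel digits ("1", "2") by GSM accession; non-matching names are ignored."""
    channels: Dict[str, Set[str]] = defaultdict(set)
    for name in source_names:
        match = _SOURCE.match(name.strip())
        if match:
            channels[match.group(1)].add(match.group(2))
    return dict(channels)


def every_accession_paired(source_names: Iterable[str]) -> bool:
    """True if at least one accession matched and each has both channel 1 and channel 2."""
    groups = channels_by_accession(source_names)
    return bool(groups) and all(found == {"1", "2"} for found in groups.values())
```

Requiring a separator keeps an accession such as "GSM12" from being misread as accession "GSM1", channel 2.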

E-GEOD-47511

Verdict: One-color
Why: Can't really tell for sure from the protocols; had to look at the sample-data table

E-GEOD-28401

This is a SuperSeries, GSE28400 is the relevant subseries.

Verdict: Two-color
Why: log2-transformed ratio (HEK-293 miR-204/HEK-293 control) in P-GSE28400-1, sample accessions ending in 1 and 2 in the sample-data table

E-GEOD-12385

Verdict: One-color
Why: Agilent One-Color Microarray-Based Gene Expression Analysis in P-GSE12385-8; Agilent One-Color RNA Spike-In RNA in P-GSE12385-7

My recommendations for further exploration: check for One-Color and Two-Color in the protocols, and look for the same accession with a 1 and a 2 in the sample-data relationship info.
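These recommendations translate into a small heuristic: trust an explicit One-Color/Two-Color phrase, fall back on the ratio, loess, and CH2 hints from the examples above, and otherwise return no verdict so the sample-data checks can decide. The patterns are illustrative assumptions, not a validated rule set.

```python
import re
from typing import Optional

_ONE = re.compile(r"\bone[\s-]colou?r\b", re.IGNORECASE)
_TWO = re.compile(r"\btwo[\s-]colou?r\b", re.IGNORECASE)
# Weaker two-color hints drawn from the examples above (illustrative, not exhaustive).
_TWO_HINTS = [
    re.compile(r"\bratios?\b", re.IGNORECASE),  # "log2 ratios", "log10 ratio"
    re.compile(r"\blog(?:2|10)?\s*\([^)]*/[^)]*\)", re.IGNORECASE),  # "log2(Experiment/Control)"
    re.compile(r"\bloess\b", re.IGNORECASE),  # loess normalization
    re.compile(r"\bCH2_", re.IGNORECASE),  # per-channel columns like CH2_SIG_MEAN
]


def protocol_verdict(text: str) -> Optional[str]:
    """Return "one-color", "two-color", or None when the protocols are inconclusive."""
    one = bool(_ONE.search(text))
    two = bool(_TWO.search(text))
    if one != two:  # exactly one explicit phrase: trust it
        return "one-color" if one else "two-color"
    if not one and any(p.search(text) for p in _TWO_HINTS):
        return "two-color"  # no explicit phrase, but ratio/loess/CH2 hints
    return None  # both phrases, or no evidence: defer to the sample-data checks
```

The "ratio" hint is loose (a one-color protocol could mention a signal-to-noise ratio), which is why it only applies when no explicit One-Color phrase is present.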

jaclyn-taroni commented 6 years ago

My comment above is based on ArrayExpress. @kurtwheeler, @Miserlou, @dongbohu is the plan to only retrieve GEO data from GEO when it is unavailable on ArrayExpress?

Taking a one-color and two-color example from above and looking at GEO --

jaclyn-taroni commented 6 years ago

Another example: 74711 is in neither Cy3 nor Cy5 category, but it includes this string: 10.0 mM Cyanine 3- or 5-labeled CTP https://www.ebi.ac.uk/arrayexpress/json/v3/experiments/E-GEOD-74711/protocols Does this mean it should be in both Cy3 and Cy5 category?

Yes.

There are some gray areas though. For example, 9077 is in Cy5 only category, but it has such a substring: ... and cyanine 3-labeled CTP https://www.ebi.ac.uk/arrayexpress/json/v3/experiments/E-GEOD-9077/protocols Should we count it as both Cy3 and Cy5 instead?

E-GEOD-9077 is one of the stranger experiments I've come across! There are three platforms and it's not a SuperSeries -- I would probably ignore it.

kurtwheeler commented 6 years ago

is the plan to only retrieve GEO data from GEO when it is unavailable on ArrayExpress

I think we should get it from wherever the metadata is better. I've been assuming that GEO data would have better metadata on GEO than on ArrayExpress, since it's not secondhand, but @Miserlou has worked more with the GEO metadata, so I bet he knows for sure.

jaclyn-taroni commented 6 years ago

We'll need to be able to detect one- and two-color experiments from both ArrayExpress and GEO.
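On the GEO side, sample records in SOFT format carry a "!Sample_channel_count" field, and per-channel fields use "_ch1"/"_ch2" suffixes (e.g. "!Sample_label_ch2"). A minimal parser sketch, assuming the text of a family SOFT file is already in hand; the function name is hypothetical.

```python
from typing import Dict, Optional


def channel_counts(soft_text: str) -> Dict[str, int]:
    """Map each GSM accession in a SOFT text to its !Sample_channel_count value."""
    counts: Dict[str, int] = {}
    current: Optional[str] = None
    for raw in soft_text.splitlines():
        line = raw.strip()
        if line.startswith("^"):
            # Entity header such as "^SAMPLE = GSM123"; other entities reset the context.
            key, _, value = line.partition("=")
            current = value.strip() if key.strip() == "^SAMPLE" else None
        elif current and line.startswith("!Sample_channel_count"):
            counts[current] = int(line.partition("=")[2])
    return counts
```

A channel count of 2 would flag two-color samples without relying on protocol text, which could also help with experiments that have no protocol information on ArrayExpress.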

jaclyn-taroni commented 6 years ago

Cy5 only

jaclyn-taroni commented 6 years ago

I took a look at 5 randomly chosen experiments with no protocol information. They are likely not replicated in ArrayExpress, for whatever reason.