Knowledge-Graph-Hub / kg-covid-19

An instance of KG Hub to produce a knowledge graph for COVID-19 response.
https://github.com/Knowledge-Graph-Hub/kg-covid-19/wiki
BSD 3-Clause "New" or "Revised" License
79 stars, 26 forks

NCBI GEO #13

Open pnrobinson opened 4 years ago

pnrobinson commented 4 years ago

There are lots of relevant datasets. Many of them appear to be good old-fashioned microarrays. These can be processed semi-automatically, so we could try to download "lots" of datasets. This will need some further investigation.

https://www.ncbi.nlm.nih.gov/gds?term=coronavirus&cmd=correctspelling
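A programmatic version of that search could go through the public NCBI E-utilities `esearch` endpoint against the `gds` (GEO DataSets) database. A minimal sketch — the endpoint and parameter names are from the public E-utilities API; the query term is the one from the link above, and the helper name is just illustrative:

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint (public API)
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_gds_search_url(term: str, retmax: int = 100) -> str:
    """Build an esearch URL against the GEO DataSets (gds) database."""
    params = {
        "db": "gds",        # GEO DataSets database
        "term": term,       # free-text query, e.g. "coronavirus"
        "retmax": retmax,   # max number of UIDs to return
        "retmode": "json",
    }
    return f"{EUTILS_BASE}?{urlencode(params)}"

url = build_gds_search_url("coronavirus")
```

Fetching that URL returns a list of GDS UIDs, which could then be fed to `esummary`/`efetch` to pull per-dataset metadata and supplementary files in bulk.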

realmarcin commented 4 years ago

I'm going to sign up for this one, assuming no other takers or higher priority data. Along the way I can also generate biclusters. I also have a biclustering collaborator at UPenn who is willing to donate time to help with dataset creation and applying methods. More on Monday!

pnrobinson commented 4 years ago

Hi Marcin, we need this data for another project as well. Please contact me about this. We did this about 10 years ago for 10K datasets in GEO and things went reasonably well with some Bioconductor scripts, although I wouldn't be who I am if I could still find those scripts today....

callahantiff commented 4 years ago

@pnrobinson @realmarcin -

Since there will likely be several GEO datasets we download, I'm wondering if we want to try and create (or use @pnrobinson's existing scripts for) a primary script that handles the bulk of the analyses that will be the same across these resources, then add specific transformation scripts as needed for each source. Seems more reproducible and robust in the long run. Thoughts? If you agree, I'm happy to help set up a quick call so we can formulate a plan for moving forward.

Linking other relevant issues: #11, #12, #19
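One way to structure the "shared primary script + per-source transforms" idea is a small base class owning the common download/parse steps, with each dataset supplying only its own transform. A minimal sketch — the class and method names here are hypothetical, not from the kg-covid-19 codebase:

```python
import abc
from pathlib import Path

class GeoTransform(abc.ABC):
    """Shared pipeline for GEO datasets: download/parse is common code,
    each dataset subclass supplies only its own transform step."""

    def __init__(self, accession: str, output_dir: str = "transformed"):
        self.accession = accession
        self.output_dir = Path(output_dir)

    def run(self) -> list:
        raw = self.download()
        return self.transform(raw)

    def download(self) -> dict:
        # Placeholder for the shared fetch/parse logic (e.g. SOFT files);
        # returns an empty record here so the sketch is runnable offline.
        return {"accession": self.accession, "samples": []}

    @abc.abstractmethod
    def transform(self, raw: dict) -> list:
        """Dataset-specific mapping of raw records to graph edges."""

class ExampleMicroarrayTransform(GeoTransform):
    def transform(self, raw: dict) -> list:
        # Hypothetical mapping: one edge per sample, linking it to its series.
        return [{"subject": s, "predicate": "part_of", "object": raw["accession"]}
                for s in raw["samples"]]

edges = ExampleMicroarrayTransform("GSE00000").run()
```

The shared `download` step would be written once (and fixed once), while each new GEO source only needs a short subclass — which seems to match the reproducibility goal above.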

callahantiff commented 4 years ago

@pnrobinson, @realmarcin - did you guys by chance talk about this today? Just curious about your thoughts.

justaddcoffee commented 4 years ago

@pnrobinson - we discussed this ticket a bit with @realmarcin today, and we had some questions. We could possibly discuss on the n2v call (Thursday, 12 pm your time) if you are going to be around.