AlexsLemonade / refinebio

Refine.bio harmonizes petabytes of publicly available biological data into ready-to-use datasets for cancer researchers and AI/ML scientists.
https://www.refine.bio/

Missing Experiment GSE131617 #1691

Closed cgreene closed 4 years ago

cgreene commented 5 years ago

Context

A user requested this dataset.

Problem or idea

This one is interesting - it was uploaded on May 26, 2019. I thought that perhaps we hadn't surveyed microarray datasets since then, except I think that @davidsmejia did just re-survey. We should see if our code to re-survey missed this one. It's pretty big (~425 human samples). If it got missed, maybe some others did as well.

Solution or next step

Check to see if the re-survey code found this. The user who reported it does want updates, so when this gets closed we should let them know (see the experiment request form associated with this accession).

davidsmejia commented 5 years ago

Looks like it was not in the last batch of re-run microarray experiment accession codes. It appears that the samples' platform is GPL5188, which is not on our list of supported platforms.
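For anyone triaging similar requests, the two checks described above (was the accession in the re-survey batch, and is its platform supported) could be sketched roughly as below. This is an illustration only, not refine.bio's surveyor code: the function name `triage_missing_experiment` and the contents of both sets are hypothetical placeholders.

```python
def triage_missing_experiment(accession, platform, surveyed_accessions, supported_platforms):
    """Return the reasons (possibly none) an experiment may be absent.

    Hypothetical helper: both collections are assumed to be plain sets of
    accession codes and platform identifiers supplied by the caller.
    """
    reasons = []
    if accession not in surveyed_accessions:
        reasons.append("accession was not in the last re-survey batch")
    if platform not in supported_platforms:
        reasons.append("platform is not on the supported list")
    return reasons


# Placeholder data for illustration; not the real batch or platform list.
surveyed = {"GSE_EXAMPLE_1", "GSE_EXAMPLE_2"}
supported = {"GPL_EXAMPLE_A", "GPL_EXAMPLE_B"}

for reason in triage_missing_experiment("GSE131617", "GPL5188", surveyed, supported):
    print(reason)
```

An empty result would mean neither check explains the gap, which would point toward a backlog or a processing failure instead.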

cgreene commented 5 years ago

BrainArray does support that platform, and there's ~10k samples on it. We should be able to process it, but it may need the extra RAM that the HTA2.0 jobs are using (what @arielsvn was just adding).

cgreene commented 4 years ago

Interestingly this isn't yet available. It looks like we have ~18k samples from the platform processed, so perhaps these are still in the backlog: https://www.refine.bio/search?p=3&ordering=-num_downloadable_samples&filter_order=organism,platform,platform,platform,platform&organism=homo_sapiens&platform=GPL5175&platform=HuEx-1_0-st&platform=GPL23541&platform=GPL15236

cgreene commented 4 years ago

May be due to #1986

cgreene commented 4 years ago

I have emailed the requester letting them know that we had made all but two samples available. I cc'd @dvenprasad in case they elect to share more about how they are using the data.

I'm writing to follow up on your request for the experiment GSE131617. We've now added support for this platform and have been able to process the experiment successfully. As a quick heads up, there are two remaining samples for the platform that failed to process with the same error: "GSM3791176_Expression_BN_V-VI_13-FC_090612.CEL has a disproportionate number of zero values, so it cannot be processed."

It looks like those two samples may have some QC issues or other problems that cause SCAN to fail to process them.

Thank you for reaching out to us! We'd love to hear about how you're using data from refine.bio.

Sincerely, Casey