Open cgreene opened 6 years ago
I've dug into this a bit more and merged the issues together as they arise during the processing of a single accession: https://sentry.io/greenelab/staging-refinebio/issues/667302064/?query=is:unresolved%20gene_convert_illumina
On #491, I randomly selected Illumina experiments that I knew were on supported platforms and that included GSE98897. GSE98897 happens to be a SuperSeries that includes GSE98895, which is on a supported platform, and the way I constructed my list would not account for that (see also: https://github.com/jaclyn-taroni/beadarray-platform-detection#results). The other samples under the SuperSeries umbrella were run on miRNA arrays, which we do not support.
Given that there is likely a bug in the Python code as noted above, it's hard to say exactly what happened here. I would expect that we would not attempt to convert the gene IDs in these samples because they are not on a supported platform.
I forgot to mention that there are 15 of these events and I would expect this to occur for all 40 samples on the miRNA arrays.
It's possible that some got rate limited. It looks like ~25% were dropped due to rate limiting. A 25% drop rate would not by itself reduce 40 events to 15, but if this experiment was handled at a time when a higher proportion was being dropped, that could explain the lower number of reports.
Digging a bit further, the miRNA array probe identifiers in GSM2627173-tbl-1.txt, one of the failures, do not appear to overlap with any of the whole genome chip identifiers (Human v1-4). I would expect that mismatch to get caught at the "detect database" step.
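To make the expectation above concrete, here is a minimal sketch of an overlap check that would reject a sample whose probe identifiers match none of the known chips. This is not the actual "detect database" implementation: the function names, the 0.75 threshold, and all identifiers below are invented for illustration.

```python
def probe_overlap(sample_ids, platform_ids):
    """Return the fraction of the sample's probe IDs found on the platform."""
    sample = set(sample_ids)
    if not sample:
        return 0.0
    return len(sample & set(platform_ids)) / len(sample)


def detect_platform(sample_ids, platforms, min_overlap=0.75):
    """Return the best-matching platform name, or None if nothing matches well.

    `platforms` maps a platform name to its probe IDs. The threshold is an
    assumed value, not one taken from the real pipeline.
    """
    best_name, best_score = None, 0.0
    for name, platform_ids in platforms.items():
        score = probe_overlap(sample_ids, platform_ids)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_overlap else None


# Hypothetical identifiers, for illustration only.
platforms = {
    "human_v3": ["PROBE_A", "PROBE_B", "PROBE_C"],
    "human_v4": ["PROBE_B", "PROBE_C", "PROBE_D", "PROBE_E"],
}
supported_sample = ["PROBE_B", "PROBE_C", "PROBE_D", "PROBE_E"]
mirna_sample = ["MIR_X", "MIR_Y", "MIR_Z"]

print(detect_platform(supported_sample, platforms))
print(detect_platform(mirna_sample, platforms))
```

With a check like this, a sample with zero overlap returns None, so the caller can skip the conversion step instead of attempting it and failing.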
I'm pretty sure the line with the error message should be using result.returncode and result.stderr, not e. I've assigned this to myself, but if anyone else wants to tackle it before I get to things today, feel free. It should be a quick fix.
https://sentry.io/greenelab/staging-refinebio/issues/667326691/
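For reference, a minimal sketch of the kind of fix described above, assuming the external script is invoked with subprocess.run. The helper name run_conversion and the command used here are hypothetical and are not taken from the refinebio code. The point is that subprocess.run returns a CompletedProcess, so the exit status and error text come from result.returncode and result.stderr; there is no exception object e unless check=True is passed.

```python
import subprocess
import sys


def run_conversion(command):
    """Run an external command and return (ok, error_message).

    Hypothetical helper: builds the error message from the CompletedProcess
    fields rather than from an exception variable.
    """
    result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if result.returncode != 0:
        stderr = result.stderr.decode("utf-8", errors="replace").strip()
        message = "Command exited with status {}: {}".format(result.returncode, stderr)
        return False, message
    return True, ""


# Stand-in for the real script: a child process that fails with status 3.
failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
ok, message = run_conversion(failing)
print(ok, message)
```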