EBI-Metagenomics / MGnifyR

R package for searching, downloading and analysis of EBI MGnify metagenomics data
https://ebi-metagenomics.github.io/MGnifyR/
Artistic License 2.0

mgnify_get_download_urls stops with the following error: Error in `colnames<-`(`*tmp*`, value = `*vtmp*`) : attempt to set 'colnames' on an object with less than two dimensions #1

Closed kafker closed 2 years ago

kafker commented 2 years ago

As the title says, the mgnify_get_download_urls function stops with the following error:

dl_urls<-mgnify_get_download_urls(mg, $analysis_accession, accession_type = "analyses")
  |=================================================                                                               |  44%
Error in `colnames<-`(`*tmp*`, value = `*vtmp*`) : 
  attempt to set 'colnames' on an object with less than two dimensions

This error is triggered by some of the accessions I have in Marine_samples_metagenome, e.g. MGYA00278521 or MGYA00278684, which have only two URLs. However, it is not triggered if I run mgnify_get_download_urls on a single accession:

dl_urls_MGYA00278684<-mgnify_get_download_urls(mg, "MGYA00278684", accession_type = "analyses")

I have attached Marine_samples_metagenome to reproduce the error: Marine_samples_metagenome.txt

beadyallen commented 2 years ago

Hi,

I can’t reproduce the error with those specific accessions, but in general this sort of thing happens when the backend MGnify database is not entirely consistent, which is reasonably often: samples not pointing to an analysis, multiple projects using the same sample, assemblies not having an associated project, etc. I don’t think it’s a bug in MGnifyR, but rather something that ~should~ be fixed elsewhere.
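For readers unfamiliar with the error message itself: it comes from base R, not from MGnifyR specifically. One common way to hit it is when a one-row selection from a matrix silently drops to a plain vector before column names are assigned. The sketch below only illustrates that general mechanism with made-up data; the thread does not show which code path inside MGnifyR actually raises it.

```r
# Toy matrix standing in for a table of per-accession results (illustrative only)
m <- matrix(1:4, nrow = 2)

# Selecting a single row drops the dimensions by default, leaving a plain vector
single <- m[1, ]

# Assigning column names to a vector raises the error quoted in this issue
err <- tryCatch(
  {
    colnames(single) <- c("a", "b")
    NULL
  },
  error = function(e) conditionMessage(e)
)
print(err)

# Keeping the dimensions with drop = FALSE avoids the problem
kept <- m[1, , drop = FALSE]
colnames(kept) <- c("a", "b")
print(dim(kept))
```

If an inconsistent database entry yields fewer rows or fields than expected, the same kind of dimension drop could plausibly occur, but that is an assumption rather than something confirmed here.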

That said, clearly there needs to be a workaround so that MGnifyR can be used in the way it’s intended. What I tend to do is loop over the individual accessions and wrap each call in a tryCatch. So in your case, you could try:

downloads <- sapply(mglist$V1, function(x) {
  # Query one accession at a time so a single bad entry cannot abort the whole run
  tryCatch(
    mgnify_get_download_urls(mg, x, accession_type = "analyses"),
    error = function(y) { cat(paste("Failed to retrieve", x)); NA }
  )
})
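A follow-up step that may help is separating the accessions that failed from those that succeeded, so the failures can be inspected later. The sketch below shows one way to do that. To keep it runnable without network access, `fetch_urls` is a hypothetical stand-in for `mgnify_get_download_urls(mg, x, accession_type = "analyses")`, and the accession names and URLs are invented.

```r
# Hypothetical stand-in for the real MGnifyR call; it fails for one accession
# to mimic the behaviour reported in this issue.
fetch_urls <- function(accession) {
  if (accession == "MGYA_BAD") {
    stop("attempt to set 'colnames' on an object with less than two dimensions")
  }
  data.frame(
    accession = accession,
    download_url = paste0("https://example.org/", accession),  # invented URL
    stringsAsFactors = FALSE
  )
}

accessions <- c("MGYA_A", "MGYA_BAD", "MGYA_B")  # invented accessions

# lapply always returns a list, so a failure can be recorded as NULL
results <- lapply(accessions, function(x) {
  tryCatch(
    fetch_urls(x),
    error = function(e) {
      message("Failed to retrieve ", x, ": ", conditionMessage(e))
      NULL
    }
  )
})

# Split into failed accessions and one combined table of successful results
is_failed <- vapply(results, is.null, logical(1))
failed <- accessions[is_failed]
downloads <- do.call(rbind, results[!is_failed])

print(failed)
print(downloads)
```

Using `lapply` with `NULL` for failures, instead of `sapply` with `NA`, keeps the result type predictable, which makes the final `rbind` straightforward.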

Your attachment seems to have exported a little funny (try using tab separation and quoting text), but I think you’ve got 4326 unique accessions you’re trying to retrieve. I’ve got the above code running on that list, and will update you once it’s finished. One important thing to look for in the metadata is that the “analyses” are “completed”. Entries can end up in the database (and therefore API results) when there’s no data to actually show because e.g. a run failed.

kafker commented 2 years ago

Ben, I really appreciated your help, and thank you for the chunk of code.

I did not know about the "completed" analysis in the metadata file. This is really helpful.

Best

beadyallen commented 2 years ago

No problem. By the way - it looks like MGYA00100742 might be the problem. All the rest of the accessions have "analysis_analysis-status" set to “completed”, whereas that one is “QC not passed”. It’s the only problematic one I can see. Am just running the download retrieval on an updated list (minus the bad one) to check it works with a standard "mgnify_get_download_urls".

There's a good argument from an ease-of-use point of view that MGnifyR should handle errors like this. It hasn't been implemented yet because it's really a core database issue, but once more users begin finding problems, we might have to rethink.

Edit: The download retrieval does complete successfully once MGYA00100742 is removed from the query accessions.
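For anyone hitting the same problem, filtering on the status column before requesting download URLs might look roughly like this. It is a minimal sketch on a toy data frame: the column names `analysis_accession` and `analysis_analysis-status` are the ones mentioned in this thread, while the `metadata` object is an assumption standing in for the real metadata table.

```r
# Toy metadata table (assumed layout); check.names = FALSE keeps the hyphenated name
metadata <- data.frame(
  analysis_accession = c("MGYA00278521", "MGYA00278684", "MGYA00100742"),
  "analysis_analysis-status" = c("completed", "completed", "QC not passed"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# The hyphen in the column name means it must be accessed with [[ ]] or backticks.
# %in% treats a missing status as "not completed" instead of propagating NA.
is_completed <- metadata[["analysis_analysis-status"]] %in% "completed"

completed <- metadata$analysis_accession[is_completed]
skipped <- metadata$analysis_accession[!is_completed]

print(completed)
print(skipped)

# The filtered vector would then be passed to the real call, e.g.
# mgnify_get_download_urls(mg, completed, accession_type = "analyses")
```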