bokulich-lab / q2-fondue

Functions for reproducibly Obtaining and Normalizing Data re-Used from Elsewhere
BSD 3-Clause "New" or "Revised" License
20 stars 6 forks source link

Efetch module refactor #26

Closed misialq closed 3 years ago

misialq commented 3 years ago

Disclaimer: this is a nasty but necessary PR.

This PR introduces a more-or-less complete refactor of the _efetch.py submodule. Rather than extract metadata from the NCBI response in an unstructured fashion, we can make use of the hierarchical structure of NCBI's SRA objects: study/sample/experiment/run. This will not only make the code easier to read but also will simplify handling of all the different metadata levels. It also prepares the fondue for the next PR on fetching by project ID (#13).

Here's a high level summary of the changes:

Some issues that emerged while refactoring: #24, #25

This PR in no way influences fetching sequences.

A simplified diagram of the new workflow: new_metadata_flow

misialq commented 3 years ago

Done, review away! As discussed IRL, probably easier to review as if it was an entirely new thing. Also, please try it out with some run ids!

misialq commented 3 years ago

Hey @adamovanja, thanks for the review! The PR is now updated as per your comments, where applicable.

LenaFloerl commented 3 years ago

Hi - I tested several Run Acc. No. (also mixed from different BioProjects) and it works like a Gem. Using Sample IDs gives a error message as expected.

As discussed, it would be very useful to ultimately have the generated metadata file transformed, so that it can directly be used in a QIIME workflow. So far I was only able to use it after exporting it as a tsv.

misialq commented 3 years ago

As discussed, it would be very useful to ultimately have the generated metadata file transformed, so that it can directly be used in a QIIME workflow. So far I was only able to use it after exporting it as a tsv.

That's awesome, thanks @LenaFloerl! Would you mind opening an issue describing with a bit more detail what is the exact behaviour you'd like? Eg., some example of an action where this metadata artifact would be used would be very helpful. Danke!

nbokulich commented 3 years ago

it would be very useful to ultimately have the generated metadata file transformed, so that it can directly be used in a QIIME workflow

It looks like the necessary transformer is already in place... is this not working?

https://github.com/bokulich-lab/q2-fondue/blob/main/q2_fondue/types/_transformer.py#L31-L35

LenaFloerl commented 3 years ago

it would be very useful to ultimately have the generated metadata file transformed, so that it can directly be used in a QIIME workflow

It looks like the necessary transformer is already in place... is this not working?

https://github.com/bokulich-lab/q2-fondue/blob/main/q2_fondue/types/_transformer.py#L31-L35

Yes! I installed fondue in an existing qiime2 env and downloaded again with get-metadata. Using the metadata.qza now works fine, thanks @misialq!