Closed misialq closed 3 years ago
Done, review away! As discussed IRL, probably easier to review as if it was an entirely new thing. Also, please try it out with some run ids!
Hey @adamovanja, thanks for the review! The PR is now updated as per your comments, where applicable.
Hi - I tested several Run Acc. No. (also mixed from different BioProjects) and it works like a Gem. Using Sample IDs gives a error message as expected.
As discussed, it would be very useful to ultimately have the generated metadata file transformed, so that it can directly be used in a QIIME workflow. So far I was only able to use it after exporting it as a tsv.
As discussed, it would be very useful to ultimately have the generated metadata file transformed, so that it can directly be used in a QIIME workflow. So far I was only able to use it after exporting it as a tsv.
That's awesome, thanks @LenaFloerl! Would you mind opening an issue describing with a bit more detail what is the exact behaviour you'd like? Eg., some example of an action where this metadata artifact would be used would be very helpful. Danke!
it would be very useful to ultimately have the generated metadata file transformed, so that it can directly be used in a QIIME workflow
It looks like the necessary transformer is already in place... is this not working?
https://github.com/bokulich-lab/q2-fondue/blob/main/q2_fondue/types/_transformer.py#L31-L35
it would be very useful to ultimately have the generated metadata file transformed, so that it can directly be used in a QIIME workflow
It looks like the necessary transformer is already in place... is this not working?
https://github.com/bokulich-lab/q2-fondue/blob/main/q2_fondue/types/_transformer.py#L31-L35
Yes! I installed fondue in an existing qiime2 env and downloaded again with get-metadata. Using the metadata.qza now works fine, thanks @misialq!
Disclaimer: this is a nasty but necessary PR.
This PR introduces a more-or-less complete refactor of the
_efetch.py
submodule. Rather than extract metadata from the NCBI response in an unstructured fashion, we can make use of the hierarchical structure of NCBI's SRA objects: study/sample/experiment/run. This will not only make the code easier to read but also will simplify handling of all the different metadata levels. It also prepares the fondue for the next PR on fetching by project ID (#13).Here's a high level summary of the changes:
_sra_meta.py
submodule containing data classes for storing metadata of a specific SRA object and reflect how those objects are related to one another; moreover functions for retrieving metadata from those objects and their children are included to make creation of the final metadata DataFrame simpler_efetch.py
submodule to make use of the new objects described aboveSome issues that emerged while refactoring: #24, #25
This PR in no way influences fetching sequences.
A simplified diagram of the new workflow: