bokulich-lab / q2-types-genomics

QIIME 2 types for genomics plugins.
BSD 3-Clause "New" or "Revised" License

ENH: add a `genome_dict` method on all `GenomeData`-linked formats #65

Open · misialq opened this issue 11 months ago

misialq commented 11 months ago

Is your feature request related to a problem? Please describe.
Different variations of the `GenomeData` type (Proteins, Genes, Loci) store their data in FASTA/GFF files whose names end with type-specific suffixes (e.g., `_proteins.fasta` for proteins or `_loci.gff` for loci). It would be handy to have a way to retrieve feature/genome IDs without having to parse the file names (similar to what is described in https://github.com/bokulich-lab/q2-types-genomics/issues/56).

Describe the solution you'd like
Let's add a `genome_dict` method, similar to the one added in https://github.com/bokulich-lab/q2-types-genomics/pull/57, so that feature IDs can be retrieved easily from any `GenomeData` artifact.
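A minimal sketch of what such a method could look like, assuming each format declares its file-name suffix as a class attribute and that `genome_dict` returns a mapping from genome ID to file path. The classes `GenomeDataDirFmtSketch`, `ProteinsDirFmtSketch` and `LociDirFmtSketch` and the `suffix` attribute are hypothetical stand-ins, not real QIIME 2 `DirectoryFormat` subclasses, and the actual implementation in q2-types-genomics may differ.

```python
import tempfile
from pathlib import Path
from typing import Dict


class GenomeDataDirFmtSketch:
    """Hypothetical stand-in for a GenomeData-linked directory format.

    Assumption: each subclass sets ``suffix`` to the file-name ending it uses.
    """

    suffix = ""

    def __init__(self, path):
        self.path = Path(path)

    def genome_dict(self) -> Dict[str, str]:
        """Map each genome ID (file name minus the suffix) to its absolute path."""
        ids = {}
        for fp in sorted(self.path.iterdir()):
            # Only consider files carrying this format's suffix.
            if fp.is_file() and fp.name.endswith(self.suffix):
                # Slice by length so an empty suffix keeps the full name.
                genome_id = fp.name[: len(fp.name) - len(self.suffix)]
                ids[genome_id] = str(fp.absolute())
        return ids


class ProteinsDirFmtSketch(GenomeDataDirFmtSketch):
    suffix = "_proteins.fasta"  # suffix mentioned in this issue


class LociDirFmtSketch(GenomeDataDirFmtSketch):
    suffix = "_loci.gff"  # suffix mentioned in this issue


if __name__ == "__main__":
    # Demo on a throwaway directory with made-up file names.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ["genome1_proteins.fasta", "genome2_proteins.fasta"]:
            (Path(tmp) / name).write_text(">p1\nMKV\n")
        print(sorted(ProteinsDirFmtSketch(tmp).genome_dict()))
```

Keeping the suffix on the class puts the name parsing in one place, so callers never need to know which suffix a given type uses, and the files themselves are not renamed, so previously created artifacts would keep working.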

Describe alternatives you've considered
An alternative would be to drop the suffixes entirely, but this would require adjusting the actions that already use these types (one of them being `get-ncbi-genomes` in RESCRIPt) and could cause problems with previously created artifacts.