bokulich-lab / RESCRIPt

REference Sequence annotation and CuRatIon Pipeline
BSD 3-Clause "New" or "Revised" License

ENH: add action for `get_gg_data` #124

Closed colinbrislawn closed 2 years ago

colinbrislawn commented 2 years ago

I'm not sure how much this is needed, but it is a complement to #123 ENH: add action for get_unite_data

GreenGenes is distributed from ftp://greengenes.microbio.me/greengenes_release/ with a predictable naming scheme

Downstream, this would replace https://github.com/caporaso-lab/pretrained-feature-classifiers/blob/master/get_gg.sh#L14-L15
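For illustration, here is a minimal sketch of the kind of download logic such an action might wrap. It is not RESCRIPt's API: the function names `gg_archive_url` and `fetch_gg_archive` are hypothetical, and the `<release_dir>/<archive_name>` layout under the FTP root is an assumption that should be checked against the actual directory listing.

```python
from urllib.request import urlretrieve

# FTP root quoted in the issue text above.
GG_BASE_URL = "ftp://greengenes.microbio.me/greengenes_release/"


def gg_archive_url(release_dir: str, archive_name: str) -> str:
    """Build a download URL, assuming files live at <root>/<release_dir>/<archive_name>.

    Both arguments are supplied by the caller; this helper (hypothetical)
    only joins them and does not verify that the file exists.
    """
    return f"{GG_BASE_URL}{release_dir.strip('/')}/{archive_name.strip('/')}"


def fetch_gg_archive(release_dir: str, archive_name: str, dest: str) -> str:
    """Download the archive to `dest` and return the local path (needs network access)."""
    url = gg_archive_url(release_dir, archive_name)
    local_path, _headers = urlretrieve(url, dest)
    return local_path
```

A call such as `gg_archive_url("gg_13_5", "gg_13_8_otus.tar.gz")` only builds the URL string; the directory and file names in that call are illustrative values, not confirmed by this thread.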

nbokulich commented 2 years ago

Such an action was never added for a reason... GG has not been updated for close to a decade now, so supporting it in RESCRIPt simply is not as high a priority as supporting other databases with more frequent release cycles, e.g., SILVA (already supported), UNITE, GTDB, etc.

My recollection from looking at file download stats a while back is that GG was not downloaded as much as some other databases, so I just see a lower juice:squeeze ratio in adding such an action. But I am not opposed.

@thermokarst @mikerobeson what do you think? Would supporting GG in RESCRIPt add value?

colinbrislawn commented 2 years ago

juice:squeeze 🍋🗜️

Yep. I was looking for a simple place to start with RESCRIPt before also using the PlutoF API.

I'm avoiding the hard problem, lol. Feel free to close if unneeded

thermokarst commented 2 years ago

> @thermokarst @mikerobeson what do you think? Would supporting GG in RESCRIPt add value?

Hmm, probably not. The download stats are a bit inflated for GG, too, since most of the user doc tutorials use GG for its smaller size, so those classifiers get downloaded all the time when building docs, etc.

mikerobeson commented 2 years ago

I agree with you both @thermokarst & @nbokulich. Not worth adding IMO. But it might be something for a "good first issue" tag or something. 🤷