fhcrc / deenurp

16S rRNA gene sequence curation and phylogenetic reference set creation
GNU General Public License v3.0

Some refactoring - deenurp gb2csv merging - branch: genbank_record_hashing #26

Closed: crosenth closed this issue 8 years ago

crosenth commented 9 years ago

Questions:

1) What does the date mean in a GenBank record?

deenurp rdp_extract_genbank refactoring:

1) (any file with tax_id), ncbi_taxonomy.db or tax_table.csv -> seq_info.csv (with additional taxonomic info)
2) Purpose: add rank-specific taxonomic annotation to a CSV file containing at least the column 'tax_id'
3) inputs: records.csv (has at least tax_id), ncbi_taxonomy.db or taxonomy.csv
4) --rank list-of-ranks (default [species])
5) -c/--check-classified: option to include the *_classified columns described below; default false
6) outputs: seq_info.csv with columns:
   - tax_id
   - tax_name  # replaces tax_name in input if it exists
   - is_classified
   - rank  # rank of tax_id
   - for each rank: {rank}_id, {rank}_name, {rank}_classified  # apply regex to name at this rank
7) add is_type to output of taxit annotate
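The annotation step outlined above could look roughly like the following sketch. It is not deenurp's or taxtastic's actual code, and several details are assumptions: the taxonomy is read from a CSV with hypothetical columns tax_id, parent_id, rank, tax_name (reading ncbi_taxonomy.db is not shown); --rank takes space-separated ranks; the "unclassified" regex is a placeholder because the issue does not give one; and is_classified is emitted only with -c, one possible reading of "*_classified".

```python
import argparse
import csv
import re
import sys

# HYPOTHETICAL pattern: the issue says "apply regex to name at this rank"
# but does not specify the regex, so this one is only illustrative.
UNCLASSIFIED = re.compile(r'unclassified|uncultured|unidentified|\bsp\.', re.IGNORECASE)


def lineage(tax_id, nodes):
    """Map rank -> (tax_id, name) for tax_id and all of its ancestors.

    nodes maps tax_id -> (parent_id, rank, name). The `seen` set ends the
    walk at a self-parented root (as in NCBI's nodes.dmp) or at any cycle.
    """
    path, seen = {}, set()
    while tax_id in nodes and tax_id not in seen:
        seen.add(tax_id)
        parent, rank, name = nodes[tax_id]
        path.setdefault(rank, (tax_id, name))  # keep the lowest node per rank
        tax_id = parent
    return path


def classified(name):
    """True if a name is non-empty and does not match the placeholder regex."""
    return bool(name) and UNCLASSIFIED.search(name) is None


def annotate(row, nodes, ranks, check_classified=False):
    """Return a copy of a CSV row with rank-specific taxonomy columns added."""
    out = dict(row)
    _, own_rank, own_name = nodes.get(row['tax_id'], (None, '', ''))
    out['tax_name'] = own_name  # replaces any tax_name already in the input
    out['rank'] = own_rank
    if check_classified:
        out['is_classified'] = classified(own_name)
    path = lineage(row['tax_id'], nodes)
    for rank in ranks:
        rank_id, rank_name = path.get(rank, ('', ''))
        out[f'{rank}_id'] = rank_id
        out[f'{rank}_name'] = rank_name
        if check_classified:
            out[f'{rank}_classified'] = classified(rank_name)
    return out


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('records', help='CSV with at least a tax_id column')
    parser.add_argument('taxonomy', help='CSV with tax_id,parent_id,rank,tax_name (assumed layout)')
    parser.add_argument('--rank', nargs='+', default=['species'])
    parser.add_argument('-c', '--check-classified', action='store_true')
    args = parser.parse_args(argv)

    with open(args.taxonomy, newline='') as f:
        nodes = {r['tax_id']: (r['parent_id'], r['rank'], r['tax_name'])
                 for r in csv.DictReader(f)}
    with open(args.records, newline='') as f:
        rows = [annotate(r, nodes, args.rank, args.check_classified)
                for r in csv.DictReader(f)]
    if rows:
        writer = csv.DictWriter(sys.stdout, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


if __name__ == '__main__':
    main()
```

Saved under a hypothetical name such as annotate_ranks.py, it would be run as `python annotate_ranks.py records.csv taxonomy.csv --rank species genus -c > seq_info.csv`. Walking the lineage once per row and picking out the requested ranks keeps the per-rank columns independent of how many ranks the taxonomy defines.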

crosenth commented 8 years ago

See deenurp issue #44.