Questions: 1) What does the date in a GenBank record mean?

deenurp rdp_extract_genbank refactoring:

[x] expand is_type to identify type strains more broadly (e.g. "ATCC" in the name); email Dhruba for other criteria; see http://www.ncbi.nlm.nih.gov/refseq/targetedloci/ and http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383940/ (thanks Dan Hoogestraat)
[x] gb files --> seq_info.csv, seq_info_references.csv, references.csv; records.csv will include an is_classified column based on a regexp of the description and/or specific features. Leave the regex in taxtastic.
[ ] add column 'origin' (ncbi, rdp, etc.)
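A minimal sketch of the row-building step in this checklist, assuming the GenBank files have already been parsed into plain dicts. The keys (`seqname`, `tax_id`, `description`, `strain`), the helper names `seq_info_row` and `write_seq_info`, the column order, and both regexes are hypothetical placeholders: the real is_type criteria are still being collected, and the is_classified regex is meant to stay in taxtastic. The tax_ids in the example are illustrative only.

```python
import csv
import io
import re

# Placeholder patterns for illustration only. The real is_type criteria are
# still being gathered, and the is_classified regex is meant to live in
# taxtastic rather than being redefined here.
TYPE_STRAIN_RE = re.compile(r"\btype strain\b|\bATCC\b", re.IGNORECASE)
UNCLASSIFIED_RE = re.compile(
    r"\b(?:uncultured|unidentified|unclassified|environmental)\b|\bsp\.",
    re.IGNORECASE,
)

# Hypothetical column layout for seq_info.csv.
SEQ_INFO_FIELDS = ["seqname", "tax_id", "description",
                   "is_type", "is_classified", "origin"]


def seq_info_row(record, origin):
    """Build one seq_info.csv row from an already-parsed record.

    `record` is a dict with 'seqname', 'tax_id', 'description' and an
    optional 'strain' value (hypothetical keys, not deenurp's schema).
    """
    strain_text = " ".join([record["description"], record.get("strain", "")])
    return {
        "seqname": record["seqname"],
        "tax_id": record["tax_id"],
        "description": record["description"],
        "is_type": bool(TYPE_STRAIN_RE.search(strain_text)),
        "is_classified": not UNCLASSIFIED_RE.search(record["description"]),
        "origin": origin,  # source database, e.g. "ncbi" or "rdp"
    }


def write_seq_info(records, origin, handle):
    """Write a header plus one row per record to a CSV file handle."""
    writer = csv.DictWriter(handle, fieldnames=SEQ_INFO_FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(seq_info_row(record, origin))


# Illustrative records; the tax_ids are example values only.
example = [
    {"seqname": "A1", "tax_id": "562",
     "description": "Escherichia coli strain ATCC 11775 16S ribosomal RNA gene"},
    {"seqname": "B2", "tax_id": "77133",
     "description": "uncultured bacterium clone X 16S ribosomal RNA gene"},
]
buffer = io.StringIO()
write_seq_info(example, "ncbi", buffer)
print(buffer.getvalue(), end="")
```

Keeping is_type, is_classified and origin as plain columns means a later step can filter on them without re-reading the GenBank files; swapping in the taxtastic regex only changes `UNCLASSIFIED_RE`.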
deenurp sequence_filter (leave rdp_sequence_filter in place for now with a deprecation warning)
[x] change name to partition_refs
[x] input: seq_info.csv, seqs.fasta, seq_info_references.csv, references.csv; output: seqs.fasta and seq_info.csv for each of the named and unnamed partitions (same output options as rdp_sequence_filter)
[x] deenurp rdp_extract_genbank is obsolete
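A minimal sketch of the partition_refs split, assuming that "named" means a seq_info.csv row whose `is_classified` column is the string `"True"`; the actual criterion may differ. The helper names (`read_fasta`, `partition_refs`, `write_partition`) are hypothetical, and the two reference CSVs listed as inputs are ignored here for brevity.

```python
import csv
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs; the name is the first word of the header."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def partition_refs(seq_info_handle, fasta_handle):
    """Split sequences into 'named' and 'unnamed' partitions.

    Assumption: seq_info.csv has 'seqname' and 'is_classified' columns, and
    is_classified == "True" marks a named sequence. The reference CSVs are
    not used in this sketch.
    """
    info = {row["seqname"]: row for row in csv.DictReader(seq_info_handle)}
    parts = {"named": ([], []), "unnamed": ([], [])}
    for name, seq in read_fasta(fasta_handle):
        row = info.get(name)
        if row is None:
            continue  # sequences without a seq_info row are skipped
        key = "named" if row["is_classified"] == "True" else "unnamed"
        seqs, rows = parts[key]
        seqs.append((name, seq))
        rows.append(row)
    return parts


def write_partition(seqs, rows, fasta_out, seq_info_out, fieldnames):
    """Write one partition as a FASTA file plus a matching seq_info.csv."""
    for name, seq in seqs:
        fasta_out.write(f">{name}\n{seq}\n")
    writer = csv.DictWriter(seq_info_out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


seq_info = io.StringIO("seqname,tax_id,is_classified\nA1,562,True\nB2,77133,False\n")
fasta = io.StringIO(">A1 Escherichia coli\nACGT\nACGT\n>B2\nGGCC\n")
parts = partition_refs(seq_info, fasta)
named_fasta, named_info = io.StringIO(), io.StringIO()
write_partition(*parts["named"], named_fasta, named_info,
                fieldnames=["seqname", "tax_id", "is_classified"])
print(named_fasta.getvalue(), end="")
```

Each partition keeps its FASTA records and seq_info rows in the same order, so the two output files for "named" (and likewise "unnamed") stay aligned.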
(different project) taxtastic.taxit annotate:

1) (any file with tax_id), ncbi_taxonomy.db or tax_table.csv -> seq_info.csv (with additional taxonomic info)
2) Add rank-specific taxonomic annotation to a csv file containing at least the column 'tax_id'
3) inputs: records.csv (has at least tax_id), ncbi_taxonomy.db or taxonomy.csv
4) --rank list-of-ranks (default [species])
5) -c/--check-classified: option to include *_classified as described below, default false
6) outputs: seq_info.csv with columns
   - tax_id
   - tax_name # replaces tax_name in input if exists
   - is_classified
   - rank # of tax_id
   - for each rank: {rank}_id, {rank}_name, {rank}_classified # apply regex to name at this rank
7) add is_type to output of taxit annotate
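A minimal sketch of the rank-specific annotation described above, not taxtastic's implementation. It assumes a toy in-memory taxonomy mapping `tax_id -> (parent_id, rank, tax_name)` standing in for ncbi_taxonomy.db or tax_table.csv; the function names `lineage` and `annotate`, the toy IDs, and the unclassified regex are all hypothetical. is_type is left out.

```python
import re

# Hypothetical placeholder; the real regex is meant to live in taxtastic.
UNCLASSIFIED_RE = re.compile(
    r"\b(?:uncultured|unidentified|unclassified|environmental)\b|\bsp\.",
    re.IGNORECASE,
)


def lineage(taxonomy, tax_id):
    """Return {rank: (tax_id, tax_name)} for tax_id and all its ancestors."""
    result = {}
    while tax_id in taxonomy:
        parent_id, rank, name = taxonomy[tax_id]
        result.setdefault(rank, (tax_id, name))
        if parent_id == tax_id:  # the root is its own parent
            break
        tax_id = parent_id
    return result


def annotate(rows, taxonomy, ranks=("species",), check_classified=False):
    """Yield copies of rows with tax_name, rank, is_classified and per-rank columns."""
    for row in rows:
        entry = taxonomy.get(row["tax_id"])
        if entry is None:
            yield dict(row)  # unknown tax_id: pass the row through unchanged
            continue
        _, rank, name = entry
        out = dict(row)
        out["tax_name"] = name  # replaces tax_name in the input if present
        out["rank"] = rank
        out["is_classified"] = not UNCLASSIFIED_RE.search(name)
        lin = lineage(taxonomy, row["tax_id"])
        for r in ranks:
            r_id, r_name = lin.get(r, ("", ""))
            out[f"{r}_id"] = r_id
            out[f"{r}_name"] = r_name
            if check_classified:  # mirrors the -c/--check-classified option
                out[f"{r}_classified"] = bool(r_name) and not UNCLASSIFIED_RE.search(r_name)
        yield out


# Toy taxonomy with illustrative IDs only.
toy = {
    "1": ("1", "root", "root"),
    "2": ("1", "superkingdom", "Bacteria"),
    "3": ("2", "genus", "Escherichia"),
    "4": ("3", "species", "Escherichia coli"),
    "5": ("3", "species", "Escherichia sp. 42"),
}
rows = [{"seqname": "A1", "tax_id": "4"}, {"seqname": "B2", "tax_id": "5"}]
result = list(annotate(rows, toy, ranks=("species", "genus"), check_classified=True))
for r in result:
    print(r)
```

Walking parents once per row is simple but repeats work for shared ancestors; a real implementation backed by the SQLite database would more likely query or cache lineages.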

fhcrc / deenurp

Some refactoring - deenurp gb2csv merging - branch: genbank_record_hashing #26