[x] gb files --> seq_info.csv, seq_info_references.csv, references.csv
records.csv will include an is_classified column based on a regex applied to the description and/or specific features. Leave the regex in taxtastic.
[ ] add column 'origin' (ncbi, rdp, etc)
deenurp sequence_filter (leave rdp_sequence_filter in place for now with deprecation warning)
[x] change name to partition_refs
[x] input: seq_info.csv, seqs.fasta, seq_info_references.csv, references.csv; output: seqs.fasta and seq_info.csv for each of named and unnamed (same output options as rdp_sequence_filter)
[x] deenurp rdp_extract_genbank is obsolete
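A minimal sketch of the named/unnamed split that partition_refs is meant to perform, working only on seq_info rows. The function name partition_rows, the seqname column, and the "True"/"False" text encoding of is_classified are assumptions made for illustration, not the deenurp implementation; writing the matching seqs.fasta files is left out.

```python
import csv
import io


def partition_rows(seq_info_file):
    """Split seq_info rows into (named, unnamed) lists by is_classified."""
    named, unnamed = [], []
    for row in csv.DictReader(seq_info_file):
        # Assumption: is_classified is serialized as the text "True"/"False".
        if row["is_classified"].strip().lower() == "true":
            named.append(row)
        else:
            unnamed.append(row)
    return named, unnamed


# Illustrative input; the seqname column is hypothetical.
example = io.StringIO(
    "seqname,tax_id,is_classified\n"
    "s1,1280,True\n"
    "s2,77133,False\n"
)
named, unnamed = partition_rows(example)
```

Each of the two lists would then be written to its own seq_info.csv, with the sequence names in it used to select records from seqs.fasta.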
(different project) taxtastic.taxit annotate:
1) (any file with tax_id), ncbi_taxonomy.db or tax_table.csv -> seq_info.csv (with additional taxonomic info)
2) Add rank-specific taxonomic annotation to a csv file containing at least the column 'tax_id'
3) inputs: records.csv (has at least tax_id), ncbi_taxonomy.db or taxonomy.csv
4) --rank list-of-ranks (default [species])
5) -c/--check-classified: option to include *_classified as described below, default false
6) outputs: seq_info.csv with columns:
   tax_id
   tax_name # replaces tax_name in input if exists
   is_classified
   rank # of tax_id
   for each rank: {rank}_id, {rank}_name, {rank}_classified # apply regex to name at this rank
7) add is_type to output of taxit annotate
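The annotate behavior described above could look roughly like the sketch below. It assumes the taxonomy has already been loaded into a dict keyed by tax_id, where each entry carries tax_name, rank, and one column per rank holding the tax_id of the ancestor at that rank (empty if none). The annotate function, the seqname column, and the UNCLASSIFIED pattern are placeholders for illustration; the real regex is meant to stay in taxtastic, and is_type is not covered here.

```python
import re

# Placeholder pattern only; not the regex used by taxtastic.
UNCLASSIFIED = re.compile(r"\b(unclassified|uncultured|unidentified)\b", re.IGNORECASE)


def annotate(records, taxonomy, ranks=("species",), check_classified=False):
    """Return copies of records with rank-specific taxonomic columns added."""
    out = []
    for rec in records:
        node = taxonomy[rec["tax_id"]]
        row = dict(rec)
        row["tax_name"] = node["tax_name"]  # replaces tax_name in input if present
        row["rank"] = node["rank"]  # rank of tax_id itself
        row["is_classified"] = not UNCLASSIFIED.search(node["tax_name"])
        for rank in ranks:
            rank_id = node.get(rank) or ""
            rank_name = taxonomy[rank_id]["tax_name"] if rank_id else ""
            row[f"{rank}_id"] = rank_id
            row[f"{rank}_name"] = rank_name
            if check_classified:
                # A missing ancestor at this rank counts as not classified.
                row[f"{rank}_classified"] = bool(rank_name) and not UNCLASSIFIED.search(rank_name)
        out.append(row)
    return out


# Illustrative taxonomy rows and input records.
taxonomy = {
    "1279": {"tax_name": "Staphylococcus", "rank": "genus", "genus": "1279", "species": ""},
    "1280": {"tax_name": "Staphylococcus aureus", "rank": "species", "genus": "1279", "species": "1280"},
    "77133": {"tax_name": "uncultured bacterium", "rank": "species", "genus": "", "species": "77133"},
}
records = [{"seqname": "s1", "tax_id": "1280"}, {"seqname": "s2", "tax_id": "77133"}]
rows = annotate(records, taxonomy, ranks=("genus", "species"), check_classified=True)
```

Keeping the per-rank classified columns behind check_classified mirrors the -c/--check-classified option, so the default output only gains the {rank}_id and {rank}_name columns.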
Questions: 1) What does the date mean in a GenBank record?
deenurp rdp_extract_genbank refactoring: