Automatic taxonomy string extraction

caporaso-lab / mockrobiota

A public resource for microbiome bioinformatics benchmarking using artificially constructed (i.e., mock) communities.

http://mockrobiota.caporasolab.us

BSD 3-Clause "New" or "Revised" License


Automatic taxonomy string extraction #46

Open nbokulich opened 7 years ago

nbokulich commented 7 years ago

Add code to convert "source" taxonomy files to expected-taxonomy.tsv by extracting full-length taxonomy strings from reference database X.

Similarly, add code to extract database identifiers, e.g., from GenBank.

The issue with both of these is that manual curation is still very much needed, and database quality can be a major problem. But the first would be approachable and would streamline the process of creating these files.
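As a starting point for the first item, here is a minimal sketch of the lookup step. It is not existing mockrobiota code. It assumes the reference taxonomy is a Greengenes-style table (`<id><TAB>k__...; p__...; ...; g__Genus; s__species`) and that the source file supplies plain "Genus species" names; the function names `load_reference` and `expected_taxonomy` are hypothetical. Names that match no lineage, or more than one, are returned separately so they can be curated by hand.

```python
import io
from collections import defaultdict


def load_reference(handle):
    """Index full taxonomy strings by lower-cased 'genus species' name.

    Assumes Greengenes-style rows: <id>\t<k__..; p__..; ...; g__Genus; s__species>.
    """
    index = defaultdict(set)
    for line in handle:
        fields = line.rstrip("\n").split("\t", 1)
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        ranks = [r.strip() for r in fields[1].split(";")]
        # Map rank prefix letter to name, ignoring empty ranks such as "s__".
        names = {r[0]: r[3:] for r in ranks if len(r) > 3 and r[1:3] == "__"}
        genus, species = names.get("g"), names.get("s")
        if genus and species:
            index[f"{genus} {species}".lower()].add(";".join(ranks))
    return index


def expected_taxonomy(source_names, index):
    """Return (resolved, needs_curation) for a list of 'Genus species' names."""
    resolved, needs_curation = {}, []
    for name in source_names:
        key = " ".join(name.lower().split())  # normalize case and whitespace
        hits = index.get(key, set())
        if len(hits) == 1:
            resolved[name] = next(iter(hits))
        else:
            # No match, or conflicting lineages in the reference database.
            needs_curation.append(name)
    return resolved, needs_curation


# Tiny made-up reference table, for illustration only.
REFERENCE = (
    "1\tk__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
    "o__Enterobacteriales; f__Enterobacteriaceae; g__Escherichia; s__coli\n"
    "2\tk__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; "
    "f__Bacillaceae; g__Bacillus; s__subtilis\n"
    "3\tk__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; "
    "f__Planococcaceae; g__Bacillus; s__subtilis\n"
    "4\tk__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
    "f__Lactobacillaceae; g__Lactobacillus; s__\n"
)

index = load_reference(io.StringIO(REFERENCE))
resolved, needs_curation = expected_taxonomy(
    ["Escherichia coli", "Bacillus subtilis", "Lactobacillus casei"], index
)
```

In this toy run only `Escherichia coli` resolves to a single lineage. `Bacillus subtilis` has two conflicting lineages in the made-up table and `Lactobacillus casei` has no species-level entry, so both are flagged for manual curation, which is the database-quality problem described above. Writing `resolved` out as expected-taxonomy.tsv would be a separate step that depends on the exact column layout of that file.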