bokulich-lab / RESCRIPt

REference Sequence annotation and CuRatIon Pipeline
BSD 3-Clause "New" or "Revised" License

BUG: gtdb taxonomy parser #191

Closed: nbokulich closed this 3 weeks ago

nbokulich commented 3 weeks ago

Evidently, #169 introduced a bug in how taxonomy is parsed from GTDB files. Taxonomy labels were being split on whitespace, so the species rank contained only the genus name instead of the full species label.
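For illustration, here is a minimal sketch of how a whitespace split truncates the species rank. This is not the code from #169 or from this PR. It assumes a GTDB-style record laid out as accession, a tab, then semicolon-delimited ranks; the record itself and the helpers `buggy_parse` and `fixed_parse` are hypothetical.

```python
# Illustrative GTDB-style record (assumed layout: accession<TAB>ranks);
# not copied verbatim from any specific GTDB release file.
LINE = (
    "RS_GCF_000005845.2\t"
    "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;"
    "s__Escherichia coli"
)


def buggy_parse(line: str) -> tuple[str, str]:
    """Hypothetical: split on any whitespace, which also splits the species."""
    fields = line.split()  # "s__Escherichia coli" becomes two fields
    return fields[0], fields[1]


def fixed_parse(line: str) -> tuple[str, str]:
    """Hypothetical: split only on the first tab, keeping the binomial intact."""
    accession, taxonomy = line.rstrip("\n").split("\t", 1)
    return accession, taxonomy


print(buggy_parse(LINE)[1].split(";")[-1])  # s__Escherichia
print(fixed_parse(LINE)[1].split(";")[-1])  # s__Escherichia coli
```

Splitting only on the tab separator (or reading the file as TSV) is one way to avoid the problem, because whitespace inside a rank label is then never treated as a field boundary.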

This PR fixes the issue by keeping the full species label (genus plus specific epithet) and adds a simple test for the parser, in case the taxonomy format changes in the future.

I manually tested and confirmed that this works with GTDB versions 202, 214, and 220. The taxonomy format is consistent across these versions; see the test for the structure.
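As a rough illustration of this kind of structural check, here is a pytest-style sketch. It is not the test added in this PR. It assumes the same record layout (accession, a tab, then seven semicolon-delimited ranks with the standard GTDB prefixes `d__` through `s__`). `parse_gtdb_taxonomy_line` and the example record are hypothetical.

```python
# Standard GTDB rank prefixes, domain through species.
RANK_PREFIXES = ("d__", "p__", "c__", "o__", "f__", "g__", "s__")

# Illustrative record with a trailing newline, as it would be read from a file.
EXAMPLE_LINE = (
    "RS_GCF_000005845.2\td__Bacteria;p__Pseudomonadota;"
    "c__Gammaproteobacteria;o__Enterobacterales;"
    "f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli\n"
)


def parse_gtdb_taxonomy_line(line):
    """Hypothetical helper: return (accession, [rank labels]) for one record."""
    accession, taxonomy = line.rstrip("\n").split("\t", 1)
    return accession, [rank.strip() for rank in taxonomy.split(";")]


def test_ranks_are_complete_and_ordered():
    # All seven ranks are present, each carrying the expected prefix.
    _, ranks = parse_gtdb_taxonomy_line(EXAMPLE_LINE)
    assert len(ranks) == len(RANK_PREFIXES)
    for rank, prefix in zip(ranks, RANK_PREFIXES):
        assert rank.startswith(prefix), (rank, prefix)


def test_species_label_keeps_genus_and_epithet():
    # The regression case: the species rank must not be cut at the space.
    _, ranks = parse_gtdb_taxonomy_line(EXAMPLE_LINE)
    assert ranks[-1] == "s__Escherichia coli"
```

Running `pytest` on a file containing this collects both `test_*` functions. Pinning the rank prefixes and the two-word species label means a format change or a reintroduced whitespace split would fail loudly.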

FYI @mikerobeson

nbokulich commented 3 weeks ago

Is it worth patching the latest release, @lizgehret?