LangilleLab / microbiome_helper

A repository of bioinformatic scripts, SOPs, and tutorials for analyzing microbiome data.
GNU General Public License v3.0

added fix_ITS2_spf.py #6

Closed gavinmdouglas closed 7 years ago

gavinmdouglas commented 7 years ago

This script can be used on SPF files produced by the ITS2 pipeline to avoid a STAMP error when reading the file.

The error is related to reading the profile file, for example: "Data does not form a strict hierarchy. Child g__unidentified has multiple parents (e.g. f__unidentified, f__Trichosporonaceae)".

The fix gets around this issue by replacing every label containing "unclassified" or "unidentified" (case-insensitive) with the most recent higher-order taxonomic label that was defined, with a number of "X" characters appended equal to the number of unclassified parents + 1.
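As a rough illustration of the relabeling rule described above, here is a minimal sketch. It is not the code in fix_ITS2_spf.py. The function name relabel_unclassified, the "Root" fallback for a lineage that begins with an unclassified level, and the choice to keep the rank prefix of the parent label are all assumptions made for this example.

```python
def relabel_unclassified(lineage):
    """Replace unclassified/unidentified labels in one lineage.

    lineage: list of taxonomic labels ordered from highest to lowest rank.
    Hypothetical helper for illustration only, not the script's actual API.
    """
    fixed = []
    last_defined = "Root"  # assumed fallback if no defined label precedes
    run = 0                # consecutive unclassified levels seen so far

    for label in lineage:
        lowered = label.lower()
        if "unclassified" in lowered or "unidentified" in lowered:
            # run == number of unclassified parents + 1
            run += 1
            fixed.append(last_defined + "X" * run)
        else:
            last_defined = label
            run = 0
            fixed.append(label)
    return fixed


example = ["k__Fungi", "f__Trichosporonaceae", "g__unidentified", "s__Unidentified"]
print(relabel_unclassified(example))
```

Because each replacement label is derived from its nearest defined ancestor, two "unidentified" genera under different families no longer share a name, which is what the strict-hierarchy check in STAMP requires.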

mlangill commented 7 years ago

I think this would be better embedded within the biom_to_stamp.py script, which already handles regular OTU BIOM files and, with a slight alteration, should be able to handle ITS2 taxonomy.