This script can be used on SPF files produced by the ITS2 pipeline to avoid a STAMP error when reading the file.
The error is related to reading the profile file, for example:
"Data does not form a strict hierarchy. Child gunidentified has multiple parents (e.g. funidentified, f_Trichosporonaceae)".
The fix works around this issue by replacing every label containing "unclassified" or "unidentified" (case-insensitive) with the most recent higher-order taxonomic label that was defined, with a number of "X" characters appended equal to the number of unclassified parents + 1.
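As a rough illustration of the relabeling rule described above, here is a minimal Python sketch. It is not the script's actual code: the function name relabel_lineage is hypothetical, the lineage is assumed to be a list of label strings ordered from highest to lowest rank, "unclassified parents" is assumed to mean the consecutive placeholder labels directly above the current one, and a placeholder with no defined ancestor is assumed to be left unchanged.

```python
import re

# Matches any label containing "unclassified" or "unidentified", in any case.
PLACEHOLDER = re.compile(r"unclassified|unidentified", re.IGNORECASE)


def relabel_lineage(labels):
    """Return a copy of `labels` (highest rank first) with placeholders replaced.

    Hypothetical helper: each placeholder becomes the most recent defined
    label plus one "X" per placeholder level since that label.
    """
    fixed = []
    last_defined = None  # most recent label that was not a placeholder
    run_length = 0       # consecutive placeholders seen since last_defined
    for label in labels:
        if PLACEHOLDER.search(label):
            # run_length before this step is the number of unclassified
            # parents, so after incrementing it equals parents + 1.
            run_length += 1
            if last_defined is None:
                # Assumption: nothing to inherit from, keep the label as is.
                fixed.append(label)
            else:
                fixed.append(last_defined + "X" * run_length)
        else:
            last_defined = label
            run_length = 0
            fixed.append(label)
    return fixed


lineage = ["k__Fungi", "o__Trichosporonales", "f__unidentified", "g__Unclassified"]
print(relabel_lineage(lineage))
```

Because every replacement label embeds its nearest defined ancestor, two placeholders that sit under different parents no longer share a name, which is what the strict-hierarchy check requires.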
I think this logic would be better placed inside the biom_to_stamp.py script, which already handles regular OTU BIOM files and, with a slight alteration, should be able to handle ITS2 taxonomy.