daisybio / namco

GNU General Public License v3.0

Taxonomy file: No name for rank vs unclassified #48

Closed Username-felix-is-not-available closed 8 months ago

Username-felix-is-not-available commented 11 months ago

Dear developers, I recently started using Namco for my analyses and it appears to be very helpful. One thing I am not completely clear on is the exact format of the taxonomy file for some edge cases. I understand that the general format should look like this (tab-delimited):

Taxa	Kingdom	Phylum	Class	Order	Family	Genus	Species
OTU1	K1	P1	C1	O1	F1	G1	S1

The full documentation (https://docs.google.com/document/d/1A_3oUV7xa7DRmPzZ-J-IIkk5m1b5bPxo59iF9BgBH7I/edit?usp=sharing) is currently not available, so I checked the brief help (https://exbio.wzw.tum.de/namco/#shiny-tab-info). If I understand correctly, reads which could only be classified to (for example) the kingdom, should be assigned to the following OTU:

Taxa	Kingdom	Phylum	Class	Order	Family	Genus	Species
OTU1	K1						

All fields are still there, but after the kingdom, fields contain an empty value.

In the taxonomy that I use, an OTU can be classified down to the species rank, but some ranks in between don't have a name (name = ""). For this specific case, I am wondering how I should format these missing names so that Namco can recognize them correctly. In my example, OTU1 does not have a class name. Should the class be an empty string

Taxa	Kingdom	Phylum	Class	Order	Family	Genus	Species
OTU1	K1	P1		O1	F1	G1	S1

or better a placeholder like "NA"?

Taxa	Kingdom	Phylum	Class	Order	Family	Genus	Species
OTU1	K1	P1	NA	O1	F1	G1	S1

Have a nice day, Felix
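
For readers who want to check their own file against the two edge cases described above, here is a minimal Python sketch. It is not part of Namco; the function name `parse_taxonomy` and the second row `OTU2` are made up for illustration. It only shows that a tab-delimited file keeps every column when a rank is left empty, whether the gap is in the middle (no class name) or at the end (unclassified below kingdom).

```python
import csv
import io

RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def parse_taxonomy(text):
    """Parse tab-delimited taxonomy text into {taxon_id: {rank: name}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    table = {}
    for row in reader:
        # Fields missing at the end of a short row come back as None;
        # normalise them to the empty string.
        table[row["Taxa"]] = {rank: (row.get(rank) or "") for rank in RANKS}
    return table


# Hypothetical example data: OTU1 has no class name,
# OTU2 is unclassified below the kingdom.
example = (
    "Taxa\tKingdom\tPhylum\tClass\tOrder\tFamily\tGenus\tSpecies\n"
    "OTU1\tK1\tP1\t\tO1\tF1\tG1\tS1\n"
    "OTU2\tK1\t\t\t\t\t\t\n"
)

taxonomy = parse_taxonomy(example)
```

With this input, `taxonomy["OTU1"]["Class"]` is the empty string while `taxonomy["OTU1"]["Order"]` is still `"O1"`, so the empty field does not shift the later columns.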

alex-d13 commented 11 months ago

Hi Felix,

Thanks for using Namco :)

That's a very good question. I would keep taxa levels for which you do not have a classification empty (i.e. fill them with ""). You could also think about giving them the name of the next higher taxonomic level for which a classification exists. This is also done by default by our network methods in Namco.

Best, Alex
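
The second option mentioned above (giving an empty rank the name of the next higher classified level) could be sketched in Python as follows. This is an illustration only and not Namco's actual code; the function name `fill_from_higher_rank` is an assumption, and the lineage is assumed to be ordered from Kingdom to Species.

```python
def fill_from_higher_rank(lineage):
    """Replace empty rank names with the nearest classified higher rank.

    `lineage` is a list of names ordered from the highest rank (Kingdom)
    to the lowest (Species). Empty strings mark ranks without a name.
    """
    filled = []
    last_known = ""
    for name in lineage:
        if name:
            last_known = name
        # An empty rank inherits the most recent non-empty name above it.
        filled.append(last_known)
    return filled


# OTU with no class name: the class inherits the phylum name.
gap = fill_from_higher_rank(["K1", "P1", "", "O1", "F1", "G1", "S1"])

# OTU unclassified below the kingdom: every lower rank inherits "K1".
tail = fill_from_higher_rank(["K1", "", "", "", "", "", ""])
```

Here `gap` becomes `["K1", "P1", "P1", "O1", "F1", "G1", "S1"]` and `tail` becomes a list of seven `"K1"` entries.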

Username-felix-is-not-available commented 11 months ago

Hi Alex, Thank you for your swift reply! I noticed that both cases (no name at a certain rank vs. completely unclassified after a rank) are displayed as "trace 0" in the taxonomic composition plot. Since I don't like this name so much, I would like to change the name of completely unclassified ranks from "" to "UNCLASSIFIED" and for the other case from "" to "NA". So it would look like this:

Taxa	Kingdom	Phylum	Class	Order	Family	Genus	Species
OTU1	K1	P1	NA	O1	F1	G1	S1
OTU1	K1	UNCLASSIFIED	UNCLASSIFIED	UNCLASSIFIED	UNCLASSIFIED	UNCLASSIFIED	UNCLASSIFIED

However, I was wondering if this would bias statistics such as the differential analysis. Now we would have two different names at class level, "NA" and "UNCLASSIFIED". Previously, we just had "trace 0".

Best, Felix
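
The relabelling described in this comment could be done before upload with a small script along these lines. This is a sketch under assumptions: the function name `label_missing` is made up, the lineage is assumed to be ordered from Kingdom to Species, and nothing here says how Namco itself treats the resulting labels.

```python
def label_missing(lineage, gap_label="NA", tail_label="UNCLASSIFIED"):
    """Label empty ranks differently depending on where they occur.

    An empty rank that still has a named rank below it gets `gap_label`;
    an empty rank with nothing named below it gets `tail_label`.
    """
    # Index of the deepest rank that has a name, or -1 if none has one.
    deepest = max((i for i, name in enumerate(lineage) if name), default=-1)
    labelled = []
    for i, name in enumerate(lineage):
        if name:
            labelled.append(name)
        elif i < deepest:
            labelled.append(gap_label)
        else:
            labelled.append(tail_label)
    return labelled


no_class_name = label_missing(["K1", "P1", "", "O1", "F1", "G1", "S1"])
kingdom_only = label_missing(["K1", "", "", "", "", "", ""])
```

`no_class_name` comes out as `["K1", "P1", "NA", "O1", "F1", "G1", "S1"]`, and `kingdom_only` as `"K1"` followed by six `"UNCLASSIFIED"` entries. Because the two labels are different strings, any step that groups OTUs by name at a given rank would see them as two separate groups rather than one, which is the situation the question above is asking about.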