bowmanjeffs / paprica

paprica - PAthway PRediction by phylogenetIC plAcement

analyzing edge_data #69

Closed. GucciBawler closed this issue 4 years ago.

GucciBawler commented 4 years ago

Hi Jeff

I am using your software to analyze some of our 16S data. More specifically, I'm using it to determine the amount of bacteria for which no reference genome was found. For this I use the map_ratio column of the edge_data file, but it is not reported for all nodes. Why is this? I've noticed that a missing map_ratio coincides with a clade_size greater than 1, so I was also wondering what clade_size means and how the two properties are related.
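One quick way to confirm the pattern described above is to cross-tabulate the edges by whether map_ratio is missing and whether clade_size is greater than 1. This is a minimal sketch, assuming the edge_data file is a comma-separated table with `clade_size` and `map_ratio` columns and that a missing value appears as an empty cell or `nan`. The function name `crosstab_missing_map_ratio` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def crosstab_missing_map_ratio(csv_text):
    """Count edges by (map_ratio missing?, clade_size > 1?).

    Assumption (hypothetical): edge_data is comma-separated with
    'clade_size' and 'map_ratio' columns; a missing map_ratio is an
    empty cell or a 'nan'/'na' string.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        ratio = (row.get("map_ratio") or "").strip().lower()
        missing = ratio in ("", "nan", "na")
        internal = float(row["clade_size"]) > 1
        counts[(missing, internal)] += 1
    return counts


# Hypothetical sample rows, not real paprica output.
sample = "edge,clade_size,map_ratio\n101,1,0.85\n102,3,\n103,1,0.4\n104,12,nan\n"
print(crosstab_missing_map_ratio(sample))
```

If every edge falls into either `(False, False)` or `(True, True)`, a missing map_ratio lines up exactly with clade_size > 1.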

Thanks in advance

bowmanjeffs commented 4 years ago

Map ratio is only given for terminal nodes on the reference tree. There's no way to calculate it for internal nodes, as there is no reference 16S rRNA gene associated with those positions. Terminal nodes all have a clade size of 1, with anything > 1 belonging to an internal node by definition.
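Following that answer, any analysis based on map_ratio has to be restricted to terminal edges (clade_size of 1), and edges with clade_size > 1 have to be handled separately. The following is a minimal sketch of that split, assuming a comma-separated edge_data table whose first column identifies the edge. The column name `edge` and the function `split_edges` are hypothetical; only `clade_size` and `map_ratio` come from this thread.

```python
import csv
import io


def split_edges(csv_text):
    """Split edge_data rows into terminal and internal edges.

    Assumption (hypothetical): a comma-separated table with an 'edge'
    identifier column plus the 'clade_size' and 'map_ratio' columns.
    """
    terminal = {}  # edge id -> map_ratio (defined only when clade_size == 1)
    internal = []  # edge ids with clade_size > 1; no map_ratio by definition
    for row in csv.DictReader(io.StringIO(csv_text)):
        if float(row["clade_size"]) == 1:
            terminal[row["edge"]] = float(row["map_ratio"])
        else:
            internal.append(row["edge"])
    return terminal, internal


# Hypothetical sample rows, not real paprica output.
sample = "edge,clade_size,map_ratio\n101,1,0.85\n102,3,\n103,1,0.4\n104,12,nan\n"
terminal, internal = split_edges(sample)
print(terminal, internal)
```

Reads placed on internal edges have no reference 16S rRNA gene to compare against, so they need to be reported as their own category rather than being assigned a map_ratio.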