Are the breadth (the percentage of a transcriptome expressed in a sample) and the geometric mean (a measure of the central tendency of a set of numbers) good measures of the abundance of an organism?

We are interested in understanding how these measurements behave and how they relate to other proxies of abundance, such as the 18S rRNA gene abundance.
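As a rough sketch of the two measures in question, assuming each transcriptome is represented by a list of per-gene read counts in one sample. The function names and the pseudocount used to handle zero counts in the geometric mean are assumptions for illustration, not necessarily how the values in this issue were computed.

```python
import math


def breadth(counts):
    """Percentage of genes in the transcriptome with at least one read."""
    if not counts:
        raise ValueError("counts must not be empty")
    expressed = sum(1 for c in counts if c > 0)
    return 100.0 * expressed / len(counts)


def geometric_mean(counts, pseudocount=1.0):
    """Geometric mean of per-gene counts.

    Assumption: a pseudocount is added before taking logs (zeros would
    otherwise make the geometric mean zero) and subtracted afterwards.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    log_sum = sum(math.log(c + pseudocount) for c in counts)
    return math.exp(log_sum / len(counts)) - pseudocount


# Hypothetical toy sample: 4 genes, 2 of them detected.
toy_counts = [0, 3, 0, 8]
print(breadth(toy_counts))         # 50.0
print(geometric_mean(toy_counts))
```

Note that the breadth is bounded at 100 by construction, while the geometric mean has no upper bound.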

Let's start with

The beautiful

Pelagomonas calceolata is ranked #45 in the 18S abundance table, so its values should be fairly representative of a well-sampled distribution.

By checking the breadth:

We see that it saturates at its natural limit: 100% of the transcriptome transcribed.

With the geometric mean, this saturation does not appear:

The relationship looks roughly linear on a logarithmic scale. It is less clear on a linear scale, since both the geometric mean and the V4 counts can be very high at specific moments.
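One way to quantify how linear the relationship is on the logarithmic scale is to fit a line to the log-transformed values and look at the correlation. The following is a minimal sketch under assumptions: `log_log_fit` is a hypothetical helper, the inputs are paired per-sample values (V4 counts and geometric means), and a pseudocount is added so that zeros can be log-transformed.

```python
import math


def log_log_fit(x, y, pseudocount=1.0):
    """Least-squares line and Pearson r between log10(x) and log10(y).

    Returns (slope, intercept, r). The pseudocount is an assumption
    made here to keep zero values finite after the log transform.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    lx = [math.log10(v + pseudocount) for v in x]
    ly = [math.log10(v + pseudocount) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary across samples")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical toy values, not data from this analysis.
v4_counts = [9, 99, 999]
geo_means = [99, 9999, 999999]
slope, intercept, r = log_log_fit(v4_counts, geo_means)
print(slope, intercept, r)
```

A slope close to 1 on this scale would mean the geometric mean scales proportionally with the V4 counts; a high `r` alone only says the log-log relationship is close to a straight line.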

The middle situation

With our beloved Florenciella, which is not as abundant as Pelagomonas, the relationship is not as clear:

There is an increase, but it is more a burst than a linear relationship, appearing only when the abundance is high enough for the transcriptome to efficiently capture the true expression.

The ugly

With organisms at lower abundance, the distribution is even less clear. Using the amoeba from Alex:

In this case the relationship is not clear at all. This could have multiple causes: lack of sequencing depth, biases in the 18S data...

Conclusions

I will check how the housekeeping genes covary with the overall geometric mean to see whether this relationship holds, and whether it is structured in a similar fashion to the correlations observed in this analysis.
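A possible shape for that check, sketched under assumptions: each housekeeping gene has one count per sample, the overall geometric mean has one value per sample in the same order, and covariation is summarised with a Spearman rank correlation. The choice of Spearman, the function names, and the gene labels are all hypothetical.

```python
import math


def average_ranks(values):
    """Ranks from 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return float("nan")
    return sab / math.sqrt(saa * sbb)


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def covariation_with_geomean(gene_counts, geomean_per_sample):
    """Spearman correlation of each gene's counts with the geometric mean."""
    return {gene: spearman(counts, geomean_per_sample)
            for gene, counts in gene_counts.items()}


# Hypothetical toy input: two placeholder genes across four samples.
genes = {"gene_A": [1, 2, 3, 4], "gene_B": [4, 3, 2, 1]}
print(covariation_with_geomean(genes, [10, 100, 1000, 10000]))
```

Rank correlation is used here because the relationship above looked linear only on the log scale; any monotonic transform leaves the ranks unchanged.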

beaplab / transcriptome_metaT_quantification

Geometric mean as a predictor of abundance for a transcriptome quantification #4
