ccdmb / catastrophy

CATAStrophy predicts filamentous plant pathogen lifestyle characteristics based on their CAZyme composition.
BSD 3-Clause "New" or "Revised" License

Questionable results #3

Open cahuparo opened 1 year ago

cahuparo commented 1 year ago

Hi there,

I really like this concept. However, I am trying to make sense of my results. After plotting PC1 and PC2 for the Ceratocystis fimbriata proteome, I observed clustering with very unexpected taxa, and the prediction was SYMBIONT. This is concerning, but I am happy to try something else.

I used both v7 and v10 databases and got the same result.

Any suggestion would be appreciated!

Best,

Camilo

[Screenshot attached, 2023-01-03 8:46 AM]

darcyabjones commented 1 year ago

Hi Camilo,

Sorry for the late reply, I've been away for a while. James would be the best person to talk to about interpreting the results, since the concept and model design were theirs. Send them an email (james.hane@curtin.edu.au), as they don't really use GitHub.

As a quick response, yes, I agree that it's odd for this to be clustering with the biotrophs. From a technical point of view, the only thing that could artificially bias the results is a particularly large genome/proteome: the CAZyme counts weren't scaled before the PCA, and many of the biotrophs used have bigger genomes. That's assuming I haven't made a mistake somewhere on the software side, but I think it's all ironed out now. Otherwise, the model really just summarises the data, so what you're seeing is a genuine similarity to the biotrophs in terms of CAZyme content.
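To illustrate the size effect described above, here is a minimal stdlib-only Python sketch. The family names and counts are made up, and converting counts to proportions of the total CAZyme count is just one possible normalization, not necessarily what CATAStrophy does. Two hypothetical proteomes with identical CAZyme composition but different sizes end up far apart on raw counts and identical after scaling.

```python
from math import sqrt


def to_proportions(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw CAZyme family counts into fractions of the total CAZyme count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no CAZymes counted")
    return {family: n / total for family, n in counts.items()}


def euclidean(a: dict[str, float], b: dict[str, float]) -> float:
    """Euclidean distance over the union of families (a missing family counts as 0)."""
    families = set(a) | set(b)
    return sqrt(sum((a.get(f, 0) - b.get(f, 0)) ** 2 for f in families))


# Hypothetical counts: "large" has exactly twice as many CAZymes as "small" in every family.
small = {"GH28": 10, "CBM1": 4, "AA9": 6}
large = {family: 2 * n for family, n in small.items()}

raw_distance = euclidean(small, large)  # driven purely by proteome size
scaled_distance = euclidean(to_proportions(small), to_proportions(large))

print(f"raw counts:  {raw_distance:.2f}")
print(f"proportions: {scaled_distance:.2f}")  # 0.00, since the compositions are identical
```

If a query proteome is much larger than the training species, comparing its position with and without some normalization like this is a quick way to check whether size alone is pulling it towards the biotrophs.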

My initial thought would be that there is potentially an expansion (or depletion) of some CAZyme families more commonly associated with biotrophy. I'd look at the PCA loadings and the CAZyme counts of those species and yours to see what is placing them close together. James should be able to help you do this (I don't work for them anymore).
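A sketch of the loadings check suggested above, assuming an unscaled PCA: a sample's score on a component is the sum over families of loading × (count − mean count), so each family's term shows how strongly it pushes the sample along that axis. The `pc_contributions` helper and all family names, means, loadings and counts below are hypothetical. In practice they would come from the PCA model and the count table you are actually working with.

```python
def pc_contributions(
    sample: dict[str, float],
    means: dict[str, float],
    loadings: dict[str, float],
) -> dict[str, float]:
    """Split one principal-component score into per-family terms.

    For an unscaled (centred-only) PCA:
        score = sum over families f of loadings[f] * (sample[f] - means[f])
    A family absent from the sample is treated as a count of 0.
    """
    return {f: loadings[f] * (sample.get(f, 0.0) - means[f]) for f in loadings}


# Hypothetical values, for illustration only.
means = {"GH28": 12.0, "CBM1": 5.0, "AA9": 8.0}         # mean count per family in the training set
pc1_loadings = {"GH28": 0.7, "CBM1": -0.2, "AA9": 0.68}  # PC1 loading per family
query = {"GH28": 4.0, "CBM1": 5.0, "AA9": 15.0}          # counts for the proteome of interest

contrib = pc_contributions(query, means, pc1_loadings)
score = sum(contrib.values())

# List the families that move the query furthest along PC1 first.
for family, value in sorted(contrib.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{family}\t{value:+.2f}")
print(f"PC1 score\t{score:+.2f}")
```

Running the same breakdown for the biotrophs that sit near your proteome shows whether they share the same dominant families, which is the kind of expansion or depletion described above.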

All the best, Darcy