ericgoolsby / Rphylopars

Phylogenetic Comparative Tools for Missing Data and Within-Species Variation

Imputation method for compositional data? #45

diogoprov closed this issue 3 years ago

diogoprov commented 3 years ago

Hi @ericgoolsby, I have a compositional dataset on hand, i.e., percentages of white blood cell types in 48 frog species. There are 30 species lacking data; this is the distribution of the missing data: (image: leuko_missing)

I tested multiple models of trait evolution and it seems that BM is the best one (less variation), but the problem is that the data for each species have to sum to 100%. This is because the data consist of percentages of each blood cell type (e.g., lymphocytes 60%, neutrophils 10%, and so on). Is there any way for me to impose some kind of constraint on the estimation under BM so that the rowSums for each species equal 100%?

Thank you, Diogo

FVFaleiro commented 3 years ago

Hi @diogoprov, that's an interesting question. How did you solve it? I think you could share the solution here to help others. Cheers!

diogoprov commented 3 years ago

Oh man, I realized that the error was because I was including one variable (total leukocytes) that was not supposed to sum to 100%. Removing this variable makes the imputed dataset almost perfect, with only one or two species summing to more or less than 100%. So it seems Rphylopars works in a bunch of different and challenging scenarios. Changing from BM to OU or EB didn't improve the estimates of the missing values anyway.
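
For the one or two species that still miss 100%, a minimal sketch of how the row sums could be checked and rescaled after imputation. It is written in Python rather than R, and it assumes the imputed percentages have been exported to a plain mapping of species to cell-type percentages. The function names `row_sum_deviations` and `close_composition`, the species labels, and all numbers are hypothetical illustrations, not part of Rphylopars and not data from this thread.

```python
def row_sum_deviations(rows, total=100.0):
    """Return, per species, how far the row sum is from `total`."""
    return {name: sum(values) - total for name, values in rows.items()}


def close_composition(values, total=100.0):
    """Rescale non-negative parts proportionally so they sum to `total`."""
    if any(v < 0 for v in values):
        raise ValueError("parts must be non-negative")
    s = sum(values)
    if s <= 0:
        raise ValueError("parts must have a positive sum")
    return [v * total / s for v in values]


# Hypothetical imputed percentages (made-up numbers for illustration only).
imputed = {
    "species_a": [60.0, 10.0, 20.0, 10.0],  # already sums to 100
    "species_b": [58.0, 12.0, 21.0, 11.0],  # sums to 102
}

# Flag rows that miss the constraint, then rescale every row.
deviations = row_sum_deviations(imputed)
closed = {name: close_composition(v) for name, v in imputed.items()}
```

Rescaling keeps the ratios between cell types unchanged, but it does not handle negative imputed values, which is why the sketch raises an error for them. An alternative often used for compositional data is to impute on log-ratio-transformed values and back-transform, which satisfies the sum constraint by construction; that is not shown here.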