brianstock / MixSIAR

A framework for Bayesian mixing models in R:
http://brianstock.github.io/MixSIAR/

Testing different TEF values #223

Closed mcaputo3 closed 4 years ago

mcaputo3 commented 4 years ago

Good morning!

I'm running MixSIAR for some dolphin data where the best discrimination factor is still up for debate. I've used three different discrimination factors to run the models, but these greatly affect the dietary contribution of each prey item. Normally for cetaceans, people will then take an average of the three different models and present that, but the big proportional differences make me wary of doing so. Can I compare these three different outputs using DIC or LOO/WAIC, as you do when looking for the best model after adding different factors? It seems to me that it would work, but I don't want to go ahead in my manuscript without some expert advice (ecologists doing statistics are not statisticians after all).

Thanks!

AndrewLJackson commented 4 years ago

Hi

Not sure what the others think, but I'm very wary of using any sort of model comparison tool to choose a TDF or to inform the proportions. In an extreme example, you could simply let the TDF vary freely over a wide range of values, and the model would likely settle on a proportion of 1 for the nearest food source, with a TDF equal to the difference between the consumer(s) and that food source. These models attempt to use a defined process to say something about a hidden variable (diet), so really one should choose the parameters that govern the behaviour of the process and stick with them. If one really does not know what a parameter is, one should pick a prior distribution that encompasses all the estimates and present the results as they come out, no matter how wide the diet intervals are.
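The extreme case can be made concrete with a toy single-isotope calculation. This is only a sketch of the identifiability problem, not of how MixSIAR fits a model, and the consumer and source values are hypothetical: if the TDF is left unconstrained, a diet of 100% of any one source can be made to match the consumer exactly.

```python
# Hypothetical single-isotope values (per mil); not taken from any real data set.
consumer = 12.0
sources = {"prey_A": 8.0, "prey_B": 10.5, "prey_C": 15.0}

# With a free TDF, "100% of one source" fits the consumer for every source:
# consumer = source + TDF  =>  TDF = consumer - source.
implied_tdf = {name: consumer - value for name, value in sources.items()}

# Residual of each single-source solution once its implied TDF is applied.
residuals = {
    name: consumer - (sources[name] + implied_tdf[name]) for name in sources
}

# The source that needs the smallest shift is the "nearest" one.
nearest = min(implied_tdf, key=lambda name: abs(implied_tdf[name]))

for name in sources:
    print(f"{name}: implied TDF = {implied_tdf[name]:+.1f}, "
          f"residual = {residuals[name]:.1f}")
print("nearest source:", nearest)
```

Every residual is zero, so the isotope data alone cannot separate the candidate diets; only a constraint on the TDF can.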

So if I had three possible distributions for my TDF, say Norm(mu = 5, sigma = 2), Norm(mu = 6, sigma = 1.5) and Norm(mu = 4, sigma = 1.8), then I would settle on Norm(mu = mean(5, 6, 4), sigma = (2^2 + 1.5^2 + 1.8^2)^0.5) rather than run the model three times, once with each, and pick the one that looks best, which is what any sort of DIC approach would do.
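As a quick check of the arithmetic, here is a minimal sketch that combines the three candidate distributions, taking their means as 5, 6 and 4 and their standard deviations as 2, 1.5 and 1.8. The variable names are illustrative only.

```python
import math
from statistics import mean

# The three candidate TDF distributions, as (mean, sd) pairs.
candidates = [(5.0, 2.0), (6.0, 1.5), (4.0, 1.8)]

# Combined mean: the mean of the three means.
combined_mean = mean(m for m, _ in candidates)

# Combined sd: square root of the summed variances.
combined_sd = math.sqrt(sum(sd ** 2 for _, sd in candidates))

print(f"combined TDF: mean = {combined_mean:.2f}, sd = {combined_sd:.2f}")
```

The combined standard deviation is wider than any of the three inputs, which is the point of the approach: the uncertainty about which TDF is right is carried into the model rather than resolved by picking one.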

Best wishes Andrew

--

Dr Andrew Jackson, PhD, FTCD Associate Professor Irish Research Council Laureate School of Natural Sciences, Department of Zoology Trinity College Dublin, the University of Dublin Dublin 2, Ireland.

+353 1 896 2728 | a.jackson@tcd.ie | Twitter: @yodacomplex (https://twitter.com/yodacomplex) | http://www.tcd.ie/Zoology/research/groups/jackson/

mcaputo3 commented 4 years ago

Hi Andrew,

Thanks for your advice, I’m glad I asked!

As you likely know, it’s complicated to find the appropriate TDF when feeding studies have not been conducted for the consumer, so cetacean biologists often use 2-4 different TDFs and average the proportions, or they simply choose the one they think will work best. I’m not sure that either of these is a great solution, especially for my species.

While in principle I understand what "Norm(mu = mean(5, 6, 4), sigma = (2^2 + 1.5^2 + 1.8^2)^0.5)" means, I don’t quite know how to use that info for running the model, i.e. what my files should look like, given that currently you put in the means/SDs to use for discrimination.

Apologies if I’m being obtuse - first time using MixSIAR and doing mixing models so trying to navigate a new world!

Thanks Michelle

brianstock commented 4 years ago

This is different because the TDF is treated as data (it modifies your source data), while the inclusion/exclusion of covariates/factors is a question of whether there is support for more or fewer parameters. See Burnham and Anderson (2002, https://www.springer.com/gp/book/9780387953649), sec 2.11.1:

An important issue, in general, is that the data and their exact representation must be fixed and alternative models fitted to this fixed data set. Information criteria should not be compared across different data sets, because the inference is conditional on the data in hand.

Another way of interpreting Andrew's recommendation: it considers the 3 TDFs as random variables and uses the average. Then the TDF mean is the mean of the means, and the TDF variance is the sum of the variances, per the properties of sums of random variables (http://www.stat.yale.edu/Courses/1997-98/101/rvmnvar.htm). This seems good to me because it appropriately increases the TDF uncertainty (variance). In Andrew's example, the resultant TDF sd is 3.08, quite a bit higher than any of the individual TDF sds (2, 1.5, and 1.8). And as Andrew says, this will (appropriately) widen the CIs for the diet proportions.
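On the practical question of what the input file would then contain, here is a rough sketch that builds a discrimination table with one combined mean/SD pair per isotope. Everything in it is an assumption for illustration: the prey names, the d13C candidate values, and in particular the column layout (`Meand13C`, `SDd13C`, ...), which is modelled on the package's example discrimination files and should be checked against your own files and isotope names.

```python
import csv
import io
import math

# Hypothetical candidate TDFs per isotope, as (mean, sd) pairs.
candidate_tdfs = {
    "d13C": [(1.0, 0.4), (1.3, 0.5), (0.8, 0.3)],
    "d15N": [(5.0, 2.0), (6.0, 1.5), (4.0, 1.8)],
}
prey = ["prey_A", "prey_B", "prey_C"]  # hypothetical source names


def combine(pairs):
    """Return (mean of the means, sqrt of the summed variances)."""
    mu = sum(m for m, _ in pairs) / len(pairs)
    sd = math.sqrt(sum(s ** 2 for _, s in pairs))
    return round(mu, 2), round(sd, 2)


# Assumed layout: unnamed first column for the source, then Mean/SD per isotope.
header = [""]
values = []
for iso, pairs in candidate_tdfs.items():
    mu, sd = combine(pairs)
    header += [f"Mean{iso}", f"SD{iso}"]
    values += [mu, sd]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
for name in prey:
    # The same combined TDF is applied to every source in this sketch.
    writer.writerow([name] + values)

discrimination_csv = buffer.getvalue()
print(discrimination_csv)
```

The only change from a single-TDF run is the numbers in the mean and SD columns; the rest of the workflow stays as it was.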

I would think this will also increase the influence of the prior on the diet proportions, since it tells the model your data are less informative. So you may want to also think about using different priors, and what they mean (e.g. generalist/uninformative vs. previous studies vs. other data types).

I don't think presenting the average of three models using different TDFs gives you the same biological inference as presenting the three separately and saying you don't have evidence to say which is correct. For example: the model with TDF 1 says they eat primarily prey A, model 2 says they eat B, and model 3 says they eat C (all with relatively tight CIs). Concluding that they eat an even mix of A, B, and C is different from saying they may eat primarily A, B, or C but we can't tell from the SI data.
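The A/B/C example can be put into numbers. The proportions below are made up purely for illustration (they are not MixSIAR output), but they show how averaging across models can produce a diet that none of the individual models supports.

```python
# Hypothetical posterior median diet proportions (prey A, B, C) from three
# models, each fitted with a different TDF. Illustrative values only.
model_estimates = {
    "TDF_1": [0.80, 0.10, 0.10],
    "TDF_2": [0.10, 0.80, 0.10],
    "TDF_3": [0.10, 0.10, 0.80],
}

n_models = len(model_estimates)

# Average each prey's proportion across the three models.
averaged = [
    round(sum(column) / n_models, 3)
    for column in zip(*model_estimates.values())
]

# Dominant prey item within each individual model.
dominant = {
    name: "ABC"[props.index(max(props))]
    for name, props in model_estimates.items()
}

print("averaged proportions:", averaged)
print("dominant prey per model:", dominant)
```

Each model points to a specialist diet on a different prey, while the average comes out at one third for each prey and suggests a generalist, which is a different biological conclusion.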

mcaputo3 commented 4 years ago

Thanks so much to you both - really appreciate you walking me through it.

Will proceed accordingly!

Cheers Michelle
