EcologyR / BlueCarbon

Estimation of organic carbon stocks and sequestration rates from soil/sediment cores from blue carbon ecosystems
https://ecologyr.github.io/BlueCarbon/

Minimum number of samples to estimate om-oc models #44

Closed NPJuncal closed 4 months ago

NPJuncal commented 10 months ago

In the function estimate_oc there is a minimum number of samples with organic matter and organic carbon data required to estimate a model. This minimum is currently 10 samples. Do you think this minimum is too high? Is it possible that someone collects a couple of cores, ends up with 20 om samples, and only measures oc in 5 of them? Should we allow it? Would the model be trustworthy with so few samples?

MarcioFCMartins commented 10 months ago

I think that using 10 is a reasonable minimum standard for now.

Something that could also be interesting is to check whether the data used to model the OC approximately covers the full range where we want to predict. For example, if your core has samples with OM ranging from 5 to 20%, but you only measured OC in samples from 10% to 15%, the model will have to extrapolate quite a bit and it's immediately less reliable.

Do you think it's worth pursuing?
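The coverage question can be made concrete with a small calculation. The following is only a sketch: `om_range_coverage` is a hypothetical helper name, not a function in the package, and the OM values are made up to match the 5 to 20% versus 10 to 15% example above.

```r
# Hypothetical helper (not part of BlueCarbon): fraction of the OM range
# we want to predict over that is covered by the OM range of the samples
# used to fit the OM-OC model.
om_range_coverage <- function(om_fit, om_predict) {
  fit_range  <- range(om_fit, na.rm = TRUE)
  pred_range <- range(om_predict, na.rm = TRUE)
  # Length of the overlap between the two intervals, never negative
  overlap <- max(0, min(fit_range[2], pred_range[2]) -
                    max(fit_range[1], pred_range[1]))
  overlap / diff(pred_range)
}

# Illustrative values: OM in the core spans 5-20 %,
# but OC was only measured in samples with OM between 10 and 15 %
om_predict <- c(5, 8, 10, 12, 15, 18, 20)
om_fit     <- c(10, 12, 15)

coverage <- om_range_coverage(om_fit, om_predict)
```

Here the fitted samples cover one third of the OM range being predicted, so two thirds of the range would rely on extrapolation. The sketch assumes the prediction range has non-zero width; a core with a single OM value would need separate handling.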

costavale commented 10 months ago

I think we should suggest in the vignette or in the documentation that the minimum number of samples is 10. However, we could allow a lower number of samples with a warning.

NPJuncal commented 10 months ago

About the number of samples: right now the function does not fit models if there are fewer than 10 samples. We can allow it and give a warning (although I wouldn't fit a model with fewer than 5), or allow the user to "turn off" the 10-sample limit (with a TRUE/FALSE argument in the function, maybe?).

The second solution would be more difficult to implement. The first one is quite straightforward, I think.
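A minimal sketch of the warning-based option, assuming a recommended minimum of 10 samples and a hard floor of 5 as mentioned above. `check_sample_size` and its arguments are hypothetical names for illustration, not the code in estimate_oc, and the OM/OC values are invented.

```r
# Hypothetical helper (not the package's actual code).
# Returns TRUE if a model should be fitted, FALSE otherwise.
check_sample_size <- function(om, oc, recommended_n = 10, hard_min_n = 5) {
  # Count samples that have both OM and OC measured
  n <- sum(!is.na(om) & !is.na(oc))
  if (n < hard_min_n) {
    warning("Only ", n, " samples with OM and OC data; no model fitted.",
            call. = FALSE)
    return(FALSE)
  }
  if (n < recommended_n) {
    warning("Model fitted with only ", n, " samples (fewer than ",
            recommended_n, "); estimates may be unreliable.", call. = FALSE)
  }
  TRUE
}

# Illustrative data: 7 OM samples, OC measured in 6 of them
om <- c(4.1, 6.3, 8.0, 9.7, 12.5, 15.2, 18.8)
oc <- c(1.2, 2.0, 2.9, NA, 4.6, 5.9, 7.4)

# Warns about the small sample size but still allows fitting
fit_ok <- check_sample_size(om, oc)
```

Keeping the thresholds as arguments with defaults would also cover the "turn off" option: a user could pass a lower `recommended_n` without a separate TRUE/FALSE switch.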

About the range of the model and the predictions: I think it would be interesting to add a warning. Not sure how to approach it. Maybe store the minimum and maximum input om value of each model in a df and then check whether the om values used to estimate oc in each core fall between them.
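The df-based idea could look roughly like this. It is a sketch under assumptions: the column names (`model`, `om_min`, `om_max`, `core`, `om`), the model labels, and all values are hypothetical and chosen only for illustration.

```r
# Hypothetical lookup table: OM range of the samples used to fit each model
model_ranges <- data.frame(
  model  = c("site_A", "site_B"),
  om_min = c(10, 3),
  om_max = c(15, 25)
)

# Hypothetical samples whose OC will be estimated, with the model applied
samples <- data.frame(
  core  = c("c1", "c1", "c2", "c2"),
  model = c("site_A", "site_A", "site_B", "site_B"),
  om    = c(5, 12, 20, 30)
)

# Look up each sample's model range and flag OM values outside it
idx <- match(samples$model, model_ranges$model)
samples$outside_range <- samples$om < model_ranges$om_min[idx] |
  samples$om > model_ranges$om_max[idx]

# Warn once, naming the cores that contain extrapolated samples
if (any(samples$outside_range)) {
  bad_cores <- unique(samples$core[samples$outside_range])
  warning("OM values outside the model's fitting range in core(s): ",
          paste(bad_cores, collapse = ", "), call. = FALSE)
}
```

Storing the flag per sample, instead of only warning, would let users see which individual estimates were extrapolated.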

Do you think these two enhancements are important for the first version of the package?

Julenasti commented 10 months ago

I think a warning when the sample size is small, suggesting that the results might not be reliable, will work, as packages like lme4 and glmmTMB do. Easy to apply and effective, I guess.

For the range of data covered, I agree that it's a good idea to add a warning there too. The approach you propose, Nerea, seems good to me. It's also quite simple to apply.

So I'll try to apply both for the first version as they don't require much work and can be very useful.

MarcioFCMartins commented 10 months ago

I also agree with the approach suggested.

NPJuncal commented 10 months ago

OK, I will add these two enhancements (replace the 10-sample limit with a warning and check that the om provided is within the range of the model) as tasks in the function issue (#38).