EcologyR / BlueCarbon

Estimation of organic carbon stocks and sequestration rates from soil/sediment cores from blue carbon ecosystems
https://ecologyr.github.io/BlueCarbon/
Other
2 stars 0 forks source link

function estimate_oc #38

Open NPJuncal opened 9 months ago

NPJuncal commented 9 months ago

Estimation of organic Carbon is done by means of linear regressions on log(organic carbon) ~ log(organic matter). It gives back a organic carbon value for each organic matter value provided. If there is a organic carbon value for that sample it return the same value, else, generates a model for that site, else, model for specie, else, model for Ecosystem. If a model can not be created due to the low number of samples (<10) it uses the equations in Howard et al. 2014 to estimate the organic carbon.

outputs: The initial tibble or data.frame with three new columns:

Works with example data but there is at least one bug.

tasks:

-the function does not work if the oc column is empty. The function estimate several models and then check p values and r2 to see if they are good to use. If there are not good models utilize published equations to estimate oc. But if the oc column is empty it can not estimate any model and it breaks. In this case it should go directly to to the published equations but it does not.

-change equation to estimate oc in salt marsh data if models can not be built. Now it comes from Craft et al. 1991. New equation in Maxwell et al. (2023). https://doi.org/10.1038/s41597-023-02633-x

-add a new output: df with the parameters of the models estimated

-write more test (only a few to check that the function checks type of data provided (numeric vs not numeric), but not about the funtionality of the function itself

-check documentation

enhancement 1: the function divide the df by ecossytem, dominant species and station. Due to this, it is compulsory to provide three columns with ecosystem, species and station. The user may not have diferent species or stations. Allow user to chose if it want this groups or not. Ecosystem column need to be there anyway just in case the models are no good and need to use published functions.

NPJuncal commented 9 months ago

@costavale I have tried the function with a database that had the oc column empty and it work perfectly using the literature functions... could you send me the dataframe you try to use to check what the problem was?

NPJuncal commented 9 months ago

Craft et al 1991 function already changed for Maxwell et al 2023

NPJuncal commented 9 months ago

updated tasks:

-check function with @costavale data. The function does work if the oc column is empty. This was not the problem.

-Change 10 sample limit for a warning (see issue #44 ). Right now the function does not fit models if there are less than 10 samples. We can allow it and give a warning. Not all models will be used (models with low r2 or p value are deleted). Need to save which models had less than 10 samples and check with final database if any of this model were used and provide the names of the cores in the warning.

-Check if the om values used to estimate oc are within the range of the samples used to fit the models (see issue #44 ). Possible approach: add the minimum and maximum input om value of each model in a df and then check if the values used to estimate the oc in each core are between them.

-write more test (only a few to check that the function checks type of data provided (numeric vs not numeric), but not about the funtionality of the function itself

-check documentation

enhancement 1: the function divide the df by ecossytem, dominant species and station. Due to this, it is compulsory to provide three columns with ecosystem, species and station. The user may not have diferent species or stations. Allow user to chose if it want this groups or not. Ecosystem column need to be there anyway just in case the models are no good and need to use published functions.

NPJuncal commented 8 months ago

updated tasks:

-check function with @costavale data. The function does work if the oc column is empty. This was not the problem.

-write more test (only a few to check that the function checks type of data provided (numeric vs not numeric), but not about the funtionality of the function itself

-check documentation

enhancement 1: the function divide the df by ecossytem, dominant species and station. Due to this, it is compulsory to provide three columns with ecosystem, species and station. The user may not have diferent species or stations. Allow user to chose if it want this groups or not. Ecosystem column need to be there anyway just in case the models are no good and need to use published functions.

NPJuncal commented 8 months ago

I have solved three bugs I found while using this function with new data:

fit_models would not fit site models if there was only one specie per station choose model function should not work if site_models or species_models were null it the input dataset has a column name as one of the temporal columns created by any of the functions they would break

I have already fixed all three. Maybe those were the one @costavale found.

costavale commented 8 months ago

Hi all, I am going to check these updates and test again the functions with that dataset. Regarding the rest, how did we split the work? I am sorry I missed some of the previous updates

Julenasti commented 8 months ago

Yes sorry, I'm a little lost right now. Nerea, if you think I can help you with a specific job let me know. I think I'll be more helpful if the issues you assign me are code related - if you have any

NPJuncal commented 7 months ago

This function is huge and writing test that would test specific steps is a bit messi. I have decided to write test for each of the small functions within it. For example: test_that("fit_ecosystem_model function works as expected", {

what do you think @Pakillo ? Maybe I should put all test in the same script insted of opening a new script for each function?

NPJuncal commented 3 months ago

The function crashes Furthermore, change the Kauffman equation for new equation