bcgov / ssdtools

An R package to fit and plot Species Sensitivity Distributions (SSDs)
https://bcgov.github.io/ssdtools/
Apache License 2.0

Implement parametric bootstrapping for censored data #378

Open joethorley opened 2 months ago

joethorley commented 2 months ago

Currently, bootstrapping with censored data can only be performed using parametric = FALSE.

Zhenglei-BCS commented 2 weeks ago

Can ssdtools handle censored data currently?

joethorley commented 2 weeks ago

In short, it can handle left-censored or interval-censored data, provided the distributions have the same number of parameters and non-parametric bootstrapping is used.

From the NEWS.md

ssdtools 2.0.0

Finally, with censored data confidence intervals can now only be estimated by non-parametric bootstrapping as the methods of parametrically bootstrapping censored data require review.
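For readers unfamiliar with the term, non-parametric bootstrapping resamples the rows of the data set with replacement and refits the model to each resample. The following is a minimal Python sketch of that idea, not the ssdtools implementation. The function names, the percentile interval and the uncensored lognormal HC5 estimator are all assumptions made for illustration. With censored data each row would be a (left, right) pair resampled as a unit, and the estimator would be a censored maximum likelihood fit.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def lognormal_hc5(values):
    """Illustrative estimator: HC5 from a lognormal fitted by ML to uncensored values."""
    logs = [math.log(v) for v in values]
    mu = fmean(logs)
    sigma = pstdev(logs)  # ML estimate uses the population standard deviation
    return math.exp(mu + sigma * NormalDist().inv_cdf(0.05))


def nonparametric_bootstrap_ci(rows, estimator, nboot=1000, level=0.95, seed=None):
    """Percentile interval from resampling whole rows with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = sorted(
        estimator([rows[rng.randrange(n)] for _ in range(n)])
        for _ in range(nboot)
    )
    alpha = 1.0 - level
    # Approximate percentile positions, clamped to valid indices.
    lo = min(nboot - 1, max(0, math.floor(alpha / 2 * nboot)))
    hi = min(nboot - 1, max(0, math.ceil((1 - alpha / 2) * nboot) - 1))
    return estimates[lo], estimates[hi]


if __name__ == "__main__":
    # Made-up concentrations, purely to exercise the functions.
    concentrations = [1.2, 3.4, 5.6, 7.8, 9.1, 12.0]
    print(nonparametric_bootstrap_ci(concentrations, lognormal_hc5, seed=42))
```

Because only the observed rows are resampled, this approach never has to simulate new censoring limits, which is the step that makes the parametric version harder to define for censored data.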

ssdtools 1.0.0

Censored Data

Censoring can now be specified by providing a data set with one or more rows that have

It is currently not possible to fit distributions to data sets that have

Rows that have a zero or missing value for the left column and an infinite or missing value for the right column (fully censored) are uninformative and will result in an error.
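As a rough illustration of how a (left, right) pair maps onto a censoring type, here is a small Python sketch. Only the fully censored rule is taken from the text above. The other categories follow the usual statistical definitions and are assumptions here, as is the function name classify_row. This is not ssdtools code.

```python
import math


def _missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def classify_row(left, right):
    """Classify one row by its left and right bounds (assumed conventions)."""
    left_open = _missing(left) or left == 0
    right_open = _missing(right) or math.isinf(right)
    if left_open and right_open:
        return "fully censored"  # uninformative; an error according to NEWS.md
    if left_open:
        return "left censored"
    if right_open:
        return "right censored"
    if left == right:
        return "uncensored"
    if left < right:
        return "interval censored"
    raise ValueError("left must not exceed right")


if __name__ == "__main__":
    for row in [(2.0, 2.0), (None, 5.0), (1.0, 5.0), (3.0, math.inf), (0, None)]:
        print(row, classify_row(*row))
```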

Akaike Weights

For uncensored data, Akaike Weights are calculated using AICc (which corrects for small sample size). In the case of censored data, Akaike Weights are calculated using AIC (as the sample size cannot be estimated) but only if all the distributions have the same number of parameters (to ensure the weights are valid).
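The standard formulas behind these quantities can be sketched in a few lines of Python. This is a generic illustration, not the ssdtools code, and the function names are assumptions. AICc adds the term 2k(k + 1) / (n - k - 1) to AIC, where k is the number of parameters and n the sample size. Each weight is proportional to exp(-delta / 2), where delta is the difference from the smallest criterion value.

```python
import math


def aicc(aic, k, n):
    """Small-sample corrected AIC for k parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(ic_values):
    """Normalised weights from a list of AIC (or AICc) values."""
    best = min(ic_values)
    relative = [math.exp(-0.5 * (value - best)) for value in ic_values]
    total = sum(relative)
    return [r / total for r in relative]


if __name__ == "__main__":
    # Made-up AIC values for three candidate distributions.
    print(akaike_weights([100.0, 102.0, 110.0]))
```

The correction term depends on n, which is why AICc cannot be computed when the sample size is not well defined.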

Zhenglei-BCS commented 2 weeks ago

Thanks for pointing me to the information.

In our ecotoxicological studies we often encounter endpoints that exceed the highest tested concentration, which means the tested species is not sensitive to the test item. It has been recommended to include these censored values in the SSD analysis, following the approach outlined in http://arxiv.org/abs/1311.5772. This method essentially extends a maximum likelihood approach by incorporating P(X > C) into the objective function.
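To make the objective function concrete, here is a minimal Python sketch of a right-censored log-likelihood for a lognormal distribution. It is a generic illustration rather than the code from the linked paper or from ssdtools, and the function names are assumptions. Each observed endpoint contributes its log density, and each endpoint known only to exceed a limit C contributes log P(X > C).

```python
import math


def lognormal_logpdf(x, mu, sigma):
    """Log density of a lognormal with log-scale mean mu and sd sigma."""
    z = (math.log(x) - mu) / sigma
    return -math.log(x * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z


def lognormal_logsf(c, mu, sigma):
    """log P(X > c); erfc keeps precision in the upper tail."""
    z = (math.log(c) - mu) / sigma
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def censored_loglik(mu, sigma, observed, right_censored):
    """Sum of log densities for observed values plus log P(X > C) for censored ones."""
    loglik = sum(lognormal_logpdf(x, mu, sigma) for x in observed)
    loglik += sum(lognormal_logsf(c, mu, sigma) for c in right_censored)
    return loglik


if __name__ == "__main__":
    # Made-up endpoints: three observed values and one ">100" result.
    print(censored_loglik(1.0, 1.5, [0.8, 2.5, 12.0], [100.0]))
```

Maximising this function over mu and sigma, for example with a numerical optimiser, gives the censored maximum likelihood fit.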

However, I found it very puzzling because in the extreme case, including a very insensitive species could misleadingly suggest the presence of a very sensitive species, given that distributions like the lognormal are symmetric after taking the logarithm. This is counter-intuitive and could potentially skew the results.

I will certainly need to read how it is handled in ssdtools. I would appreciate any insights or clarifications.

joethorley commented 1 week ago

What you describe is right censoring, and it is not yet implemented in ssdtools. And yes, you simply give it the information that the concentration is greater than a particular value. I'm not sure why you think that including a very insensitive species could misleadingly suggest the presence of a very sensitive species?