That is probably fine. The question I got was actually about whether the BC dissimilarities were calculated on relative abundances. I believe we are calculating the BC based on the raw read counts, right? It may be best practice to convert to relative abundances (within each sample) first. Would that be feasible?
It looks like we do this: https://github.com/gbif/edna-tool-ui/issues/2#issuecomment-1717637957 i.e.
taking the fourth root of each value (x^0.25) is a quick and acceptable solution.
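For illustration, a minimal sketch of that transform (using a small hypothetical OTU table with samples as rows and taxa as columns, and scipy's Bray-Curtis; not the tool's actual code):

```python
import numpy as np
from scipy.spatial.distance import pdist

# Hypothetical OTU table: rows = samples, columns = taxa, values = raw read counts
counts = np.array([
    [1200,  30,  0,  5],
    [ 800, 400, 10,  0],
    [  50,  60, 70, 80],
], dtype=float)

# Fourth-root transform (x^0.25) to downweight very abundant taxa
transformed = counts ** 0.25

# Pairwise Bray-Curtis dissimilarities between samples
bc = pdist(transformed, metric="braycurtis")
print(bc)
```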
That is to downweight the influence of high numbers. But to make sample-to-sample comparisons fair, the sampling effort (read counts per sample) needs to be similar. That can be done by resampling (to even depth) or by scaling. I suggest the latter: dividing each read count by the sample's total read count (= relative abundances).
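Continuing the same hypothetical table, the scaling step would look something like this (again only an illustrative sketch, not the tool's implementation):

```python
import numpy as np
from scipy.spatial.distance import pdist

counts = np.array([
    [1200,  30,  0,  5],
    [ 800, 400, 10,  0],
    [  50,  60, 70, 80],
], dtype=float)

# Divide each sample (row) by its total read count -> within-sample relative abundances
rel_abund = counts / counts.sum(axis=1, keepdims=True)

# Fourth-root transform on the relative abundances, then Bray-Curtis between samples
bc = pdist(rel_abund ** 0.25, metric="braycurtis")
print(bc)
```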
Isn't this sufficient?