BiologicalRecordsCentre / sparta

Species Presence/Absence R Trends Analyses
http://biologicalrecordscentre.github.io/sparta/index.html
MIT License

createWeights.R (for Frescalo) contains arguably inappropriate hard-coded distance function #240

Open sacrevert opened 2 years ago

sacrevert commented 2 years ago

Just noting that this function, which creates neighbourhood weights based on land cover, calls dist() with its default options. The default is Euclidean distance, which is potentially inappropriate for very sparse matrices, because many shared zeros between items can have a strong influence on the distances (a similar issue often comes up in community ecology). A warning, and an option to use something like the cosine similarity measure instead, would be desirable. Efficient code for the latter is in the second answer here: https://stats.stackexchange.com/questions/31565/compute-a-cosine-dissimilarity-matrix-in-r
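To illustrate the concern, here is a minimal sketch on a made-up land cover matrix. It is not taken from sparta or from the linked answer: the `landcover` values and the `cosine_dissimilarity()` helper are assumptions for illustration only. Sites A and B share no land cover class, while C and D overlap substantially, yet the default Euclidean distance ranks A and B as the closer pair.

```r
# Toy data (hypothetical): rows are sites, columns are land cover classes.
landcover <- rbind(
  A = c(0.1, 0.0, 0, 0, 0, 0),
  B = c(0.0, 0.1, 0, 0, 0, 0),
  C = c(0.9, 0.1, 0, 0, 0, 0),
  D = c(0.6, 0.4, 0, 0, 0, 0)
)

# dist() with default options uses method = "euclidean".
euc <- as.matrix(dist(landcover))

# Cosine dissimilarity = 1 - cosine similarity between rows.
# Illustrative helper; rows that are entirely zero would give NaN here.
cosine_dissimilarity <- function(x) {
  norms <- sqrt(rowSums(x^2))
  sim <- tcrossprod(x) / outer(norms, norms)
  as.dist(1 - sim)
}
cosd <- as.matrix(cosine_dissimilarity(landcover))

# Euclidean: the non-overlapping pair (A, B) looks closer than (C, D).
euc["A", "B"] < euc["C", "D"]

# Cosine: (A, B) is maximally dissimilar, (C, D) is much closer.
cosd["A", "B"] > cosd["C", "D"]
```

Both comparisons evaluate to TRUE for these toy values, so the two measures order the same pairs in opposite ways.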

AugustT commented 2 years ago

Thanks @sacrevert, not something I had considered.

sacrevert commented 2 years ago

Just noting here that I have created a quick tool for visualising Frescalo weights (https://github.com/sacrevert/visualiseFresNeighbours), along with sets of new weights at various geographic scales (https://github.com/sacrevert/frescaloNeighbourhoods). It would be interesting to compare the existing approach in sparta with these new sets, which use newer land cover information, additional geological information, and the cosine similarity measure (rather than the Euclidean approach currently encoded in sparta).

sacrevert commented 2 years ago

Quick comparison here between the sparta approach and what I did. It doesn't actually make a great deal of difference in this case: some neighbourhoods do differ, but the effect on trend estimates is probably negligible, even if those neighbourhoods are slightly more coherent ecologically. Still, it might be wise to give the user an option, or a warning, with regard to the dissimilarity measure, as it could have bigger effects in other cases. See https://github.com/sacrevert/frescaloNeighbourhoods/blob/main/spartaWeightsComparison.pdf
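One possible shape for such an option and warning is sketched below. This is not sparta's API: `site_dissimilarity()`, its `method` argument, and the `sparse_threshold` default of 0.5 are all hypothetical choices made for illustration.

```r
# Hypothetical helper: lets the caller choose the dissimilarity measure and
# warns when Euclidean distance is used on a mostly-zero matrix.
site_dissimilarity <- function(x, method = c("euclidean", "cosine"),
                               sparse_threshold = 0.5) {
  method <- match.arg(method)
  x <- as.matrix(x)

  if (method == "euclidean") {
    # Fraction of zero entries; the threshold is an arbitrary assumption.
    if (mean(x == 0) > sparse_threshold) {
      warning("More than ", 100 * sparse_threshold, "% of entries are zero; ",
              "Euclidean distance may be misleading. ",
              "Consider method = \"cosine\".", call. = FALSE)
    }
    return(dist(x, method = "euclidean"))
  }

  # Cosine dissimilarity = 1 - cosine similarity between rows.
  norms <- sqrt(rowSums(x^2))
  if (any(norms == 0)) {
    stop("Cosine dissimilarity is undefined for all-zero rows.")
  }
  as.dist(1 - tcrossprod(x) / outer(norms, norms))
}

# Usage on made-up data: three sites, four land cover classes.
x <- rbind(a = c(1, 0, 0, 0), b = c(0, 1, 0, 0), c = c(2, 0, 0, 0))
d_cos <- as.matrix(site_dissimilarity(x, method = "cosine"))
```

Keeping "euclidean" as the default would preserve current behaviour, so existing results would only change if a user opted in to the cosine measure.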

AugustT commented 2 years ago

@sacrevert thank you for doing the comparison and taking the time to put together the PDF. Realistically, given other priorities, I don't see any changes being made to sparta's frescalo functionality in the near future. I'd be happy to review and pull in any changes that you want to make, but I realise you may well not have the time either.

sacrevert commented 2 years ago

No worries. No, I probably won't have time either : )