OpenDendro / dplR

This is the dev site for the dplR package in R

question about example in dplR::glk #26

Open vochr opened 1 week ago

vochr commented 1 week ago

Dear Mr. Bunn, Mr. Zang and colleagues, I have been happily using your R package dplR for quite some time, but I recently came across the example in dplR::glk and wonder whether the calculation of the mean GLK is specified correctly there. Specifically, the returned glk_mat holds the matrix of Gleichläufigkeit values between all time series. The example calculates the mean of that matrix including the diagonal elements, which are (by definition) all 1 and therefore pull the resulting mean upward. Am I wrong in assuming that the mean should preferably consider only the upper or the lower triangle of that matrix? Or, more simply, that one could set the diagonal to NA and proceed as in the example? In the given example the difference is not large (0.6764706 compared to 0.662491), but in smaller data sets it might matter more.

original example

library(utils)
data(ca533)
ca533.glklist <- glk(ca533)
mean(ca533.glklist$glk_mat, na.rm = TRUE)

proposal

data(ca533)
glkMat <- glk(ca533)$glk_mat
mean(glkMat[upper.tri(glkMat)], na.rm = TRUE)

in glk.legacy only the upper.tri is returned and used

glk.legacy(ca533)
mean(glk.legacy(ca533), na.rm = TRUE)
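
To illustrate why the diagonal matters more in small data sets, here is a minimal sketch in base R using a made-up 3 x 3 matrix. The values and all object names (toy, mean_with_diag, and so on) are hypothetical and are not output from glk(). The sketch assumes a symmetric matrix with a diagonal of 1.

```r
# Hypothetical symmetric 3 x 3 similarity matrix with a diagonal of 1.
# The values are invented for illustration; they are NOT output from glk().
toy <- matrix(c(1.0, 0.6, 0.7,
                0.6, 1.0, 0.5,
                0.7, 0.5, 1.0),
              nrow = 3, byrow = TRUE)

# Mean over all cells, including the self-comparisons on the diagonal.
mean_with_diag <- mean(toy, na.rm = TRUE)

# Option 1: use only the upper triangle (each pair counted once).
mean_upper <- mean(toy[upper.tri(toy)], na.rm = TRUE)

# Option 2: set the diagonal to NA and take the mean as before.
toy_na <- toy
diag(toy_na) <- NA
mean_diag_na <- mean(toy_na, na.rm = TRUE)

print(c(with_diag = mean_with_diag,
        upper_tri = mean_upper,
        diag_na   = mean_diag_na))
```

For this toy matrix both options give 0.6, while the mean including the diagonal is about 0.733. With n series and no missing pairs, the diagonal accounts for n of the n^2 cells, so its share of the mean shrinks as n grows.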

Best, Christian

AndyBunn commented 6 days ago

Hi Ron, can you weigh in here?

RonaldVisser commented 4 days ago

Hi Christian and Andy,

Thanks for pointing this out, Christian! You are indeed correct, although one could also argue for including the self-comparisons when calculating the mean. I would propose updating the example to:

data(ca533)
ca533.glklist <- glk(ca533)
ca533.glk_mat <- ca533.glklist$glk_mat
mean(ca533.glk_mat, na.rm = TRUE) # calculating the mean GLK including self-similarities
mean(ca533.glk_mat[upper.tri(ca533.glk_mat)], na.rm = TRUE) # calculating the mean GLK excluding self-similarities

I would also suggest refraining from using the GLK altogether and using the SGC instead. The SGC is a better measure of similarity because it does not include semi-synchronous growth changes, as the GLK does; see my paper on this: https://doi.org/10.1111/arcm.12600 (open access).

Cheers,

Ronald

vochr commented 4 days ago

Dear Ronald, thanks for highlighting your paper on SGC! Sounds great.

Also, thanks for updating the example. I think this is fine. Both approaches are given, so people can see and choose.

Best, Christian
