SofieVG / FlowSOM

Using self-organizing maps for visualization and interpretation of cytometry data
61 stars 26 forks source link

Is the integer cast in abs() really intended? #42

Closed LTLA closed 3 years ago

LTLA commented 3 years ago

For various reasons, I am writing a C++ implementation of som.c, and I noticed

https://github.com/SofieVG/FlowSOM/blob/6f79b8b7afa1f2996dbeae4292bbbfb93e92b3a1/src/som.c#L133-L134

tmp is double-precision but abs() takes integer arguments. This call will truncate tmp to integer before actually computing the absolute difference. Nothing will be added to change if all tmp lie in (-1, 1). As it stands now, the iterations will terminate if all tmp have absolute values less than 1. Is this intentional? Did you mean to use fabs() instead?

Regardless, I don't understand the motivation behind the stop condition of change < 1. A threshold of 1 seems pretty arbitrary when the scale of the input data could be anything. Why not just terminate when the change in the code coordinates drops below a certain level (e.g., defined as some small fraction of the standard deviation across codes)?

SamGG commented 3 years ago

Interesting point. I don't think there is such a stop condition in the implementation of the kohonen package. https://github.com/rwehrens/kohonen/blob/28eb3599bdc423e94a4a1e09310ffd9de667ead8/kohonen/src/supersom.cpp#L93-L173

SofieVG commented 3 years ago

Dear @LTLA,

This integer cast was indeed not intended and has already been changed to fabs() on my local installation. Apparently that update hasn't made it to this github yet, but I will update soon.

The idea behind the change parameter is indeed to check for convergence, but I agree that the value 1 can be kind of arbitrary, especially when the package is getting wider use. I think we choose this value back then because it made sense for the datasets we were working with at the time. I'll consider updating this in the future, either at least make it a parameter, or indeed, as you are suggesting, computing it based on the data range itself.

Kind regards, Sofie

LTLA commented 3 years ago

Okay, good to see that it's fixed.