Open tdhock opened 3 years ago
see https://github.com/tdhock/binsegRcpp/blob/master/src/interface.cpp for an alternative that uses names instead of numeric indices, and Rcpp::List for return value of the interface function. (Rcpp::DataFrame could be used as well)
btw for reading about why to avoid the "magic numbers" anti-pattern see https://en.wikipedia.org/wiki/Magic_number_(programming)#Unnamed_numerical_constants
I agree that it may be confusing. I definitely have to add some comments to the code so that it is clear why I did his.
At the beginning I though of using Rcpp::List
. However, this requires to define vectors for every slot that will be added to the list, and then pass references of the vector to the C++ code. This is unfeasible here because I don't know how many slots I will return to C++ (since different distributions have a different number of parameters changing). Therefore, I decided to use Rcpp::NumericMatrix
because I can store the data for several values (bestDecrease, invalidatesIndex, invalidatesAfter, mean, ...) while only defining a single value in the code (param_mat
).
This was basically an approach to make the returned number of parameters dynamic. I am happy to look at other suggestions on how to solve this problem. But I think using a List could no be the most convenient option for this use case. What do you think?
you are right that "different distributions have a different number of parameters changing." so I think the param_mat should contain only those segment-specific real-valued model parameters (mean and/or variance) and everything else should be a separate arguments/variables/members with informative names and potentially different types (for example invalidates* should be int).
Right, the other "general" parameters could be separate vectors and then I can add them to the return list, along with the matrix that contains the variable parameter estimations for each distribution.
This is a good option, but it is not a "quick fix" since I have to change not only the C++ code, but also the R interface, mainly BinSegModel
and some BinSeg
class methods. Would you mind if I do this on a few weeks? Right now I am very busy at university since next week I have a couple of crucial deliveries and presentations. I will get to this as soon as I finish with that, is it okay?
sure, no rush. I'm just trying to help you make your code easier to understand/maintain using software engineering best practices.
Thanks @tdhock 😄
hi @diego-urgell in Algorithms.cpp I see
why are all of these things being stored in the same param_mat data structure? It results in lots of magic numbers (2, 3, 4) which are potentially confusing, especially some of the values are different types (for example mid is int but bestDecrease is double). To clarify this code, can these things be stored in separate members? then we could use names rather than numbers to reference them. (for example the name invalidatesIndex would be much easier to understand than the index 2)