Comments on the paper - Githubissues

Above "V-fold cross-validation", lines 24-25: My understanding of cross-validation is that we partition the whole data set into V folds and, in each round, use the data in V-1 folds to build a model (the parameter-generating sample), then apply that model to the fold that was left out for scoring/validation (the estimation sample). Is this how the package works? If so, I think the following sentence needs to be rephrased, because as written it suggests that after we partition the data into V folds, we then split the data within each fold into two further parts.

CVtreeMLE uses V-fold cross-validation and partitions the full data
in each fold into a parameter-generating sample and an estimation sample

Here, "fold" means a group or partition. This page has a good explanation of cross-validation: https://machinelearningmastery.com/k-fold-cross-validation/
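To make my reading concrete, here is a minimal sketch in plain Python of the splitting scheme I have in mind. It is not the package's code: the function name `v_fold_splits` and the variable names are hypothetical, and I am assuming a simple random partition with no stratification.

```python
import random


def v_fold_splits(n, v, seed=0):
    """Yield (parameter_generating, estimation) index lists for V-fold CV.

    Hypothetical helper for illustration only; not part of CVtreeMLE.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[k::v] for k in range(v)]
    for k in range(v):
        # The held-out fold is the estimation sample.
        estimation = folds[k]
        # The remaining V-1 folds together form the parameter-generating sample.
        parameter_generating = [
            i for j, fold in enumerate(folds) if j != k for i in fold
        ]
        yield parameter_generating, estimation


splits = list(v_fold_splits(n=10, v=5))
for parameter_generating, estimation in splits:
    print(len(parameter_generating), len(estimation))
```

Under this reading, each observation lands in the estimation sample exactly once across the V rounds, and no fold is itself split in two.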

Background section, lines 58-59: The author explains that in many cases researchers are interested in studying an a priori specified treatment or exposure. It would help to include a few typical research examples or papers of this kind so that readers can better understand the context.

line 61: More explanation is needed of what high-dimensionality and sparsity specifically refer to. Is it the large number of exposures relative to the number of data points collected that makes the data high-dimensional? What does the sparsity refer to?

line 62-62: References are needed to support the following statement, so that readers can understand why and how a target parameter is ill-defined, and therefore better understand what this package aims to improve.

Even if this approach were possible, a target parameter that can inform public policy 
is still ill-defined

blind-contours / CVtreeMLE

Comments on the paper #6