blind-contours / CVtreeMLE

:deciduous_tree: :dart: Cross Validated Decision Trees with Targeted Maximum Likelihood Estimation
MIT License
5 stars 1 forks source link

Vignette #12

Open lightning-auriga opened 2 years ago

lightning-auriga commented 2 years ago

The vignette has two major issues:

1) the first half of the document is basically an entirely separate white paper that goes into some presentation of the stats methodology of the package. however, it's a bit messy (markdown formatting off, variables and functions not always clearly defined, etc.), and also it seems like a strange place to put a ton of exposition about the package. it's not really a proof either, so it's a bit of a strange fit. if it's a vignette, then it should really be integrated with example content, so then for example it might make it easier for the reader to understand, for example, exactly where in the package they can see a representation of rule coverage being computed. standing alone, I'm not sure it exactly has its intended effect.

2) in the second half, the worked examples, things are challenging in a different way. I think there is some desync between the example code and the text surrounding it. obviously, as mentioned in #9, the numbers change, but additionally there are some more straightforward mismatches: e.g. text says use 2fold but function call says n_folds=5. I also get a ton of warnings on my system when rendering the vignette, and that may just be something off about my setup, but without it prerendered on CRAN that's all I have to work with. Warning text was, for example, Warning in private$.train(processed_task, trained_sublearners): Lrnr_gam_NULL_NULL_GCV.Cp failed with message: Error in private$.train(processed_task): Specified outcome type is unsupported by Lrnr_gam. It will be removed from the stack for the section Run CVtreeMLE

I'll add additional specific comments to the issue.

lightning-auriga commented 2 years ago

Additional specific comments:

blind-contours commented 2 years ago

I have redone the entire vignette to match better practices. All the background has been moved to an Arxiv paper which I will publish once (hopefully) this package has been accepted with JOSS. All these issues should be resolved now. I still have some additions to make including better explanations of the marginal rules and plots. The vignette now focuses more on explaining the output and how it can be interpreted. Additionally the vignette uses synthetic data from the NIEHS that represents a real mixed exposure with known ground-truth. So we can simultaneously examine the output and compare interactions found to interactions built into the data.

blind-contours commented 2 years ago

I need to add some description of defining how exactly someone would figure out they need to increase CV folds which can be included still given changes to the vignette.