cjvanlissa / worcs

RStudio project template and convenience functions for the Workflow for Open Reproducible Code in Science (WORCS)
https://cjvanlissa.github.io/worcs/
GNU General Public License v3.0
76 stars 11 forks

Make it possible to manually add a synthetic dataset #94

cjvanlissa closed 3 years ago

cjvanlissa commented 3 years ago

Should it be possible to call closed_data() without synthesizing data? It should definitely be possible to manually add a synthetic dataset.

aaronpeikert commented 3 years ago

I imagine it could be helpful to call closed_data() without synthesizing, provided that load_data() then issues a message about how to obtain the real data. Or maybe we build a new function that does that (called private_data() or requestable_data())?

cjvanlissa commented 3 years ago

I agree. I ran into the following issues recently:

1) A lavaan model did not converge on the synthetic data, which made the remainder of my code non-reproducible. I solved the problem by simulating data from the lavaan model directly, but there was no function to replace synthetic_data.csv.
2) The data were too large to be synthesized using random forests. Simple bootstrapping per column would work, and synthetic() allows for this, but closed_data() does not allow users to modify the call to synthetic().
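To illustrate the second point, here is a minimal base R sketch of per-column bootstrapping followed by manually writing the result to disk. The helper `bootstrap_columns()` is hypothetical and not part of worcs; the file name follows the `synthetic_data.csv` mentioned above, and the temporary directory is an assumption made only so the example is safe to run.

```r
# Hypothetical helper (not part of worcs): resample every column independently,
# with replacement. Each column keeps its marginal distribution, but the
# associations between columns are deliberately destroyed.
bootstrap_columns <- function(data) {
  n <- nrow(data)
  out <- lapply(data, function(column) {
    # Index-based resampling avoids sample()'s special case for a single number.
    column[sample.int(n, size = n, replace = TRUE)]
  })
  as.data.frame(out, stringsAsFactors = FALSE)
}

set.seed(1)
real <- data.frame(
  x = rnorm(100),
  g = sample(c("a", "b"), 100, replace = TRUE)
)
synth <- bootstrap_columns(real)

# Manually store the synthetic data. The directory is an assumption for this
# example; a real project would write next to its other data files.
out_file <- file.path(tempdir(), "synthetic_data.csv")
write.csv(synth, out_file, row.names = FALSE)
```

Because each column is resampled on its own, this approach scales to large datasets, at the cost of losing all relationships between variables.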

aaronpeikert commented 3 years ago

Ah, I didn't read carefully enough! Couldn't you pass the dots from closed_data() to save_data() and from there to synthetic()?
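The suggestion above relies on R's `...` forwarding mechanism. The following sketch shows the general pattern only: all three functions are hypothetical stand-ins with a `_sketch` suffix, and the `method` and `seed` arguments are invented for illustration. None of this reflects the actual worcs signatures.

```r
# Hypothetical stand-in for a synthesizer. It only reports what it received.
synthetic_sketch <- function(data, method = "random_forest", ...) {
  list(method = method, extra = list(...))
}

# Hypothetical stand-in for the saving step: anything it does not use itself
# is forwarded unchanged through the dots.
save_data_sketch <- function(data, ...) {
  synthetic_sketch(data, ...)
}

# Hypothetical stand-in for the user-facing function: it also just forwards.
closed_data_sketch <- function(data, ...) {
  save_data_sketch(data, ...)
}

# The caller customizes the innermost call without the outer functions
# needing to know about `method` or `seed`.
res <- closed_data_sketch(mtcars, method = "bootstrap", seed = 42)
```

With this pattern, the outer functions never need to list the synthesizer's arguments explicitly, so new options become available to users without changing the wrappers.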

aaronpeikert commented 3 years ago

I would do this for you, but I do not understand the magic of all_args()

cjvanlissa commented 3 years ago

@aaronpeikert I added all_args() to avoid problems with list(match.call()[-1]) where, for example, in tests, functions would fail because the passed objects were in some higher parent environment and not in scope for the final function down the call stack.
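The scoping problem described here can be reproduced in a few lines of base R. This is only an illustration of the general issue; how all_args() works internally is not shown in this thread, and every function name below is hypothetical. The sketch uses `as.list(match.call())[-1]` to obtain one list element per argument.

```r
# Captures the unevaluated argument expressions recorded in the call.
capture_call <- function(data) {
  as.list(match.call())[-1]
}

# Captures the argument value instead. Referring to `data` forces the promise
# while the environment it was defined in is still reachable.
capture_values <- function(data) {
  list(data = data)
}

# Evaluates a captured argument in an environment that cannot see the
# caller's local variables, as happens further down a call stack.
use_later <- function(args) {
  eval(args$data, envir = baseenv())
}

wrapper <- function() {
  local_df <- data.frame(x = 1:3)  # exists only inside wrapper()
  list(
    by_call  = capture_call(local_df),
    by_value = capture_values(local_df)
  )
}

captured <- wrapper()

# The call-based capture holds only the name `local_df`, so evaluating it
# elsewhere fails; the value-based capture holds the data frame itself.
failed <- inherits(try(use_later(captured$by_call), silent = TRUE), "try-error")
worked <- is.data.frame(use_later(captured$by_value))
```

Capturing values rather than expressions trades lazy evaluation for robustness: the arguments no longer depend on which environments are visible when they are finally used.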