Closed cwatt closed 3 years ago
Hi @cwatt, thank you for your suggestion!
I think we should be teaching best practices as much as possible. In my opinion, saving the workspace -- as convenient as it may be -- isn't the best solution to the problem you are describing and, as the text points out, can cause some hard-to-debug errors.
There are better (i.e., more explicit, less error-prone, more reproducible) solutions to store results of long-running, expensive computations. One solution would be to store and retrieve intermediate results yourself using, for example, `write_rds()` and `read_rds()`; another would be to use a package like {targets} to keep track of intermediate results for you.
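To make the first option concrete, here is a minimal sketch of caching one expensive result with {readr}. The file location, the `expensive_step()` function, and the model it fits are all hypothetical stand-ins rather than anything from the lesson; in a real project the cache file would more likely live in a project folder than in `tempdir()`.

```r
library(readr)

# Hypothetical cache location; a project subfolder is the more usual choice.
cache_file <- file.path(tempdir(), "model_fit.rds")

# Hypothetical stand-in for a long-running computation.
expensive_step <- function() {
  lm(mpg ~ wt, data = mtcars)
}

if (file.exists(cache_file)) {
  # A previous session already did the work: reload the stored result.
  fit <- read_rds(cache_file)
} else {
  # First run: compute once, then store the result explicitly.
  fit <- expensive_step()
  write_rds(fit, cache_file)
}
```

Unlike a saved workspace, only the object you chose to store comes back in the next session, and the script itself shows where it came from.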
Anyways, perhaps we can meet halfway, and say "it's usually a good idea to turn off saving of the workspace"?
Current:
By default, all of these objects will be saved, and automatically loaded, when you reopen your project. Saving a workspace to .RData can be cumbersome, especially if you are working with larger datasets, and it can lead to hard to debug errors by having objects in memory you forgot you had. To turn that off, go to Tools –> ‘Global Options’ and select the ‘Never’ option for ‘Save workspace to .RData’ on exit.’
Suggested update:
By default, all of these objects will be saved, and automatically loaded, when you reopen your project. Saving a workspace to .RData can be cumbersome, especially if you are working with larger datasets, and it can lead to hard-to-debug errors by having objects in memory you forgot you had. Therefore, it's often a good idea to turn this off. To do so, go to Tools –> ‘Global Options’ and select the ‘Never’ option for ‘Save workspace to .RData on exit’.
Hi maintainers,
In the "Getting Set Up" section of "Before We Start", I agree that we should recommend that learners turn off the automatic save workspace preference. But should we also give a brief disclaimer explaining that there are some cases where saving the workspace or critical intermediate files makes sense, for example having to pause work in the midst of analyzing a dataset with long computing times and/or many intermediate steps? At least in my line of work this is the case fairly frequently, and it would be very frustrating to have to start over from the beginning each time I work on the analysis!
Thanks for the great lessons and all your hard work! Cassi