valeriupredoi opened 1 year ago
This is the intended behaviour. `Dataset.load` runs all preprocessor steps required to load the data (download, fixes, CMOR check, concatenate, clip timerange, add supplementary variables) and respects the settings in its `session` attribute. If the user of the `Dataset` class does not set the `session` attribute, an `esmvalcore.config.Session` is automatically started and used.
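The fallback described above can be pictured with a simplified model. This is a hypothetical sketch using only the standard library, not ESMValCore's actual code: `GLOBAL_CONFIG` stands in for `esmvalcore.config.CFG`, and `ToyDataset` and `start_session` are illustrative names. It shows why a setting such as `save_intermediary_cubes` in the user config silently applies to a `Dataset` whose `session` was never set.

```python
import copy

# Hypothetical stand-in for the global user configuration (esmvalcore.config.CFG).
GLOBAL_CONFIG = {"save_intermediary_cubes": True, "output_dir": "~/esmvaltool_output"}


def start_session(name: str) -> dict:
    """Hypothetical helper: a session is a copy of the global config plus a name."""
    session = copy.deepcopy(GLOBAL_CONFIG)
    session["session_name"] = name
    return session


class ToyDataset:
    """Hypothetical, simplified model of a dataset with a lazily created session."""

    def __init__(self, **facets):
        self.facets = facets
        self._session = None

    @property
    def session(self) -> dict:
        # If the user never assigned a session, start one from the global config,
        # so every global setting (including save_intermediary_cubes) is inherited.
        if self._session is None:
            self._session = start_session(self.facets.get("dataset", "session"))
        return self._session

    @session.setter
    def session(self, value: dict) -> None:
        self._session = value

    def load(self) -> str:
        # The real Dataset.load runs the loading preprocessor steps; this toy
        # version only reports whether intermediary results would hit the disk.
        if self.session["save_intermediary_cubes"]:
            return "intermediary files written under " + self.session["output_dir"]
        return "nothing written to disk"
```

In this model, `ToyDataset(dataset="X").load()` reports files written under `~/esmvaltool_output`, because the default session inherits the global flag; assigning a session copy with `save_intermediary_cubes` switched off avoids the writes without touching the global config.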
thanks @bouweandela, I understand that, and it's a gud boi `Dataset`; all I need to see is something that prints to the screen "hello, I am going to fill up your local disk with junk bc you are silly and have forgotten to unset `save_intermediary_cubes` in CFG" :grin: I'll open a PR for that :+1:
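A minimal sketch of the kind of on-screen notice proposed above, using Python's standard `warnings` module. The function name `warn_if_saving_intermediary_cubes` and treating the session as a plain mapping are assumptions for illustration; this is not the code of any actual ESMValCore PR, which might use logging instead.

```python
import warnings
from collections.abc import Mapping


def warn_if_saving_intermediary_cubes(session: Mapping) -> bool:
    """Hypothetical helper: warn if loading will write intermediary files to disk.

    Returns True when a warning was issued, False otherwise.
    """
    if session.get("save_intermediary_cubes", False):
        warnings.warn(
            "save_intermediary_cubes is enabled: loading this dataset will write "
            "intermediary files to disk. Set it to False in your configuration "
            "to avoid this.",
            UserWarning,
            stacklevel=2,  # point the warning at the caller, not at this helper
        )
        return True
    return False
```

Calling it at the start of a load step costs nothing when the flag is off and makes the disk writes visible when it is on.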
To my understanding, using `Dataset` outside its specs (i.e. inside a preprocessing function, calling its methods directly) should not write anything to disk unless specifically asked to via a save method. Instead, it actually does save a lot of data to disk if `save_intermediary_cubes` is set to `True`. This process (identical to the actual workflow process) is slow, memory-intensive, and eats up disk space. I had `save_intermediary_cubes` set to `True` in my user config, and as a result loading a `Dataset` object led to the unwanted creation of an `esmvaltool_output` dir, with a session hash, where the intermediary files are stored.

First off, I don't like this, and I am fairly sure users are not aware of such behaviour: it happens under the hood and may lead to problems (it is undocumented AFAIK, and no such data output happens if one doesn't turn on intermediary saves). Second, if this is intended behaviour, we should document it :beer:

First discovered in https://github.com/ESMValGroup/ESMValCore/issues/2162