forcing not written in directory specified by dir argument

RolfHut commented 2 months ago

When I run the makkink forcing and specify a directory that directory is used as the starting point for ESMValTool, but the final result is not written in that directory. This means that the result (including the yaml file with the forcing info) is written a few directories deeper. This means that

forcing.sources["LumpedMakkingForcing"].generate(dir = "myDir")

followed by

forcing = forcing.sources["LumpedMakkinkForcing"].load("myDir")

does not work, but should be

forcing = forcing.sources["LumpedMakkinkForcing"].load("myDir" + "work/diagnostic/script")

to make it work. This is not consistent between forcing sources.

BSchilperoort commented 2 months ago

I ran the MakkinkForcing and MarrmotForcing to see for myself, and indeed the directory you specify will be the work directory, but not the directory where eventually the resulting data is stored in.

This is not consistent between forcing sources.

I believe this is consistent between most forcing sources (all the ones that use the DefaultForcing's .generate method). I believe the only forcing module that deviates from this is CaravanForcing, as it does not use ESMValTool.

A workaround can be:

forcing = forcing.sources["LumpedMakkingForcing"].generate(dir="my_work_dir")
forcing_dir = forcing.directory
...
forcing = forcing.sources["LumpedMakkinkForcing"].load(forcing_dir)

But I agree that this behavior is not clearly described at the moment. Perhaps for users defining a "work directory" and an "output directory" separately might be better.

RolfHut commented 2 months ago

For my current use case the loading of the forcing is done in another notebook than the creation of the forcing, so that workaround would not work, unless that forcing_dir is stored somewhere as meta-data in a txt file or some such, but that would honestly defeat the point I think?

BSchilperoort commented 2 months ago

I guess as a workaround you can try to see if the "work/diagnostic/script" path exists.

from pathlib import Path

if Path("myDir" / "work/diagnostic/script").exists():
    forcing_dir = "myDir" + "/" + "work/diagnostic/script"
else:
    forcing_dir = "myDir"

Ideally we would change this in ewatercycle though, but that will be too late for you at the moment.

We can change the generate method in the following backwards compatible way:

class DefaultForcing:

    @classmethod
    def generate(
        ...
        directory: str | None = None,
        work_dir: str | None = None,
        output_dir: str | None = None,
    ):
        if directory is not None:
            warnings.warn(
                "directory kwargs is deprecated, please use work_dir and output_dir"
            )
            work_dir = directory

        ...
        if output_dir is not None:
            move(work_dir / "work/diagnostic/script")
            delete(work_dir)  # clean up temp files
        ...

This keeps the (deprecated) directory kwarg valid, but warns the user that they should not use it anymore. By splitting the work dir and output dir we can also do an (optional) clean up of the temporary files.

If a user only specifies the output_dir, the working directory will be the default one.

RolfHut commented 2 months ago

that would totally work and I can incorporate that into my workflow. Do you want to implement that?

BSchilperoort commented 2 months ago

I can tackle it at the same time as #419, but perhaps it's best if someone else also has a look at this before I work on it @Peter9192 @sverhoeven

Peter9192 commented 2 months ago

Few remarks

Perhaps output is not always in "work/diagnostic/script, e.g. when you only use preprocessors and no script
Not sure if I like two directories. We should not bother the user with thinking about work dirs. I guess the main reason we stick with the ESMValTool structure is to preserve the provenance info. But we could restructure it after ESMValTool finishes. Could we place the forcing files with the ewatercycle-forcing.yaml in the top level forcing_dir and any other files that we deem important (like provenance) in a subdir called metadata or so?

eWaterCycle / ewatercycle

forcing not written in directory specified by dir argument #423