LLNL / merlin

Machine Learning for HPC Workflows
MIT License
122 stars 26 forks source link

[BUG] MERLIN_SPEC variables are not working properly #423

Closed dylancliche closed 1 year ago

dylancliche commented 1 year ago

Bug Report

Describe the bug $(MERLIN_SPEC_ORIGINAL_TEMPLATE), $(MERLIN_SPEC_EXECUTED_RUN), and $(MERLIN_SPEC_ARCHIVED_COPY) do not rename the specification's filename with extension .orig.yaml, .partial.yaml, or .expanded.yaml respectively. Instead, it is using the name: given in the description: block within the specification file.

To Reproduce Steps to reproduce the behavior: In any workflow change the name in the description block to not be the filename of the specification file and type echo <any of the Merlin variables given above> in any of the workflow's steps.

Expected behavior In the output file for the step that contained the echo command, it should contain <spec filename>.<orig or partial or expanded>.yaml.

Please answer these questions to help us pinpoint the problem

Please run merlin info and paste the results here:

Merlin Configuration

config_file | /g/g15/cliche1/.merlin/app.yaml is_debug | False merlin_home | /g/g15/cliche1/.merlin merlin_home_exists | True broker server | amqps://cliche1:**@rzrabbit.llnl.gov:5671/cliche1 broker ssl | True results server | redis://mlsi:**@rzrabbit.llnl.gov:6379/0 results ssl | False

Checking server connections:

broker server connection: OK results server connection: OK

dylancliche commented 1 year ago

@bgunnar5, @lucpeterson, @koning: Any thoughts?

bgunnar5 commented 1 year ago

Sorry for the delayed reply, I've been on vacation since last Wednesday. I just had a chance to look into this and I believe this could be changed in the merlin/study/study.py file in the init method of the MerlinStudy class. Right now we have something that looks like:

def __init__(  # pylint: disable=R0913
        self,
        filepath,
        override_vars=None,
        restart_dir=None,
        samples_file=None,
        dry_run=False,
        no_errors=False,
        pgen_file=None,
        pargs=None,
    ):
        self.original_spec = MerlinSpec.load_specification(filepath)

        self.special_vars = {
            "MERLIN_SPEC_ORIGINAL_TEMPLATE": os.path.join(
                self.info,
                self.original_spec.description["name"].replace(" ", "_") + ".orig.yaml",
            ),
            "MERLIN_SPEC_EXECUTED_RUN": os.path.join(
                self.info,
                self.original_spec.description["name"].replace(" ", "_") + ".partial.yaml",
            ),
            "MERLIN_SPEC_ARCHIVED_COPY": os.path.join(
                self.info,
                self.original_spec.description["name"].replace(" ", "_") + ".expanded.yaml",
            ),
        }

Since we pass in the filepath already, we could add a new special variable here called "ORIGINAL_FILEPATH" or something similar like so:

self.special_vars = {
            "MERLIN_SPEC_ORIGINAL_TEMPLATE": os.path.join(
                self.info,
                self.original_spec.description["name"].replace(" ", "_") + ".orig.yaml",
            ),
            "MERLIN_SPEC_EXECUTED_RUN": os.path.join(
                self.info,
                self.original_spec.description["name"].replace(" ", "_") + ".partial.yaml",
            ),
            "MERLIN_SPEC_ARCHIVED_COPY": os.path.join(
                self.info,
                self.original_spec.description["name"].replace(" ", "_") + ".expanded.yaml",
            ),
            "ORIGINAL_FILEPATH": filepath,
        }
bgunnar5 commented 1 year ago

This filepath wouldn't have any of the variables expanded, it would just strictly point to the absolute path (I think it'd be the absolute path at least) of the file that was provided by the user

koning commented 1 year ago

I can see how that might present as a bug, but that is not the choice they made when implementing this feature. I think Dylan is asking for this:

base_name = os.path.basename(filepath)
base_name = base_name.replace(".yaml", "")
self.special_vars = {
            "MERLIN_SPEC_ORIGINAL_TEMPLATE": os.path.join(
                self.info,
                base_name + ".orig.yaml",
            ),
            "MERLIN_SPEC_EXECUTED_RUN": os.path.join(
                self.info,
                base_name + ".partial.yaml",
            ),
            "MERLIN_SPEC_ARCHIVED_COPY": os.path.join(
                self.info,
                base_name + ".expanded.yaml",
            ),
        }

But it will need to be changed in the code, and the basename may not be available: https://github.com/LLNL/merlin/blob/c8c6a44100ac73bfa06d0dd40a80ab66693a60dd/merlin/study/study.py#L371

dylancliche commented 1 year ago

@bgunnar5: No problem. It doesn't even have to be the full filepath (just as @koning pointed out). Especially since $(SPECROOT) already shows the path. It would be nice to have the filename: something along the lines of $(SPECFILE) to be analogous to $(SPECROOT).

@koning: I presented it as a bug because I originally presented it as a new feature, but then @lucpeterson stated it may be a bug so I decided to report it here as well. I wasn't sure where the scripts for the MERLIN_SPEC variables were located, but having @bgunnar5 post them I can see that the original choice for implementing this feature was not what I was looking for. If that is the case, instead of changing the MERLIN_SPEC variables as you suggested (unless that's what everyone thinks is best) we can definitely go the other route and just add a new variable such as SPECFILE to get what I am looking for.

lucpeterson commented 1 year ago

I think it’s a bug because you could have multiple yaml files with different base names but the same name in the yaml file. The new file would have that as its name. It’d have the correct contents but you wouldn’t know which file it came from

dylancliche commented 1 year ago

As a piggy back. That was the main reason to why I originally proposed the new feature (and why Luc thinks it is a bug) is because, as Luc mentioned above, different specification files can have the same name in the description block and therefore in the merlin_info directory you wouldn't know which workflow the .orig, .partial, and .expanded files came from without investigating the contents. I'd rather just get the filename.

dylancliche commented 1 year ago

@bgunnar5: Here's an idea (no matter if we make a new variable or change the MERLIN_SPEC variables) based on @koning's idea. What if we add spec.specfile = os.path.basename(filepath) in the load_specification class method within the merlin/spec/specification.py file. Then either make a special variable in merlin/study/study.py that sets it equal to self.original_spec.specfile (if we want a different variable), or use it to define the MERLIN_SPEC variables such as @koning's suggestion.

koning commented 1 year ago

These files are only for one specific study, so they are unique within a study even though they might not be unique across multiple studies. The study directory is unique.

bgunnar5 commented 1 year ago

Even with the files being unique to a study I do think it makes more sense for these files to be named based on the original filename rather than the name of the study itself

koning commented 1 year ago

Yes I agree, I'm just clarifying the bug versus feature discussion.