AgnostiqHQ / covalent-slurm-plugin

Executor plugin interfacing Covalent with Slurm
https://covalent.xyz
Apache License 2.0
27 stars 6 forks source link

Allow for the creation of unique subfolders in the `remote_workdir` #57

Closed Andrew-S-Rosen closed 1 year ago

Andrew-S-Rosen commented 1 year ago

Closes #56 and combines a separate bugfix I was going to introduce in #58.

### Added

- A new kwarg `create_unique_workdir` that will create unique subfolders of the type `<DISPATCH ID>/node_<NODE ID>` within `remote_workdir` if set to `True`

### Fixed

- Fixed a bug where `cleanup = False` would be ignored.
- Fixed a bug where if `cache_dir` was not present, Covalent would crash.
Andrew-S-Rosen commented 1 year ago

@wjcunningham7, @santoshkumarradha, @cjao: This should be good to go I think! I just tested it out, and it worked as expected.

Since we will want to mimic this approach in other executors, please make sure the naming and such are to your liking. When it's merged, can we also get a version update? Thanks!

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.36 :tada:

Comparison is base (ba35b6c) 84.06% compared to head (cf1da17) 84.43%.

:exclamation: Current head cf1da17 differs from pull request most recent head 81b687a. Consider uploading reports for the commit 81b687a to get more accurate results

Additional details and impacted files ```diff @@ Coverage Diff @@ ## develop #57 +/- ## =========================================== + Coverage 84.06% 84.43% +0.36% =========================================== Files 2 2 Lines 295 302 +7 =========================================== + Hits 248 255 +7 Misses 47 47 ``` | [Impacted Files](https://app.codecov.io/gh/AgnostiqHQ/covalent-slurm-plugin/pull/57?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=AgnostiqHQ) | Coverage Δ | | |---|---|---| | [covalent\_slurm\_plugin/slurm.py](https://app.codecov.io/gh/AgnostiqHQ/covalent-slurm-plugin/pull/57?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=AgnostiqHQ#diff-Y292YWxlbnRfc2x1cm1fcGx1Z2luL3NsdXJtLnB5) | `84.38% <100.00%> (+0.37%)` | :arrow_up: |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

santoshkumarradha commented 1 year ago

Wow this was super fast and perfect ! 🙏

CC : @wjcunningham7

Andrew-S-Rosen commented 1 year ago

EDIT: Fixed and ready to go!! I confirmed that everything works correctly in a "real" scenario on Perlmutter.

Alright, I'm a bit stumped. Entirely unrelated to this PR, I'm having that issue where I cannot unpickle things on the remote machine (Perlmutter). I tried nuking my entire conda env again on both local and remote but with no luck. That's making it a bit hard to test this and give me confidence that everything is working perfectly.

I also think I must have introduced an error somewhere when switching from self._current_remote_workdir to passing the currente_remote_workdir around because the unique folders aren't being made anymore when I tried it out "for real" (despite the tests passing here).

Going to have to tackle this tomorrow because I have no brain cells left right now...