It's definitely a surprising result - one which caused quite a bit of confusion this past week.
I think the reason we don't overwrite pass-through datasets is to avoid repeatedly writing the same (necessarily identical) pass-through dataset from each single-year file during processing. In other words, we indiscriminately call the `_copy_dset` function on every dataset of every single-year generation file, and expect it not to do any extra work on those identical pass-through datasets.
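For anyone following along, here's a minimal sketch of that skip-if-existing behavior using h5py directly; the function name and signature are illustrative only and are not the actual `_copy_dset` implementation:

```python
import h5py


def copy_dset_if_missing(src_path, dst_path, dset_name):
    """Copy a dataset from a single-year file into the multi-year file,
    skipping the copy if the multi-year file already has it (e.g. an
    identical pass-through dataset written from a previous year's file).
    """
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "a") as dst:
        if dset_name in dst:
            # Pass-through dataset already written by an earlier
            # single-year file; the data is identical, so skip it.
            return
        dset = dst.create_dataset(dset_name, data=src[dset_name][...])
        dset.attrs.update(src[dset_name].attrs)
```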
Now that I think about it, this may actually cause problems with the `dsets` inputs as well if re-running the multi-year step with an existing multi-year gen file. But perhaps the simplest solution to this problem is to ask the user to delete the file before re-running?
Curious to hear your thoughts @grantbuster
I guess that or automatically purge any existing multi-year file at the beginning of processing. But I wonder if purging user data would open its own can of worms...
Patch coverage: 100.00% and project coverage change: +0.05% :tada:

Comparison is base (10e39c5) 86.98% compared to head (0cdc9a9) 87.04%.

:umbrella: View full report in Codecov by Sentry.
My preference would be to purge any target output file if it already exists... imagine if jobs failed while halfway done collecting data. You would get garbage results without warning because the dataset would have been initialized but not finished writing. Probably a dramatic and rare example, but possible. For collect and multi-year I think this makes sense. For Gen I think we probably overwrite datasets regardless because we init first then write later?
Yes, I tend to agree. Generation just overwrites the files completely, so purging the MY file before processing makes things more consistent. I've updated the code accordingly.
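For reference, the purge-before-processing behavior would look roughly like this; the function name, path handling, and warning text here are a sketch, not the exact code in this PR:

```python
import logging
import os

logger = logging.getLogger(__name__)


def purge_existing_output(out_path):
    """Delete a pre-existing multi-year output file before collection so
    a partially written file from a failed earlier run can't be mistaken
    for valid output."""
    if os.path.exists(out_path):
        logger.warning("Found existing output file %s; purging it before "
                       "running multi-year collection.", out_path)
        os.remove(out_path)
```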
Should we check collect while we're at it?
The collect logic is handled in GAPs, so we'd have to make changes there. Looks like you guys already thought about this and added a "clobber" option. However, it looks like this was set to `False` by default and not exposed to the user for some reason.

I'll go ahead and expose it in the CLI and update the default value to `True`, thereby giving the user full control while also maintaining reasonable default behavior.
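Something along these lines, assuming a click-based CLI; the option name and wiring below are just a sketch, not the actual GAPs change:

```python
import os

import click


@click.command()
@click.option("--out-path", required=True, type=click.Path(),
              help="Path to the collected output file.")
@click.option("--clobber/--no-clobber", default=True, show_default=True,
              help="Purge an existing output file before collecting.")
def collect(out_path, clobber):
    """Collect single-year outputs into one file (illustrative only)."""
    if clobber and os.path.exists(out_path):
        os.remove(out_path)
    # ... run the actual collection step here ...
```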
I guess we can add a "clobber" option with a default `True` value to MY collect for consistency. What do you think?
I'm indifferent. The MY module is really meant for scalar means and should run instantly, so clobber should always be `True`. But my desire for uniformity says having a clobber option would be nice.
If you really are indifferent, I'll probably go ahead and add it. A default `True` option will purge files (without any extra user intervention) in all cases we can think of at the moment where files should be clobbered. But as soon as an exception to this rule inevitably shows up, the power will already be in the user's hands :)
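Roughly what I have in mind for MY collect, as a sketch; the function name and arguments are illustrative only:

```python
import os


def my_collect(out_path, source_files, dsets, clobber=True):
    """Collect datasets from single-year files into a multi-year file.

    With the default ``clobber=True``, any existing multi-year file is
    purged up front; pass ``clobber=False`` to keep it if an exception
    to the purge-by-default rule ever comes up."""
    if clobber and os.path.exists(out_path):
        os.remove(out_path)
    # ... copy each dataset in `dsets` from `source_files` into out_path ...
```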
Yep sounds good to me; I am truly indifferent.
Sounds good. Added that in
@grantbuster any other requests or am I good to merge?
@grantbuster Can you let me know if the warning text makes sense? If not, what would be a good way to rephrase it?