NREL / disco

DISCO
BSD 3-Clause "New" or "Revised" License
10 stars 5 forks source link

Change default transform directory name #37

Closed daniel-thom closed 3 years ago

daniel-thom commented 3 years ago

We identified a confusing behavior today. Suppose you do this:

disco transform-model <some-source-path> snapshot --hierarchy=feeder
mv snapshot-models feeder-snapshot-models
disco transform-model <some-source-path> snapshot --hierarchy=substation
disco config snapshot feeder-snapshot-models -c feeder-config.json
jade submit-jobs feeder-config.json

The jobs will use source data at the substation level. This is because the transform-model command generates files in the target directory that use the target directory name. They do not contain relative paths. So, if you rename that target directory, you will be in a broken state. If the original directory name still happens to exist, you will be in an even more broken state.

The best way to fix this would be to make all paths in the target directory relative. However, we did it this way to make the code simple (this was more than a year ago). OpenDSS changes its internal directory whenever it compiles a file. So, you can end up a in a big mess if you have relative paths everywhere. It certainly would be possible to do something different, but I'd prefer not to at this time.

To mitigate this problem I changed the default target directory to include the hierarchy (substation vs feeder), so users will be less inclined to change the name.

Your feedback is welcome. Perhaps you would strongly prefer that the target directory contain relative paths.

I also changed the default snapshot time to something more sensical.

bpalmintier commented 3 years ago

I'm a bit confused, does this mean that everything then requires a full absolute path or simply that things are fully specified below a given root folder?

It seems that having everything fully absolute is its own nightmare since it require reworking everything when changing directories or platforms. But if there was a common root defined, then fully specified below the root would make it reasonably easy to be portable.

daniel-thom commented 3 years ago

I'm a bit confused, does this mean that everything then requires a full absolute path or simply that things are fully specified below a given root folder?

It seems that having everything fully absolute is its own nightmare since it require reworking everything when changing directories or platforms. But if there was a common root defined, then fully specified below the root would make it reasonably easy to be portable.

I'll provide a concrete example for clarity.

  1. Transform SMART-DS data into a local workspace containing disco models with

    $ disco transform-model <some-source-path> snapshot --hierarchy=feeder --output=snapshot-feeder-models
    $ head -1 snapshot-feeder-models/p1uhs21_1247/p1udt5257/PVDeployments/p1uhs21_1247__p1udt5257__close__1__5.dss
    Redirect snapshot-feeder-models/p1uhs21_1247/p1udt5257/OpenDSS/Master.dss

    Note: Load shape data files are not copied as part of this because that would be very expensive. We replace the relative paths with absolute paths.

  2. Run the jobs through PyDSS. The .dss file generated at runtime that serves as the OpenDSS entry point has an absolute path to the local workspace file created in step 1.

    $ head -1 output/job-outputs/p1uhs21_1247__p1udt21301__close__1__5/pydss_project/DSSfiles/deployment.dss
    Redirect /data/snapshot-feeder-models/p1uhs21_1247/p1udt21301/OpenDSS/Master.dss

Consequences:

  1. If you copied snapshot-feeder-models to another system, didn't rename it, and it did not define load shapes, it would work.
  2. If (1) but there are load shapes, it would fail.
  3. If (1) but you renamed the directory, it would fail.

We could change the logic such that the local workspace directory gets passed along to the right place in code and then fix the runtime-generated path later. Then it would be more flexible. But then there is still a portability issue with load shapes. That could also be changed if needed.

bpalmintier commented 3 years ago

Hmm, might it work to have a few root directories? E.g.: for Load shapes, models, results (might not be the right list). I'm always leery of absolute directories since it can open up issues and bandaid solutions. But it could be that the very nature of the larger loadshape files makes even the multiple root paths tricky.

Maybe it helps to think through a few use cases:

  1. A general one would be for moving a full set of a larger DISCO job to a different location, such as trying to build off of work from DANCE on a different project, or perhaps if a single user is trying to share work from their home folder with others.
  2. As part of debugging a larger set of jobs it would be helpful to be able to select a particularly troubling one and take it off-line to a local pyDSS or even OpenDSS instance for debugging
  3. Likely others

I get the sense that for #1, the current infrastructure could work, although I understand it is really designed to build again from scratch, so there might be a gotcha if there were any adjustements made, but that is arguably an unsupported use case.

2 seems tricker but perhaps most useful. It makes me wonder if the solution might be to spin out a new functionality that makes it possible to repackage a portion of a DISCO job set such that all of the required data is in a single folder and the paths are adapted to work in a more relative way. The primary idea would be for such debugging use cases, but such a capability could also be used to create more easily shared versions of small to medium job sets such as to send to industry partners.

Thoughts?


From: Daniel Thom @.> Sent: Wednesday, July 28, 2021 3:45 PM To: NREL/disco @.> Cc: Bryan Palmintier @.>; Comment @.> Subject: Re: [NREL/disco] Change default transform directory name (#37)

CAUTION: This email originated from outside of NREL. Do not click links or open attachments unless you recognize the sender and know the content is safe.

I'm a bit confused, does this mean that everything then requires a full absolute path or simply that things are fully specified below a given root folder?

It seems that having everything fully absolute is its own nightmare since it require reworking everything when changing directories or platforms. But if there was a common root defined, then fully specified below the root would make it reasonably easy to be portable.

I'll provide a concrete example for clarity.

  1. Transform SMART-DS data into a local workspace containing disco models with

$ disco transform-model snapshot --hierarchy=feeder --output=snapshot-feeder-models $ head -1 snapshot-feeder-models/p1uhs21_1247/p1udt5257/PVDeployments/p1uhs21_1247p1udt5257close15.dss Redirect snapshot-feeder-models/p1uhs21_1247/p1udt5257/OpenDSS/Master.dss

Note: Load shape data files are not copied as part of this because that would be very expensive. We replace the relative paths with absolute paths.

  1. Run the jobs through PyDSS. The .dss file generated at runtime that serves as the OpenDSS entry point has an absolute path to the local workspace file created in step 1.

$ head -1 output/job-outputs/p1uhs21_1247p1udt21301close15/pydss_project/DSSfiles/deployment.dss Redirect /data/snapshot-feeder-models/p1uhs21_1247/p1udt21301/OpenDSS/Master.dss

Consequences:

  1. If you copied snapshot-feeder-models to another system, didn't rename it, and it did not define load shapes, it would work.
  2. If (1) but there are load shapes, it would fail.
  3. If (1) but you renamed the directory, it would fail.

We could change the logic such that the local workspace directory gets passed along to the right place in code and then fix the runtime-generated path later. Then it would be more flexible. But then there is still a portability issue with load shapes. That could also be changed if needed.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://gcc02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FNREL%2Fdisco%2Fpull%2F37%23issuecomment-888642352&data=04%7C01%7Cbryan.palmintier%40nrel.gov%7C360b17bbb7e040571a8b08d952110ed4%7Ca0f29d7e28cd4f5484427885aee7c080%7C0%7C0%7C637631056981286435%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=ZLdn7YZgm5dkb4zcltjQ1mp7P4O%2FEGUQ8XfJ0NxfRDI%3D&reserved=0, or unsubscribehttps://gcc02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAAB43OFACLYIVTHLY2Q3H5TT2B3ANANCNFSM5BFDXV6A&data=04%7C01%7Cbryan.palmintier%40nrel.gov%7C360b17bbb7e040571a8b08d952110ed4%7Ca0f29d7e28cd4f5484427885aee7c080%7C0%7C0%7C637631056981286435%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=ieDLVfbHXTmlEa3bl2lnIfhzwB6f%2BDpyOVUpdplV8a0%3D&reserved=0.

bpalmintier commented 3 years ago

Clarification suggested idea for #2 might involve pulling all the the required data into a single local directory STRUCTURE, when prepping for export. Likely this wouldn't be a single directory.


From: Palmintier, Bryan @.> Sent: Wednesday, July 28, 2021 4:45 PM To: NREL/disco @.>; NREL/disco @.> Cc: Bryan Palmintier @.>; Comment @.***> Subject: Re: [NREL/disco] Change default transform directory name (#37)

Hmm, might it work to have a few root directories? E.g.: for Load shapes, models, results (might not be the right list). I'm always leery of absolute directories since it can open up issues and bandaid solutions. But it could be that the very nature of the larger loadshape files makes even the multiple root paths tricky.

Maybe it helps to think through a few use cases:

  1. A general one would be for moving a full set of a larger DISCO job to a different location, such as trying to build off of work from DANCE on a different project, or perhaps if a single user is trying to share work from their home folder with others.
  2. As part of debugging a larger set of jobs it would be helpful to be able to select a particularly troubling one and take it off-line to a local pyDSS or even OpenDSS instance for debugging
  3. Likely others

I get the sense that for #1, the current infrastructure could work, although I understand it is really designed to build again from scratch, so there might be a gotcha if there were any adjustements made, but that is arguably an unsupported use case.

2 seems tricker but perhaps most useful. It makes me wonder if the solution might be to spin out a new functionality that makes it possible to repackage a portion of a DISCO job set such that all of the required data is in a single folder and the paths are adapted to work in a more relative way. The primary idea would be for such debugging use cases, but such a capability could also be used to create more easily shared versions of small to medium job sets such as to send to industry partners.

Thoughts?


From: Daniel Thom @.> Sent: Wednesday, July 28, 2021 3:45 PM To: NREL/disco @.> Cc: Bryan Palmintier @.>; Comment @.> Subject: Re: [NREL/disco] Change default transform directory name (#37)

CAUTION: This email originated from outside of NREL. Do not click links or open attachments unless you recognize the sender and know the content is safe.

I'm a bit confused, does this mean that everything then requires a full absolute path or simply that things are fully specified below a given root folder?

It seems that having everything fully absolute is its own nightmare since it require reworking everything when changing directories or platforms. But if there was a common root defined, then fully specified below the root would make it reasonably easy to be portable.

I'll provide a concrete example for clarity.

  1. Transform SMART-DS data into a local workspace containing disco models with

$ disco transform-model snapshot --hierarchy=feeder --output=snapshot-feeder-models $ head -1 snapshot-feeder-models/p1uhs21_1247/p1udt5257/PVDeployments/p1uhs21_1247p1udt5257close15.dss Redirect snapshot-feeder-models/p1uhs21_1247/p1udt5257/OpenDSS/Master.dss

Note: Load shape data files are not copied as part of this because that would be very expensive. We replace the relative paths with absolute paths.

  1. Run the jobs through PyDSS. The .dss file generated at runtime that serves as the OpenDSS entry point has an absolute path to the local workspace file created in step 1.

$ head -1 output/job-outputs/p1uhs21_1247p1udt21301close15/pydss_project/DSSfiles/deployment.dss Redirect /data/snapshot-feeder-models/p1uhs21_1247/p1udt21301/OpenDSS/Master.dss

Consequences:

  1. If you copied snapshot-feeder-models to another system, didn't rename it, and it did not define load shapes, it would work.
  2. If (1) but there are load shapes, it would fail.
  3. If (1) but you renamed the directory, it would fail.

We could change the logic such that the local workspace directory gets passed along to the right place in code and then fix the runtime-generated path later. Then it would be more flexible. But then there is still a portability issue with load shapes. That could also be changed if needed.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://gcc02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FNREL%2Fdisco%2Fpull%2F37%23issuecomment-888642352&data=04%7C01%7Cbryan.palmintier%40nrel.gov%7C360b17bbb7e040571a8b08d952110ed4%7Ca0f29d7e28cd4f5484427885aee7c080%7C0%7C0%7C637631056981286435%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=ZLdn7YZgm5dkb4zcltjQ1mp7P4O%2FEGUQ8XfJ0NxfRDI%3D&reserved=0, or unsubscribehttps://gcc02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAAB43OFACLYIVTHLY2Q3H5TT2B3ANANCNFSM5BFDXV6A&data=04%7C01%7Cbryan.palmintier%40nrel.gov%7C360b17bbb7e040571a8b08d952110ed4%7Ca0f29d7e28cd4f5484427885aee7c080%7C0%7C0%7C637631056981286435%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=ieDLVfbHXTmlEa3bl2lnIfhzwB6f%2BDpyOVUpdplV8a0%3D&reserved=0.

daniel-thom commented 3 years ago

It wouldn't be too difficult to add an option that copies all load shape data into the local workspace. There has not been a need so far. We definitely don't want to do this by default because of the data sizes and slowness.

In general the best place to debug a problem is on Eagle (in place). It's very straightforward. You can change model definitions in the local workspace and run PyDSS or OpenDSSDirect manually.

I did have to export one complete job directory to another system once and it was a pain. To make it completely optimal you actually have to parse all .dss files that define load shapes, identify the specific data files in use, and then only copy those. The SMART-DS datasets contain all profile files for a given region in one directory, which can be up to 4 GiB in size.

If the team needs this feature then I would add the option to the transform step to disable the load-shape shortcut.

daniel-thom commented 3 years ago

@bpalmintier I am merging this PR so that the team can benefit from the improved functionality. I created https://agile.nrel.gov/browse/DISCO-308 to track the improvements we discussed here.

bpalmintier commented 3 years ago

Sounds good. I suspect the ability to encapsulate to run as a stand-alone PyDSS model and also as a standalone OpenDSS model (with reduced control functionality but to check powerflow, etc.) will be a very helpful capability off of Eagle. The ability to work on a problematic feeder model in an interactive graphical OpenDSS session is very powerful. But agree that belongs as a separate feature request


From: Daniel Thom @.> Sent: Thursday, July 29, 2021 5:33 PM To: NREL/disco @.> Cc: Bryan Palmintier @.>; Mention @.> Subject: Re: [NREL/disco] Change default transform directory name (#37)

CAUTION: This email originated from outside of NREL. Do not click links or open attachments unless you recognize the sender and know the content is safe.

Merged #37https://gcc02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FNREL%2Fdisco%2Fpull%2F37&data=04%7C01%7Cbryan.palmintier%40nrel.gov%7Ca1a2a6ea4822441d245808d952e945c4%7Ca0f29d7e28cd4f5484427885aee7c080%7C0%7C0%7C637631984128325856%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=waWCXvhkBp6ewk57VYJYm7T%2BrWSr51TuphVcOW8XN%2Fw%3D&reserved=0 into master.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://gcc02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FNREL%2Fdisco%2Fpull%2F37%23event-5088747704&data=04%7C01%7Cbryan.palmintier%40nrel.gov%7Ca1a2a6ea4822441d245808d952e945c4%7Ca0f29d7e28cd4f5484427885aee7c080%7C0%7C0%7C637631984128335810%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=RNrV%2B1dXgL4pTwY6mW%2BGDfeItMa%2BtlXXyyu%2FUr2PeXM%3D&reserved=0, or unsubscribehttps://gcc02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAAB43OHQ4DLXYUJUQLZRYATT2HQMPANCNFSM5BFDXV6A&data=04%7C01%7Cbryan.palmintier%40nrel.gov%7Ca1a2a6ea4822441d245808d952e945c4%7Ca0f29d7e28cd4f5484427885aee7c080%7C0%7C0%7C637631984128335810%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=WFS97rpGQ%2BJzLiKWiQZJjffOfAK9qtvBKJICqEgaYJs%3D&reserved=0.