Ergonomics

tbenthompson commented 2 years ago

While getting set up to debug the H-matrix issue, I ran into a few things that might or might not be bugs depending on preferences:

- The runs/ folder didn't exist so I needed to create it. Is this something that we should just be creating if it doesn't exist?
- The saved elastic operator didn't exist so I had to go into command/japan_command.json to turn off loading. Can we just fall back to creating the elastic operator if it doesn't exist? Make a big warning message that says "The command file requested loading the elastic operators but the operators cannot be found at FILEPATH_HERE. Recomputing the elastic operators instead." I might even add a similar warning for situations where the elastic operators are reloaded just so users aren't surprised.
- Saving the operators fails because the data/operators folder doesn't exist. Can we create this automatically if it doesn't exist?
- If I re-run the first cell that loads the command file, I get an error because the runs/2022-03-07-13-45-12/ already exists. This is because the RUN_NAME variable is being set globally on import. It would be nice to move it to be set when the command file is loaded. The current design means that you would not be able to run two separate block models in the same Python process. This is almost certainly going to be something you or others will want to do at some point and will be more painful to change the longer we wait to support it. It'll also be nice because I can re-run cells without restarting the notebook kernel.
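A minimal Python sketch of the folder and RUN_NAME points above, assuming a hypothetical `load_command` function (not celeri's actual API) that reads the JSON command file. The run name is generated each time a command is loaded instead of once at import, and `Path.mkdir(parents=True, exist_ok=True)` creates runs/ and data/operators/ only when they are missing. The `run_name` and `output_path` keys added to the command dict are also illustrative assumptions.

```python
import datetime
import json
from pathlib import Path


def make_run_name() -> str:
    # Timestamp in the same style as runs/2022-03-07-13-45-12/
    return datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")


def load_command(command_path, runs_dir="runs", operators_dir="data/operators"):
    """Hypothetical loader: read a command file and prepare output folders.

    The run name is generated here, at load time, rather than as a global
    set on import, so each call gets its own run directory.
    """
    with open(command_path) as f:
        command = json.load(f)

    run_name = make_run_name()
    run_dir = Path(runs_dir) / run_name
    # Create runs/<run_name>/ (and runs/ itself) plus data/operators/ if they
    # don't exist yet; exist_ok=True makes re-running a notebook cell harmless.
    run_dir.mkdir(parents=True, exist_ok=True)
    Path(operators_dir).mkdir(parents=True, exist_ok=True)

    command["run_name"] = run_name
    command["output_path"] = str(run_dir)
    return command
```

Because the timestamp has one-second resolution, two loads within the same second would share a run folder; adding microseconds or a counter to the name would avoid that.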

Folders

As a more general point, it could be nice to split the input and output folders apart. Currently, the operators are being saved back into the input data/ folder. Maybe they could be saved into an output folder instead. I imagine the structure could look something like:


output/
    command_name/
        saved_operators/
        run_2022-03-07-13-45-12/
        run_2022-03-07-13-46-38/
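One way to express this layout in code, as a sketch: a hypothetical `output_paths` helper maps a command name and run name onto the folders in the tree, so nothing is ever written under the input data/ folder. The function name and the `run_` prefix handling are assumptions that simply mirror the tree.

```python
from pathlib import Path


def output_paths(output_root, command_name, run_name):
    """Hypothetical helper returning the folders of the proposed layout.

    output/<command_name>/saved_operators/  -> run-independent output
    output/<command_name>/run_<run_name>/   -> per-run output
    """
    command_dir = Path(output_root) / command_name
    return {
        "saved_operators": command_dir / "saved_operators",
        "run": command_dir / f"run_{run_name}",
    }


paths = output_paths("output", "japan_command", "2022-03-07-13-45-12")
print(paths["run"].as_posix())  # output/japan_command/run_2022-03-07-13-45-12
```

Keeping saved operators one level above the run folders lets several runs of the same command share them.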

brendanjmeade commented 2 years ago

@tbenthompson Thanks for bringing this up. This is important now and moving forward. Responses below:

> The runs/ folder didn't exist so I needed to create it. Is this something that we should just be creating if it doesn't exist?

Yes, and will do.

> The saved elastic operator didn't exist so I had to go into command/japan_command.json to turn off loading. Can we just fall back to creating the elastic operator if it doesn't exist? Make a big warning message that says "The command file requested loading the elastic operators but the operators cannot be found at FILEPATH_HERE. Recomputing the elastic operators instead." I might even add a similar warning for situations where the elastic operators are reloaded just so users aren't surprised.

That's a really good idea and I'll do it!
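A minimal sketch of that fallback, assuming hypothetical command-file keys `reuse_elastic` and `operators_file`, and taking the load and compute routines as arguments because their real names and signatures aren't given in this thread. It warns both when it reloads and when it has to recompute.

```python
import warnings
from pathlib import Path


def get_elastic_operators(command, compute_operators, load_operators):
    """Hypothetical helper: load saved operators if requested and present,
    otherwise recompute them, warning the user in both cases."""
    path = Path(command["operators_file"])
    if command.get("reuse_elastic", False):
        if path.exists():
            # Warn on reload too, so users aren't surprised.
            warnings.warn(f"Loading saved elastic operators from {path}.")
            return load_operators(path)
        warnings.warn(
            "The command file requested loading the elastic operators but the "
            f"operators cannot be found at {path}. "
            "Recomputing the elastic operators instead."
        )
    # Loading not requested, or the saved file is missing: compute from scratch.
    return compute_operators(command)
```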

> Saving the operators fails because the data/operators folder doesn't exist. Can we create this automatically if it doesn't exist?

Yes, and will do!

> If I re-run the first cell that loads the command file, I get an error because the runs/2022-03-07-13-45-12/ already exists. This is because the RUN_NAME variable is being set globally on import. It would be nice to move it to be set when the command file is loaded. The current design means that you would not be able to run two separate block models in the same Python process. This is almost certainly going to be something you or others will want to do at some point and will be more painful to change the longer we wait to support it. It'll also be nice because I can re-run cells without restarting the notebook kernel.

I've run into this problem too, and it deserves a fix. Will do.

> As a more general point, it could be nice to split the input and output folders apart. Currently, the operators are being saved back into the input data/ folder. Maybe they could be saved into an output folder instead. I imagine the structure could look something like:

With the format we have now (see below), I think the input and output folders are separated. All output goes to a subfolder of runs/, and all input files live in subfolders of data/. What do you think?

project_name/
├── README.md
├── notebooks/
│   ├── block_model.ipynb
│   ├── visualize_results.ipynb
│   └── resolution_tests.ipynb
├── data/
│   ├── command/
│   │   ├── command_001.json
│   │   └── command_NNN.json
│   ├── segment/
│   │   ├── segment_001.csv
│   │   └── segment_NNN.csv
│   ├── block/
│   │   ├── block_001.csv
│   │   └── block_NNN.csv
│   ├── station/
│   │   ├── station_001.csv
│   │   └── station_NNN.csv
│   ├── mesh/
│   │   ├── mesh_params_001.json
│   │   ├── mesh_params_NNN.json
│   │   ├── mesh_001.msh
│   │   └── mesh_NNN.msh
│   └── operators/
│       ├── elastic_001.hdf5
│       └── elastic_NNN.hdf5
└── runs/
    ├── 2022-02-20-17-01-39/
    │   ├── 2022-02-20-17-01-39.log
    │   ├── elastic_operators.hdf5
    │   ├── model_segment.csv
    │   ├── model_block.csv
    │   └── model_station.csv
    └── NNNN-NN-NN-NN-NN-NN/
        ├── NNNN-NN-NN-NN-NN-NN.log
        ├── elastic_operators.hdf5
        ├── model_segment.csv
        ├── model_block.csv
        └── model_station.csv

tbenthompson commented 2 years ago

Awesome! Thanks for the positivity.

> With the format we have now (see below), I think the input and output folders are separated. All output goes to a subfolder of runs/, and all input files live in subfolders of data/. What do you think?

That seems good. The thing that made me bring this up is the elastic operators, which feel like "output". They are also input, though. Currently there's no concept of run-independent output, which is why they're being stored in the data folder. I don't think it's a big deal, but it could be nice to be able to definitively say that the input folder will not be modified by running the code.

> operators/
> ├── elastic_001.hdf5
> └── elastic_NNN.hdf5

BTW, did you use a tool to make that directory tree so pretty?

jploveless commented 2 years ago

Ben, you're right that saved operators are both input and output. In Blocks, I would save a file called kernels.mat as a temporary file during the run and then move it into the appropriate output folder once the run was complete. I suggest that we maintain some connection between the elastic operators and the geometry input files (stations, segments, meshes) so that we can ensure that a new model's geometry is the same as that of the saved operators (or make minor adjustments as needed). So in Blocks, the operators were saved to the results directory alongside copies of the geometry files. Then, in a subsequent run, those geometry files were read in and compared with the current geometry files, and the operators could be used as is or slightly modified (say, by calculating just the functions for a single segment whose dip had changed and substituting those into the array). This meant that, when requesting use of the pre-calculated operators, the .command file referenced a specific result directory.
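A sketch of that consistency check, with one swapped-in technique: instead of keeping full copies of the geometry files, it records a SHA-256 hash of each file in a hypothetical geometry_manifest.json next to the saved operators and compares hashes on the next run. The manifest name and all function names are assumptions. Hashing can only say which file changed, not which segment, so the partial update described above (recomputing one segment whose dip changed) would still need a row-level comparison of saved copies.

```python
import hashlib
import json
from pathlib import Path

MANIFEST = "geometry_manifest.json"  # hypothetical file name


def file_digest(path):
    # SHA-256 of the file's bytes; identical files give identical digests.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_geometry_manifest(result_dir, geometry_files):
    """Record which geometry inputs the saved operators were built from.

    geometry_files maps a name (e.g. "segment") to a file path.
    """
    manifest = {name: file_digest(p) for name, p in geometry_files.items()}
    Path(result_dir, MANIFEST).write_text(json.dumps(manifest, indent=2))


def changed_geometry(result_dir, geometry_files):
    """Return the names of geometry inputs that differ from the manifest.

    An empty list means the saved operators can be reused as is.
    """
    manifest_path = Path(result_dir, MANIFEST)
    if not manifest_path.exists():
        return sorted(geometry_files)  # nothing recorded: treat all as changed
    saved = json.loads(manifest_path.read_text())
    return sorted(
        name for name, p in geometry_files.items()
        if saved.get(name) != file_digest(p)
    )
```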

Brendan explained to me last week that the current directory structure for celeri works for development, but eventually we'll want to switch to the directory tree listed in the README. I suggested that we could even start a new repo entirely, something like celeri_projects.

tbenthompson commented 2 years ago

> This meant that, when requesting use of the pre-calculated operators, the .command file referenced a specific result directory.

That makes a ton of sense. Thanks Jack!

brendanjmeade commented 2 years ago

Thanks @tbenthompson and @jploveless! I opened issues separately for all of these (https://github.com/brendanjmeade/celeri/issues/93, https://github.com/brendanjmeade/celeri/issues/94, https://github.com/brendanjmeade/celeri/issues/95, and https://github.com/brendanjmeade/celeri/issues/96) and got them all closed.

brendanjmeade / celeri

Ergonomics #89
