brendanjmeade / celeri

Next generation earthquake cycle kinematics
BSD 3-Clause "New" or "Revised" License

Check path concatenation for pre-calculated partials #85

Closed: jploveless closed this issue 2 years ago

jploveless commented 2 years ago

@brendanjmeade Can you check https://github.com/brendanjmeade/celeri/commit/75a2ec7f0bf976f2f1757e733065eff1ca3fac4c to make sure what I've done is okay?

In celeri.read_data, I defined command.file_name so that I could reference it later: https://github.com/brendanjmeade/celeri/blob/75a2ec7f0bf976f2f1757e733065eff1ca3fac4c/celeri/celeri.py#L57

In celeri.get_elastic_operators, I find the path and then concatenate it with the .hdf file name to read in the saved partials. Without this, I was getting a "file not found" error. https://github.com/brendanjmeade/celeri/blob/75a2ec7f0bf976f2f1757e733065eff1ca3fac4c/celeri/celeri.py#L1243
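For readers following along, the fix described above amounts to resolving the operator file name against the folder that holds the command file rather than against the current working directory. Below is a minimal standard-library sketch. It assumes command.file_name stores the command file path as it was passed to celeri.read_data. The helper name resolve_relative_to_command is hypothetical, and this is not the code in the linked commit.

```python
import os


def resolve_relative_to_command(command_file_name: str, relative_path: str) -> str:
    """Interpret relative_path relative to the folder containing the command file.

    Hypothetical helper for illustration; not part of celeri's API.
    """
    # Absolute directory that holds command.json, independent of the working directory
    command_dir = os.path.dirname(os.path.abspath(command_file_name))
    # os.path.join discards command_dir if relative_path is already absolute;
    # normpath then collapses "./" and "../" segments
    return os.path.normpath(os.path.join(command_dir, relative_path))
```

Because os.path.join drops the base folder when its second argument is absolute, the same helper also passes absolute paths through unchanged.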

brendanjmeade commented 2 years ago

@jploveless What are your current thoughts on this?

We could go with absolute paths, which would be super simple. However, it would also mean that each user has to edit the input files before they can run any model.

We could go with relative paths and it could work well if we document a clear standard for this. That would make sharing easier at the cost of having to adhere to a standard. Now that I write that, it sounds like a better idea than I anticipated because it would force us all to store files in the same way.

We could also support both via a flag in the command.json file.
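Supporting both conventions could look something like the sketch below. The relative_paths key is a hypothetical command.json setting invented for illustration, and the function is not part of celeri.

```python
import os


def resolve_input_path(command: dict, command_file_name: str, path: str) -> str:
    """Resolve an input file path according to a hypothetical command.json flag.

    command is the dictionary loaded from command.json; "relative_paths" is an
    illustrative key, not an existing celeri setting. It defaults to True here.
    """
    if command.get("relative_paths", True):
        # Interpret the path relative to the folder holding command.json
        base_dir = os.path.dirname(os.path.abspath(command_file_name))
        return os.path.normpath(os.path.join(base_dir, path))
    # Absolute mode: refuse relative paths instead of silently using the working directory
    if not os.path.isabs(path):
        raise ValueError(f"Expected an absolute path, got {path!r}")
    return path
```

A flag keeps the choice explicit in each command file. A flag-free alternative is to treat any path for which os.path.isabs is true as absolute and resolve everything else against the command file's folder.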

I was leaning toward absolute but now I'm thinking relative, like you suggested, is more appealing.

jploveless commented 2 years ago

I think for me the big issue is that the path to the pre-calculated operators is specified in command.json, so it would be clearest to specify it relative to the location of command.json. I think what confused me before is that the geometry files, which are in the same directory as command.json, and the elastic operator file were both specified with just ./ notation. My own thinking about a typical workflow is to keep all input files in a single directory (or in a command/, segment/, block/, station/, mesh/, result/ type of structure), and it makes sense to me to keep the pre-calculated elastic operators tied to a particular block/station geometry. This is why I always stored them in a specific result directory in Blocks.

One approach could be to write a celeri_directories helper function that just sets up empty directories for each type of input file (like blocksdirs.m), as well as a result/ directory in which the actual results, along with copies of that run's input files, could be stored. And maybe we could set up celeri_ui to write into a structure like this by default?

brendanjmeade commented 2 years ago

> I think for me the big issue is that the path to the pre-calculated operators is specified in command.json, so it would be clearest to specify it relative to the location of command.json. I think what confused me before is that the geometry files, which are in the same directory as command.json, and the elastic operator file were both specified with just ./ notation. My own thinking about a typical workflow is to keep all input files in a single directory (or in a command/, segment/, block/, station/, mesh/, result/ type of structure), and it makes sense to me to keep the pre-calculated elastic operators tied to a particular block/station geometry. This is why I always stored them in a specific result directory in Blocks.

So something like this:

project_name
├── notebooks
│   ├── block_model.ipynb
│   ├── visualize_results.ipynb
│   └── resolution_tests.ipynb
├── command
│   ├── command_001.json
│   └── command_NNN.json
├── segment
│   ├── segment_001.csv
│   └── segment_NNN.csv
├── block
│   ├── block_001.csv
│   └── block_NNN.csv
├── station
│   ├── station_001.csv
│   └── station_NNN.csv
├── mesh
│   ├── mesh_001.msh
│   └── mesh_NNN.msh
├── precomputed_operators
│   ├── elastic_001.hdf5
│   └── elastic_NNN.hdf5
├── output
│   ├── 2022-02-20-17-01-39
│   │   ├── 2022-02-20-17-01-39.log
│   │   ├── model_segment.csv
│   │   ├── model_block.csv
│   │   ├── model_velocity.csv
│   │   ├── rotation_velocity.csv
│   │   ├── strain_rate_velocity.csv
│   │   ├── okada_velocity.csv
│   │   ├── tri_velocity.csv
│   │   └── elastic_velocity.csv
│   └── NNNN-NN-NN-NN-NN-NN
│       ├── NNNN-NN-NN-NN-NN-NN.log
│       ├── model_segment.csv
│       ├── model_block.csv
│       ├── model_velocity.csv
│       ├── rotation_velocity.csv
│       ├── strain_rate_velocity.csv
│       ├── okada_velocity.csv
│       ├── tri_velocity.csv
│       └── elastic_velocity.csv
└── README.md

This would be very organized for projects but a little different from what we have in the repository now. Maybe we could put command/, segment/, block/, station/, mesh/ in a data folder under project_name?

> One approach could be to write a celeri_directories helper function that just sets up empty directories for each type of input file (like blocksdirs.m), as well as a result/ directory in which the actual results, along with copies of that run's input files, could be stored. And maybe we could set up celeri_ui to write into a structure like this by default?

This is an excellent idea! Let's agree on the folder structure first!

jploveless commented 2 years ago

I think this sounds great! I might suggest we store the precomputed operators within a specific output folder, or else have some way of attaching the corresponding segment, station, and mesh files to each .hdf5 file so that we can make sure we're working with the same model geometry (and make minor changes if need be, e.g., only recalculating operators for the segments whose geometry has changed, as in Blocks).
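One lightweight way to attach the geometry to each .hdf5 file is to store a content hash of every input file it was built from and compare the hashes before reusing the operators. Below is a minimal sketch using only the standard library. The JSON sidecar next to the .hdf5 file and the function names are hypothetical conventions, not anything celeri currently does.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_geometry_manifest(operator_file: Path, geometry_files: list[Path]) -> Path:
    """Record which geometry files an operator file was computed from.

    Writes a hypothetical JSON sidecar (elastic_operators.hdf5 -> elastic_operators.json)
    mapping each geometry file name to its digest.
    """
    manifest = {p.name: file_sha256(p) for p in geometry_files}
    manifest_file = operator_file.with_suffix(".json")
    manifest_file.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest_file


def operators_match_geometry(operator_file: Path, geometry_files: list[Path]) -> bool:
    """Return True only if a manifest exists and matches the current geometry files."""
    manifest_file = operator_file.with_suffix(".json")
    if not manifest_file.exists():
        return False
    stored = json.loads(manifest_file.read_text())
    current = {p.name: file_sha256(p) for p in geometry_files}
    return stored == current
```

This only detects that something changed. Recomputing operators for just the modified segments, as in Blocks, would need a row-by-row comparison of the segment files instead. The same digests could also be written into the .hdf5 file itself (for example as h5py attributes) rather than into a separate JSON file.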

brendanjmeade commented 2 years ago

Putting the precomputed operators in the output folders is a good idea! Here's the latest structure. Look good?

project_name
├── README.md
├── notebooks
│   ├── block_model.ipynb
│   ├── visualize_results.ipynb
│   └── resolution_tests.ipynb
├── command
│   ├── command_001.json
│   └── command_NNN.json
├── segment
│   ├── segment_001.csv
│   └── segment_NNN.csv
├── block
│   ├── block_001.csv
│   └── block_NNN.csv
├── station
│   ├── station_001.csv
│   └── station_NNN.csv
├── mesh
│   ├── mesh_001.msh
│   └── mesh_NNN.msh
└── output
    ├── 2022-02-20-17-01-39
    │   ├── 2022-02-20-17-01-39.log
    │   ├── elastic_operators.hdf5
    │   ├── model_segment.csv
    │   ├── model_block.csv
    │   ├── model_velocity.csv
    │   ├── rotation_velocity.csv
    │   ├── strain_rate_velocity.csv
    │   ├── okada_velocity.csv
    │   ├── tri_velocity.csv
    │   └── elastic_velocity.csv
    └── NNNN-NN-NN-NN-NN-NN
        ├── NNNN-NN-NN-NN-NN-NN.log
        ├── elastic_operators.hdf5
        ├── model_segment.csv
        ├── model_block.csv
        ├── model_velocity.csv
        ├── rotation_velocity.csv
        ├── strain_rate_velocity.csv
        ├── okada_velocity.csv
        ├── tri_velocity.csv
        └── elastic_velocity.csv

jploveless commented 2 years ago

Looks great! We'll need to decide where the mesh parameter file goes (in the mesh folder or in the command folder? I think the latter) and therefore how it should reference the paths of the actual mesh files. I'm sorry, I think I'm making this more complicated than it needs to be!

brendanjmeade commented 2 years ago

How about in the mesh folder? That would be pretty organized. Alternatively, what if we just added the mesh information to the command file?
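If the mesh parameter file lives in mesh/, one simple rule is to resolve each mesh file name relative to the parameter file itself, so the folder can be moved or shared as a unit. The sketch below assumes the parameter file is a JSON list of objects, each with a "mesh_filename" entry; that schema and the function name are hypothetical.

```python
import json
from pathlib import Path


def load_mesh_file_paths(mesh_params_file: str) -> list[Path]:
    """Return absolute paths of the mesh files listed in a mesh parameter file.

    Assumes (hypothetically) a JSON list of objects, each with a "mesh_filename"
    entry given relative to the mesh parameter file's own folder.
    """
    params_path = Path(mesh_params_file).resolve()
    with open(params_path) as f:
        mesh_params = json.load(f)
    # Relative names resolve against mesh/; an absolute name replaces the base entirely
    return [(params_path.parent / entry["mesh_filename"]).resolve() for entry in mesh_params]
```

With this arrangement the command file only needs to point at one mesh parameter file, so swapping a whole set of meshes means changing a single entry.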

jploveless commented 2 years ago

I think either approach would be fine. I'm not sure why I lean toward keeping it as a separate parameter file rather than embedding it in the command file; just old habits, I think!

brendanjmeade commented 2 years ago

Maybe separate is better, since it would make it easier to swap out a bunch of meshes at once? Let's keep it separate. So the proposed folder structure is now:

project_name
├── README.md
├── notebooks
│   ├── block_model.ipynb
│   ├── visualize_results.ipynb
│   └── resolution_tests.ipynb
├── command
│   ├── command_001.json
│   └── command_NNN.json
├── segment
│   ├── segment_001.csv
│   └── segment_NNN.csv
├── block
│   ├── block_001.csv
│   └── block_NNN.csv
├── station
│   ├── station_001.csv
│   └── station_NNN.csv
├── mesh
│   ├── mesh_params_001.json
│   ├── mesh_params_NNN.json
│   ├── mesh_001.msh
│   └── mesh_NNN.msh
└── output
    ├── 2022-02-20-17-01-39
    │   ├── 2022-02-20-17-01-39.log
    │   ├── elastic_operators.hdf5
    │   ├── model_segment.csv
    │   ├── model_block.csv
    │   ├── model_velocity.csv
    │   ├── rotation_velocity.csv
    │   ├── strain_rate_velocity.csv
    │   ├── okada_velocity.csv
    │   ├── tri_velocity.csv
    │   └── elastic_velocity.csv
    └── NNNN-NN-NN-NN-NN-NN
        ├── NNNN-NN-NN-NN-NN-NN.log
        ├── elastic_operators.hdf5
        ├── model_segment.csv
        ├── model_block.csv
        ├── model_velocity.csv
        ├── rotation_velocity.csv
        ├── strain_rate_velocity.csv
        ├── okada_velocity.csv
        ├── tri_velocity.csv
        └── elastic_velocity.csv

jploveless commented 2 years ago

Looks great! Seems like this issue will have some overlap with #87 as well as UI development. I can try to write a directory setup function sometime soon.
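A directory setup function along the lines of the celeri_directories helper suggested earlier in this thread might look like the sketch below, which builds the empty skeleton of the proposed project layout. The function and constant names are hypothetical; this is not part of celeri.

```python
from pathlib import Path

# Top-level folders from the proposed layout (notebooks, inputs, and output)
PROJECT_SUBDIRECTORIES = (
    "notebooks",
    "command",
    "segment",
    "block",
    "station",
    "mesh",
    "output",
)


def celeri_directories(project_root: str) -> Path:
    """Create an empty project skeleton matching the proposed folder structure.

    Hypothetical helper; existing folders and an existing README.md are left untouched.
    """
    root = Path(project_root)
    for name in PROJECT_SUBDIRECTORIES:
        # parents=True also creates project_root itself if it does not exist yet
        (root / name).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text(f"# {root.name}\n")
    return root
```

Because mkdir is called with exist_ok=True and README.md is only written when missing, running it again on an existing project is harmless.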

brendanjmeade commented 2 years ago

I've completed this with https://github.com/brendanjmeade/celeri/commit/36343b5f9a5e847e02aa11fbf9ff6b2c3917232a

It now requires you to create your own "runs" folder in the root celeri folder to store model runs in. The .gitignore excludes everything in this folder, so nothing in it is tracked.
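Creating a per-run folder inside runs/ with the timestamp naming used in the proposed structure could be done along these lines. This is a hypothetical helper, not the code from the linked commit; it stops with an error if runs/ is missing, mirroring the requirement to create that folder by hand.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def create_run_directory(runs_root: str = "runs", now: Optional[datetime] = None) -> Path:
    """Create a timestamped folder for one model run, e.g. runs/2022-02-20-17-01-39.

    Hypothetical helper; runs_root must already exist.
    """
    root = Path(runs_root)
    if not root.is_dir():
        raise FileNotFoundError(f"Create the {root} folder before running models")
    stamp = (now or datetime.now()).strftime("%Y-%m-%d-%H-%M-%S")
    run_dir = root / stamp
    # exist_ok=False makes two runs started in the same second fail loudly
    run_dir.mkdir(exist_ok=False)
    return run_dir
```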