eqcorrscan / EQcorrscan

Earthquake detection and analysis in Python.
https://eqcorrscan.readthedocs.io/en/latest/
Other
167 stars 86 forks source link

catalog_to_dd.py: input and output of dt.cc #516

Open erhmestel opened 2 years ago

erhmestel commented 2 years ago

What does this PR do?

Editing catalog_to_dd.py to allow input of existing correlation measurements and selection of output filename. Defaults keep everything the same as it currently is.

  1. existing_corr_file option added to write_correlations & compute_differential_times. Default None skips new sections of code.
  2. existing_corr_file read in with new function: _read_correlation_file returning a existing event pairs dictionary keyed to core event, with list of paired events.
  3. existing_pairs dictionary used to remove events paired with core event from sub_catalog so they are not recalculated
  4. existing_corr_file read in & output into output file, followed by newly calculated correlations.
  5. Added option of output_filename to write_correlations. Default "dt.cc"

Why was it initiated? Any relevant Issues?

To prevent having to recalculate correlations each time new events are added & to allow the output of different correlation files without overwriting.

Not been extensively tested. In small amount of testing code works as is, but could be more completely implemented. Things to do:

PR Checklist

calum-chamberlain commented 2 years ago

Thanks for this @erhmestel! Can you update your branch to track develop (press the "Update branch" button underneath all the test ticks, then pull the updated branch back to your local computer) please?

This looks good, even if stickler doesn't think so. I like using PyCharm for this kind of big project and I set up a ruler at 80 characters long (I forget how to do this, but this SO answer has it. PyCharm is pretty good at removing trailing whitespace for you and you can set up code linters (flake8 is the one run by stickler I think) which will highlight non pep8 friendly code.

I would go for cleaning that up, then work on writing a short test that tests this new behaviour. Once we have a test then we can play around with optimising. If there are any edge cases that you can think of that might result in odd behaviour (e.g. when no new events have been added, or if events are in the correlation file, but not in the catalog provided for the update, ...) then having tests for them would be helpful as well so that we can avoid code crashing as much as possible!

I should also use this as an opportunity to better document:

  1. Default action of computing correlation squared (and provide an option to output correlation);
  2. Default actions of travel time computation as tt1 - tt2.

If you have any suggestions of where those docs should go (e.g. where you would look for them) then let me know, otherwise we can put them in in all the relevant places.