gem / oq-engine

OpenQuake Engine: a software for Seismic Hazard and Risk Analysis
https://github.com/gem/oq-engine/#openquake-engine
GNU Affero General Public License v3.0

Propose migrating the hmtk.seismicity.catalogue.Catalogue to a Pandas Dataframe based structure #5531

Closed: g-weatherill closed this issue 3 years ago

g-weatherill commented 4 years ago

Proposal for a re-factor to the hmtk:

The Catalogue object, which is central to most of the seismicity tools, is a customised object in which the actual catalogue is stored in the attribute data, an organised dictionary in which each attribute of the catalogue (e.g. date, location, magnitude) is stored in a separate list or array. This could now be handled much more elegantly by replacing the dictionary with a Pandas DataFrame, inheriting many useful functions for sorting, indexing, basic statistics etc. in the process. When the hmtk code was written (around 2011/2012), Pandas was not as mature and established as it is today, the hmtk was not integrated within the oq-engine, and OpenQuake itself was still extremely dependency-heavy, so packaging it with Pandas would have been problematic. Today none of these factors applies, and Pandas is a dependency of the main engine.
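
As a rough illustration of the difference (a minimal sketch, not the actual hmtk code): the dictionary below is a hypothetical stand-in for the data attribute, and its keys and values are made up for the example. It compares filtering by magnitude on a dictionary of arrays with the same operation on a DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Catalogue.data: one list or array per attribute.
# The keys and values here are illustrative assumptions, not real catalogue data.
data = {
    "eventID": ["ev1", "ev2", "ev3", "ev4"],
    "year": np.array([1990, 2005, 1978, 2011]),
    "longitude": np.array([12.5, 13.1, 11.9, 12.8]),
    "latitude": np.array([42.0, 41.7, 43.2, 42.4]),
    "depth": np.array([10.0, 25.0, 8.0, 15.0]),
    "magnitude": np.array([5.2, 6.1, 4.8, 5.6]),
}

# Dictionary of arrays: the mask has to be applied to every attribute by hand.
mask = data["magnitude"] >= 5.0
selected = {key: np.asarray(values)[mask] for key, values in data.items()}

# DataFrame: rows stay aligned across columns automatically.
df = pd.DataFrame(data)
df_selected = df[df["magnitude"] >= 5.0]

# Sorting and basic statistics come for free.
df_sorted = df.sort_values("year").reset_index(drop=True)
mag_stats = df["magnitude"].describe()
```

In the dictionary version, any function that selects or reorders events has to touch every key consistently, whereas the DataFrame keeps each event as a single row.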

The re-factor would require a substantial amount of work, however, so I leave this proposal open for feedback as to whether it is justified in terms of the number of users of the toolkit, the downstream implications for any other packages that use the affected hmtk features and the resources/time available (which may be limited).

micheles commented 4 years ago

I have nothing against the idea, but I have no data about how many people are using the HMTK, nor a guess as to how big the impact of the change would be. Perhaps you could write an email to the mailing list to solicit user feedback.