Propose migrating the hmtk.seismicity.catalogue.Catalogue to a Pandas Dataframe based structure

Proposal for a refactor of the hmtk:

The Catalogue object, which is central to most of the seismicity tools, is a customised object in which the catalogue itself is stored in the attribute data: an organised dictionary in which each attribute of the catalogue (e.g. date, location, magnitude) is stored in a separate list or array. This could now be handled much more elegantly by replacing the dictionary with a Pandas DataFrame, inheriting many useful functions for sorting, indexing, basic statistics etc. in the process. When the hmtk code was written (around 2011/2012), Pandas was not as mature and established as it is today, the hmtk was not integrated within the oq-engine, and OpenQuake itself was still extremely dependency-heavy, so packaging it with Pandas would have been problematic. None of these factors applies any longer, and Pandas is now a dependency of the main engine.
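A minimal sketch of the difference, assuming a plain dictionary of NumPy arrays as a stand-in for Catalogue.data. The key names and values are illustrative only (not necessarily the exact hmtk schema), and no hmtk code is called. It contrasts sorting the dictionary key by key with sorting, filtering and summarising a DataFrame built from the same data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Catalogue.data: one array per attribute.
# Key names and values are illustrative, not the guaranteed hmtk schema.
catalogue_data = {
    "eventID": np.array([3, 1, 2]),
    "year": np.array([1995, 1990, 2001]),
    "month": np.array([6, 1, 11]),
    "day": np.array([12, 5, 30]),
    "longitude": np.array([20.1, 21.4, 19.8]),
    "latitude": np.array([38.2, 37.9, 39.0]),
    "depth": np.array([10.0, 15.0, 8.0]),
    "magnitude": np.array([5.1, 4.3, 6.0]),
}

# Dictionary-of-arrays style: the sort order must be applied to every key
# by hand to keep the attributes of each event aligned.
order = np.argsort(catalogue_data["year"])
sorted_dict = {key: value[order] for key, value in catalogue_data.items()}

# DataFrame style: one table whose rows stay aligned automatically.
df = pd.DataFrame(catalogue_data)
df_sorted = df.sort_values("year").reset_index(drop=True)

# Boolean indexing and summary statistics come with the DataFrame.
large_events = df[df["magnitude"] >= 5.0]
mag_stats = df["magnitude"].describe()

# Separate year/month/day columns can be combined into one datetime column.
df["datetime"] = pd.to_datetime(df[["year", "month", "day"]])
```

In the dictionary version every new operation (filtering by region, declustering output, subsetting by completeness) needs the same per-key bookkeeping, whereas the DataFrame keeps each event as a single row.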

The refactor would, however, require a substantial amount of work, so I leave this proposal open for feedback on whether it is justified given the number of users of the toolkit, the downstream implications for other packages that use the affected hmtk features, and the resources/time available (which may be limited).


Propose migrating the hmtk.seismicity.catalogue.Catalogue to a Pandas Dataframe based structure #5531