g-weatherill closed 3 years ago
I have nothing against the idea, but I have no data on how many people are using the hmtk, nor a guess at how big the impact of the change would be. Perhaps you could write an email to the mailing list to solicit user feedback.
Proposal for a re-factor to the hmtk:
The Catalogue object, which is central to most of the seismicity tools, is a customised object in which the actual catalogue is stored in the attribute `data`, an organised dictionary in which each attribute of the catalogue (e.g. date, location, magnitude) is stored in a separate list or array. This could now be handled much more elegantly by replacing this dictionary with a Pandas DataFrame, inheriting a lot of useful functions for sorting, indexing, basic statistics etc. in the process. At the time the hmtk code was written (around 2011/2012), Pandas was not as mature and established as it is today, the hmtk wasn't integrated within the oq-engine, and OpenQuake itself was still extremely dependency-heavy, so packaging it with Pandas would have been problematic. Today, none of these constraints apply, and Pandas is now a dependency of the main engine.

The re-factor would require a substantial amount of work, however, so I leave this proposal open for feedback as to whether it is justified in terms of the number of users of the toolkit, the downstream implications for any other packages that use the affected hmtk features, and the resources/time available (which may be limited).
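To make the contrast concrete, here is a minimal sketch of the two approaches. It is not the hmtk implementation: the dictionary below is a hypothetical stand-in for the contents of `data`, and the key names, values and the helper `sort_dict_catalogue` are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dictionary held in the catalogue's `data`
# attribute: one list or array per catalogue attribute. Keys and values
# are illustrative, not taken from the hmtk.
data = {
    "eventID": ["ev1", "ev2", "ev3", "ev4"],
    "year": np.array([2001, 1998, 2010, 2005]),
    "magnitude": np.array([5.2, 6.1, 4.8, 5.9]),
    "depth": np.array([10.0, 25.0, 8.0, 33.0]),
}


def sort_dict_catalogue(data, key):
    """Dictionary approach: every column has to be reordered by hand."""
    order = np.argsort(data[key], kind="stable")
    # np.asarray lets the same indexing work for lists and arrays alike.
    return {name: np.asarray(values)[order] for name, values in data.items()}


sorted_dict = sort_dict_catalogue(data, "year")

# DataFrame approach: sorting, selection and basic statistics come for free.
df = pd.DataFrame(data)
df_sorted = df.sort_values("year").reset_index(drop=True)
large_events = df[df["magnitude"] >= 5.5]  # boolean-mask selection
mean_magnitude = df["magnitude"].mean()
```

With the dictionary, each operation needs a helper that keeps all columns aligned; with the DataFrame, the rows stay aligned automatically, which is the main convenience the proposal is pointing at.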