epic-open-source / seismometer

AI model evaluation with a focus on healthcare
https://epic-open-source.github.io/seismometer/
BSD 3-Clause "New" or "Revised" License
156 stars, 15 forks

Make merging events more robust to collision #39

Open diehlbw opened 4 days ago

diehlbw commented 4 days ago

Is your feature request related to a problem? Please describe

Data loading is usually three steps: load prediction/output info, load event info, and then merge them into a single frame. Especially with the added support for using frames from memory, there are cases where the columns created by the merge already exist in the prediction frame.

Currently, this situation raises an unhandled error during post-processing, which has strong expectations about the new columns. Those columns are missing under their expected names because pandas avoids collisions by adding suffixes to both copies.
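A minimal reproduction of the suffix behavior may help here. This is only a sketch: the frames and the column name outcome_Value are made up for illustration, and it uses a plain DataFrame.merge rather than seismometer's own merge path (pandas.merge_asof has the same default suffixes).

```python
import pandas as pd

# Hypothetical prediction frame that already carries the column the event merge would create.
predictions = pd.DataFrame({"id": [1, 2], "score": [0.2, 0.9], "outcome_Value": [0, 1]})
# Hypothetical event frame contributing a column with the same name.
events = pd.DataFrame({"id": [1, 2], "outcome_Value": [1, 1]})

merged = predictions.merge(events, on="id", how="left")

# pandas renames both colliding columns instead of overwriting either one.
merged_columns = list(merged.columns)
print(merged_columns)
# ['id', 'score', 'outcome_Value_x', 'outcome_Value_y']
```

Any later step that looks up outcome_Value by name then fails, because neither suffixed column matches it.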

Describe the solution you'd like

To improve the observed behavior, we should update loader.event to short-circuit and log a debug message, instead of calling _merge_event for one_event, if the target columns already exist.
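The short-circuit could look roughly like the sketch below. The helper name merge_event_if_needed, its parameters, and the logger name are all hypothetical; this is not the current loader.event code, and merge_fn only stands in for the call to _merge_event.

```python
import logging

import pandas as pd

logger = logging.getLogger("merge_sketch")  # hypothetical logger name


def merge_event_if_needed(frame, target_cols, merge_fn):
    """Skip the merge when every column it would create is already in the frame.

    merge_fn is a stand-in for the real merge call; it takes the frame and
    returns a frame with target_cols added.
    """
    if target_cols and all(col in frame.columns for col in target_cols):
        logger.debug("Columns %s already exist; skipping merge.", list(target_cols))
        return frame
    return merge_fn(frame)


# Usage: the merge is skipped when the column exists, and runs when it does not.
already_merged = pd.DataFrame({"id": [1], "outcome_Value": [1]})
skipped = merge_event_if_needed(already_merged, ["outcome_Value"], lambda df: df.assign(outcome_Value=0))

not_merged = pd.DataFrame({"id": [1]})
added = merge_event_if_needed(not_merged, ["outcome_Value"], lambda df: df.assign(outcome_Value=0))
```

This sketch skips only when all target columns are present. A partial overlap still goes to the merge and would still collide, which is the case the more robust handling would need to cover.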

We should also consider making the underlying code more robust, probably by updating pandas_helpers._merge_next to reduce the merge_cols_without_times argument to the columns that exist in the merge_asof result.
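As a rough illustration of that reduction, the sketch below filters a list of expected columns down to those present after a merge_asof. The helper columns_present and the example frames are hypothetical, and what merge_cols_without_times actually contains inside _merge_next is assumed here, not taken from the code.

```python
import pandas as pd


def columns_present(merged, wanted):
    """Keep only the wanted column names that actually exist in the merged frame."""
    return [col for col in wanted if col in merged.columns]


# Hypothetical frames: event_Value collides, event_Type does not.
left = pd.DataFrame(
    {"time": pd.to_datetime(["2024-01-01", "2024-01-02"]), "id": [1, 1], "event_Value": [0, 0]}
)
right = pd.DataFrame(
    {"time": pd.to_datetime(["2024-01-01"]), "id": [1], "event_Value": [1], "event_Type": ["a"]}
)

# merge_asof needs both frames sorted by the "on" key; these already are.
merged = pd.merge_asof(left, right, on="time", by="id")

# event_Value was suffixed on both sides, so only event_Type survives the filter.
safe_cols = columns_present(merged, ["event_Value", "event_Type"])
```

Filtering this way avoids the KeyError in later steps, but it also silently drops the collided column, so pairing it with a log message is probably worthwhile.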