Aarhus-Psychiatry-Research / psycop-model-training

Shared code for model training and evaluation.
Other
1 stars 0 forks source link

fix: reset index introduces nan #445

Closed sarakolding closed 1 year ago

sarakolding commented 1 year ago

Notes for reviewers

I think I found out what is happening. Since index is only reset for a single column that is then concatenated back onto the dataset (and I think this is done by matching on indices), a lot of values from the [n_in_bin] column will not have any matches, or will be matched with the wrong bin. I would have done it at the end instead, but the columns that are outputted are also concatenated onto dataframes in the plots, so I would suspect it would only move the issue. Maybe the fix is to reset the index after these steps in the plot functions? You probably have a better overview of what would work best

sarakolding commented 1 year ago

seems to only be a problem for us in plot_metric_by_time_until_diagnosis, not in plot_auc_by_time_from_first_visit, for some reason

github-actions[bot] commented 1 year ago
Tests Skipped Failures Errors Time
1 0 :zzz: 0 :x: 1 :fire: 5.216s :stopwatch:
MartinBernstorff commented 1 year ago

Huh, that's a bit of a mess! Thanks for taking a look - I'll dive in later today.

MartinBernstorff commented 1 year ago

Since index is only reset for a single column that is then concatenated back onto the dataset (and I think this is done by matching on indices), a lot of values from the [n_in_bin] column will not have any matches

Oh, we need to change this if it happens so that it merges on bin.

I've also added a few tests to hopefully stabilise this interface.