Aarhus-Psychiatry-Research / psycop-model-training

Shared code for model training and evaluation.

feat: add optional arg for id column name #466

Closed sarakolding closed 1 year ago

sarakolding commented 1 year ago

Our ID is usually not just dw_ek_borger (it's dw_ek_borger + admission_start_time), but when making the splits we also want to make sure that we split by patient ID. I made some quick adjustments for this, but it can probably be done much more simply, and maybe it is not something we want on the main branch?

github-actions[bot] commented 1 year ago

Looks like some formatting rules failed.

✨ The action has attempted automatic fixes ✨

If any were successful, they were committed to the branch. We suggest using git pull --rebase to apply them locally.

If some errors could not be fixed automatically, you can:

🏎️ Get results locally by running pre-commit run --all-files
🕵️ Examine the results in the Run pre-commit section of this workflow

We also strongly recommend setting up the ruff and black extensions to auto-format on save in your chosen editor.

github-actions[bot] commented 1 year ago

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 47 | 1 :zzz: | 0 :x: | 0 :fire: | 1m 4s :stopwatch: |

MartinBernstorff commented 1 year ago

I think I need to understand why you're doing this to be able to review this PR :-)

Can you elaborate on why you would want to do it?

sarakolding commented 1 year ago

Since we have two "layers" of grouping (admissions within patients, and days within admissions), we usually want to group at the finest level that identifies a unit of interest: individual admissions for individual patients. However, when we split our data for training, we want each patient to be kept within a single dataset, so that no patient has admissions in both train and val. So our ID is usually patient_id + admission_start, but specifically for splitting we want it to be only patient_id. Hope that makes more sense 😸
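
To make the distinction concrete, here is a minimal sketch of splitting by patient while the row-level ID stays patient + admission. It is not the code from this PR: `split_by_group`, its `group_col` argument, and the example rows are hypothetical and only illustrate the idea. The column names dw_ek_borger and admission_start_time are taken from the description above.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_by_group(
    rows: List[Row],
    group_col: str = "dw_ek_borger",  # hypothetical optional arg: column to split on
    val_fraction: float = 0.25,
    seed: int = 42,
) -> Tuple[List[Row], List[Row]]:
    """Split rows into train/val so that all rows sharing a group ID land in one split."""
    # Shuffle the unique group IDs (patients), not the individual rows (admissions).
    group_ids = sorted({row[group_col] for row in rows})
    random.Random(seed).shuffle(group_ids)

    # Reserve at least one group for validation.
    n_val = max(1, round(len(group_ids) * val_fraction))
    val_groups = set(group_ids[:n_val])

    train = [row for row in rows if row[group_col] not in val_groups]
    val = [row for row in rows if row[group_col] in val_groups]
    return train, val


# Made-up example data: four patients, some with several admissions.
rows = [
    {"dw_ek_borger": 1, "admission_start_time": "2020-01-01"},
    {"dw_ek_borger": 1, "admission_start_time": "2021-06-15"},
    {"dw_ek_borger": 2, "admission_start_time": "2020-03-10"},
    {"dw_ek_borger": 3, "admission_start_time": "2019-11-02"},
    {"dw_ek_borger": 3, "admission_start_time": "2020-02-20"},
    {"dw_ek_borger": 4, "admission_start_time": "2022-08-30"},
]

train, val = split_by_group(rows)

train_patients = {row["dw_ek_borger"] for row in train}
val_patients = {row["dw_ek_borger"] for row in val}
```

Each row is still uniquely identified by patient and admission start, but the split itself only looks at the patient column, so `train_patients` and `val_patients` never overlap.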