ARS-toscana / ConcePTIONAlgorithmPregnancies

Repository of the script of the ConcePTION Algorithm for Pregnancies
GNU Affero General Public License v3.0

Issue with validation sample #23

Closed (DSThayer closed this issue 2 years ago)

DSThayer commented 2 years ago

The pregnancy sample CSV file the instructions tell us to use for validation doesn't include the person_id, which makes it impossible to validate pregnancies, since they can't be linked to other records.

The equivalent .Rdata file does have person_id, but it doesn't group all pregnancy records with the same pregnancy_id. Only one record per pregnancy has pregnancy_id assigned.

Understanding which pregnancy a record was assigned to by the algorithm is important for validating it, especially for error/edge cases that may occur, such as pregnancies that are close together or overlapping.

Note: I'm using the UMC-Utrecht version of the algorithm, because I think that's what I was supposed to be doing for CONSIGN. Let me know if this was wrong. I wasn't able to raise the issue there.

GiorgioLimoncella commented 2 years ago

Every pregnancy in the table should already be linked to the records that compose it. The "sample_from_pregnancy" dataset is structured as follows: for each pregnancy, a first row gives the result of the algorithm (including the pregnancy_id, which the algorithm generated), and the rows listed below it are the records that make up that pregnancy.

[Image: screenshot, 2022-08-04 085031]

So all consecutive rows without a pregnancy_id belong to the same pregnancy and refer to the most recent pregnancy_id above them.

However, the file has sometimes not been generated correctly. If that is the case for you, let me know!
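For anyone reading the file programmatically, the fill-down rule described above might be sketched as follows. This is only an illustration under assumptions, not part of the algorithm's scripts: the `record_date` column and the sample values are made up, and a missing pregnancy_id is assumed to appear as an empty cell or as `NA`.

```python
import csv
import io

# Assumption: how a missing pregnancy_id shows up in the exported CSV.
MISSING = {"", "NA"}


def fill_pregnancy_id(rows):
    """Carry each pregnancy_id down to the following rows that lack one.

    Rows must be supplied in their original file order.
    """
    current = None
    filled = []
    for row in rows:
        value = (row.get("pregnancy_id") or "").strip()
        if value not in MISSING:
            # A result row: it starts a new pregnancy.
            current = value
        elif current is None:
            raise ValueError("record found before any pregnancy_id")
        filled.append({**row, "pregnancy_id": current})
    return filled


# Hypothetical sample: two pregnancies, each a result row plus its records.
sample = io.StringIO(
    "pregnancy_id,person_id,record_date\n"
    "P1,A,2020-01-10\n"
    "NA,A,2020-01-10\n"
    ",A,2020-03-02\n"
    "P2,B,2021-05-20\n"
    "NA,B,2021-05-20\n"
)
filled_rows = fill_pregnancy_id(csv.DictReader(sample))
print([r["pregnancy_id"] for r in filled_rows])
# ['P1', 'P1', 'P1', 'P2', 'P2']
```

The rule only works while the rows stay in file order, so it has to be applied before any sorting, filtering, or loading into an unordered store.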

DSThayer commented 2 years ago

Hi Giorgio,

Thanks for explaining. I imported the data into our database to link to source data for validation. In so doing, this relationship was lost, since DB tables do not have a deterministic order. However, that is something I can address in my import.
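One possible way to handle this on import, sketched here with SQLite purely for illustration: store the file position of each row in an explicit column, then use it to recover the pregnancy_id for the rows that lack one. The table name `pregnancy_sample`, the `row_order` column, and the sample values are assumptions, not anything produced by the algorithm.

```python
import sqlite3

# Hypothetical rows in original file order: (pregnancy_id, person_id).
# pregnancy_id is None on the records that compose a pregnancy.
file_rows = [
    ("P1", "A"),
    (None, "A"),
    (None, "A"),
    ("P2", "B"),
    (None, "B"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pregnancy_sample ("
    "row_order INTEGER PRIMARY KEY, pregnancy_id TEXT, person_id TEXT)"
)
# Record the file position explicitly, since the table itself has no order.
conn.executemany(
    "INSERT INTO pregnancy_sample VALUES (?, ?, ?)",
    [(i, pid, person) for i, (pid, person) in enumerate(file_rows, start=1)],
)

# For each row, take the pregnancy_id of the nearest row at or above it
# (by row_order) that has one.
query = """
SELECT s.row_order,
       (SELECT h.pregnancy_id
          FROM pregnancy_sample AS h
         WHERE h.row_order <= s.row_order
           AND h.pregnancy_id IS NOT NULL
         ORDER BY h.row_order DESC
         LIMIT 1) AS pregnancy_id,
       s.person_id
  FROM pregnancy_sample AS s
 ORDER BY s.row_order
"""
restored = conn.execute(query).fetchall()
for row in restored:
    print(row)
```

With the order stored as data, the grouping survives joins to the source tables, and every query that depends on it can state `ORDER BY row_order` explicitly.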

Dan