IARC-CSU / CanReg5

CanReg5 is a multi-user, multi-platform, open source tool to input, store, check and analyse cancer registry data.
http://www.iacr.com.fr/CanReg5
GNU General Public License v3.0

C202306 - Add a Global Unique IDentifier (GUID) in CanReg data model #128

Open infotel4iarc opened 1 year ago

infotel4iarc commented 1 year ago

Why a GUID?

Databases are often created in different places, and users want to centralize them in the same cluster. The idea is to have a fixed unique identifier that follows the same patient throughout the life of the database: even if the record is updated afterwards, the identifier remains the same.

The Global UID is then a strong hash created at the start of the patient's life in the database.
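A minimal sketch of an identifier that is fixed when the patient is created and survives later updates. It assumes the GUID is a random version-4 UUID from `java.util.UUID` (the issue only says "a strong hash" and does not name an algorithm), and `PatientRecord` with its fields is a hypothetical class, not CanReg's data model.

```java
import java.util.UUID;

// Hypothetical patient record illustrating a GUID that is set once at
// creation and carried over unchanged when the record is updated.
// Field names are assumptions, not CanReg's actual model.
final class PatientRecord {
    private final String guid;      // assigned once, never modified
    private final String patientId; // registry-specific ID (e.g. the PRID)
    private final String lastName;

    private PatientRecord(String guid, String patientId, String lastName) {
        this.guid = guid;
        this.patientId = patientId;
        this.lastName = lastName;
    }

    // Creating a new patient generates a fresh random (version 4) UUID.
    static PatientRecord create(String patientId, String lastName) {
        return new PatientRecord(UUID.randomUUID().toString(), patientId, lastName);
    }

    // An update returns a new record that keeps the original GUID.
    PatientRecord withLastName(String newLastName) {
        return new PatientRecord(guid, patientId, newLastName);
    }

    String guid() { return guid; }
    String patientId() { return patientId; }
    String lastName() { return lastName; }
}
```

Calling `withLastName` on a record yields a record whose `guid()` equals the original's, which is the property that lets records from different databases be matched after centralization.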

cchenginfotel commented 1 year ago

If we add a new column to the "PATIENT" table, we'll also need a script that alters the user's existing database by adding the column and generating a UUID for every patient.

cchenginfotel commented 1 year ago

Using the Migrator class, the database is now upgraded when it is launched: it adds a "GUID" column to the patient table and assigns a GUID to every existing patient.
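The upgrade step can be sketched with plain JDBC. This is not the actual Migrator code: the table name `APP.PATIENT`, the key column `ID`, the column type `VARCHAR(36)` and the class `GuidMigration` are all assumptions, and the transactional DDL assumes an Apache Derby database. The sketch adds the column, then gives a new UUID to each row whose GUID is still NULL, using a batched `PreparedStatement`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical migration sketch; table and column names are assumptions.
final class GuidMigration {
    static final String ADD_COLUMN_SQL =
            "ALTER TABLE APP.PATIENT ADD COLUMN GUID VARCHAR(36)";
    static final String SELECT_MISSING_SQL =
            "SELECT ID FROM APP.PATIENT WHERE GUID IS NULL";
    static final String UPDATE_SQL =
            "UPDATE APP.PATIENT SET GUID = ? WHERE ID = ?";

    private GuidMigration() {}

    // Adds the GUID column and fills it for every patient that has none.
    // Returns the number of rows that received a GUID.
    static int migrate(Connection conn) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // run the migration as one transaction
        try {
            try (Statement st = conn.createStatement()) {
                st.executeUpdate(ADD_COLUMN_SQL);
            }
            // Collect the IDs first, so the table is not updated while iterating.
            List<Integer> ids = new ArrayList<>();
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(SELECT_MISSING_SQL)) {
                while (rs.next()) {
                    ids.add(rs.getInt(1));
                }
            }
            if (!ids.isEmpty()) {
                try (PreparedStatement ps = conn.prepareStatement(UPDATE_SQL)) {
                    for (int id : ids) {
                        ps.setString(1, UUID.randomUUID().toString());
                        ps.setInt(2, id);
                        ps.addBatch();
                    }
                    ps.executeBatch();
                }
            }
            conn.commit();
            return ids.size();
        } catch (SQLException e) {
            conn.rollback(); // leave the database untouched on failure
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```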

For reference, on my computer, it takes 6 seconds to update a table with 21k patients and 109 seconds for a table with 111k patients.

https://github.com/infotel4iarc/CanReg5/tree/C202306-Add_GUID_in_patient

cchenginfotel commented 1 year ago

Concerning the GUID issue, the data "import" and "export" features need to be updated, because they currently don't take the GUID into account. This affects both the application itself and the API, since one patient record can end up with two different GUIDs in two distinct databases (after exporting the data from one database and importing it into another).

It might be worth pausing development of the GUID feature until the other features are merged into "master". Updating import and export in both the application and the API will require many changes, which would make the merge complicated given the number of feature branches we already have.

cchenginfotel commented 1 year ago

As discussed during Friday's weekly meeting, I've kept the GUID passive in the code logic. The code still uses the PRID to check whether the data is already present. When exporting data, the user can choose whether or not to export the GUID. There are cases where the user can create two identical patients with two different GUIDs, and cases where the user can overwrite an existing GUID, but as discussed, we'll trust the user not to do so.
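The behaviour described above can be sketched as follows, with hypothetical names throughout (`GuidAwareExport`, the column names, and the tab-separated format are assumptions). Duplicate detection looks only at the PRID, and the GUID column is written only when the user opts in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the PRID drives duplicate detection, while the GUID
// is passive and only exported on request. Names and format are assumptions.
final class GuidAwareExport {
    private GuidAwareExport() {}

    // Each patient is a map of column name -> value, e.g. "PRID", "NAME", "GUID".
    static List<String> export(List<Map<String, String>> patients,
                               List<String> columns, boolean includeGuid) {
        List<String> header = new ArrayList<>(columns);
        header.remove("GUID");            // never export the GUID implicitly
        if (includeGuid) {
            header.add("GUID");           // user opted in: append it last
        }
        List<String> lines = new ArrayList<>();
        lines.add(String.join("\t", header));
        for (Map<String, String> p : patients) {
            List<String> values = new ArrayList<>();
            for (String col : header) {
                String v = p.get(col);
                values.add(v == null ? "" : v);
            }
            lines.add(String.join("\t", values));
        }
        return lines;
    }

    // Import-side check: a record is "already present" if its PRID exists.
    // The GUID plays no part, so two GUIDs can coexist for one PRID's history.
    static boolean isAlreadyPresent(Map<String, Map<String, String>> existingByPrid,
                                    Map<String, String> incoming) {
        return existingByPrid.containsKey(incoming.get("PRID"));
    }
}
```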

cchenginfotel commented 1 year ago

Fixed an issue where the import ignored the UUID provided in the data file. The import now keeps the UUID given in the source data file. https://github.com/infotel4iarc/CanReg5/commit/e7842f8f78cf0ec1b59648fe1e10dfb875d4d5e9
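A minimal sketch of the import rule, using a hypothetical `ImportGuids.resolveGuid` helper: a UUID in canonical 8-4-4-4-12 form from the source file is kept unchanged. Generating a fresh UUID when the file has no value (or a malformed one) is an assumed fallback, not something the commit describes.

```java
import java.util.UUID;
import java.util.regex.Pattern;

// Hypothetical import helper; the fallback for missing values is an assumption.
final class ImportGuids {
    // Canonical form, e.g. 123e4567-e89b-12d3-a456-426614174000.
    private static final Pattern UUID_FORMAT = Pattern.compile(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    private ImportGuids() {}

    static String resolveGuid(String guidFromFile) {
        if (guidFromFile != null) {
            String candidate = guidFromFile.trim();
            if (UUID_FORMAT.matcher(candidate).matches()) {
                return candidate; // keep the GUID supplied by the source file
            }
        }
        // Assumed fallback: a missing or malformed value gets a new GUID.
        return UUID.randomUUID().toString();
    }
}
```

A regular expression is used rather than `UUID.fromString` alone because `fromString` accepts some non-canonical strings on older JDKs.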

Also, I've found these global variables, which look odd given what we discussed during the weekly meeting: https://github.com/infotel4iarc/CanReg5/blob/e7842f8f78cf0ec1b59648fe1e10dfb875d4d5e9/src/canreg/common/Globals.java#L303 They don't seem to affect the code logic, but they make the code harder to understand.

cchenginfotel commented 1 year ago

Fixed the issue where data couldn't be imported into the holding database: the query that creates the table was missing a parameter. https://github.com/infotel4iarc/CanReg5/commit/eef92952c4c3e8062bff9ee50606853cd180631e

cchenginfotel commented 1 year ago

Added a script to automatically add the UUID into the XML description file https://github.com/infotel4iarc/CanReg5/commit/74a24500e597e7b480329fb1263a32cc9fe3d9ce
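The commit's script is not reproduced here; the following is a hypothetical Java sketch of the same idea using the JDK's DOM API. The element names `variables`, `variable`, `short_name` and `table` are assumptions and not the real CanReg description schema. The method appends a GUID variable only if none exists, so running it twice leaves the file unchanged.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: element names are assumptions, not CanReg's schema.
final class DescriptionGuidAdder {
    private DescriptionGuidAdder() {}

    // Returns the XML with a GUID variable appended, or unchanged if present.
    static String addGuidVariable(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Idempotence: skip if a variable named GUID already exists.
        NodeList shortNames = doc.getElementsByTagName("short_name");
        for (int i = 0; i < shortNames.getLength(); i++) {
            if ("GUID".equals(shortNames.item(i).getTextContent().trim())) {
                return xml;
            }
        }

        Element variables = (Element) doc.getElementsByTagName("variables").item(0);
        if (variables == null) {
            throw new IllegalArgumentException("no <variables> element found");
        }
        Element variable = doc.createElement("variable");
        Element shortName = doc.createElement("short_name");
        shortName.setTextContent("GUID");
        Element table = doc.createElement("table");
        table.setTextContent("Patient");
        variable.appendChild(shortName);
        variable.appendChild(table);
        variables.appendChild(variable);

        // Serialize the modified document back to a string.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```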