IARC-CSU / CanReg5

CanReg5 is a multi user, multi platform, open source tool to input, store, check and analyse cancer registry data.
http://www.iacr.com.fr/CanReg5
GNU General Public License v3.0

C24a: Duplicate search: read the patients only once and keep only the necessary data in memory #101

Closed by infotel4iarc 2 years ago

fbinfotel commented 2 years ago

I started working on the issue on 25/08/2021. The code can be found in this repo: https://github.com/infotel4iarc/CanReg5/tree/feature/C24a

I have tested the duplicate search using the patient.tsv file from the import CanReg Dataset.

fbinfotel commented 2 years ago

The functionality has been implemented. The old and the new duplicate search generate identical files, but the processing time has been greatly reduced.

fbinfotel commented 2 years ago

The branch has been reviewed and merged into the dev branch: https://github.com/infotel4iarc/CanReg5/tree/dev