Open johnsaigle opened 4 years ago
@johnsaigle is there more than 999999 candidates or is it just for compatibility ?
Compatibility. The IDs come in as example 4356001
and it's a read-only dataset.
Then those aren't LORIS CandIDs, those are other types of identifiers.
I copied this issue from an email with @xlecours and @samirdas. Samir asked that we implement this feature so I volunteered to create an issue, so I'm just the messenger in this case.
I do think it's a good idea for this to be flexible though. For most projects they can use the default 6 but as we move into supporting more open datasets, e.g. in PREVENT-AD and UK Biobank, I'm sure this will come up again.
@johnsaigle I would say it's optional if # of candidates < 999999 and mandatory otherwise...
So in this case it seems like that number can be either a linked external identifier or even PSCID if the project decides so... I dont think the CandID should be "imported" from another system
I see your point that it could be just a PSCID or ExternalID generally. I didn't do the overrides but maybe Xavier could comment on why the IDs were used as CandIDs
I think it was because we wanted to avoid the semi-random-colision-free candid generation that would have taken a lot of time to find a suitable CandId for the last participants. We decided to take a "know" value; the UKB id.
If there's less than a million but close to a million you can use avoid collisions by using an incrementing number and randomizing the order in your importer. It would be much less work than trying to change the behaviour of CandIDs in LORIS, which is used as a foreign key all over the place.
That's fair. I didn't realize it was locked to 6 integers in the DB too. That would make this a much more serious change.
Discussed in the LORIS meeting today - we're going to have a meeting to sort out ID stuff generally in late June:
Describe the solution you'd like CandIDs are hard-coded to be exactly 6 digit integers between 10000 and 999999.
There should be a configuration option that allows a project to specify a greater size if they wish.
Additional context This issue arose for the UK Biobank project which imports existing IDs from an external dataset. These IDs are 7 digits in length and therefore incompatible with the hard-coded CandID length in LORIS. Extensive overrides were made to the codebase in order to compensate.
6625 is one example.