globaldothealth / list

Repository for Global.health: a data science initiative to enable rapid sharing of trusted and open public health data to advance the response to infectious diseases.
MIT License

Make case ids shorter and easier to read #2973

Open maciej-zarzeczny opened 1 year ago

maciej-zarzeczny commented 1 year ago

Cases are currently identified by the default MongoDB ObjectIds. We should make the IDs shorter and easier for curators to read.

abhidg commented 1 year ago

One approach is to keep the ObjectIds as they are in the DB, but use a frontend function to display them in a more readable form. Any reduction in information could lead to more collisions, but I think this scheme will make them extremely unlikely:

A number in three parts, separated by hyphens, derived from the timestamp embedded in the ObjectId:

Assuming an outbreak lasts up to 1000 days (roughly 3 years, which would be a pandemic, and thus unlikely to happen frequently), this would give a maximum of 11 digits, while not touching the DB at all. In most cases, curators working on a single day's cases would only need the last two parts, as the number of days since the outbreak began would be the same.
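A minimal sketch of the idea, in Python for brevity (a frontend version would do the same arithmetic). The first 4 bytes of an ObjectId are its creation time in seconds since the Unix epoch, so the short ID can be computed without touching the DB. The three-part split used here (days since outbreak start, hour and minute, second) is only an assumption for illustration, and the names `objectid_timestamp`, `short_case_id` and `outbreak_start` are hypothetical. Since the timestamp has one-second resolution, two cases created in the same second would get the same short ID.

```python
from datetime import date, datetime, timezone


def objectid_timestamp(object_id: str) -> datetime:
    """Return the UTC creation time stored in the first 4 bytes of an ObjectId."""
    if len(object_id) != 24:
        raise ValueError("expected a 24-character hex ObjectId")
    seconds = int(object_id[:8], 16)  # first 8 hex characters = 4-byte timestamp
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def short_case_id(object_id: str, outbreak_start: date) -> str:
    """Format an ObjectId as days-HHMM-SS (an assumed three-part layout)."""
    created = objectid_timestamp(object_id)
    days = (created.date() - outbreak_start).days
    if days < 0:
        raise ValueError("case was created before the outbreak start date")
    return f"{days}-{created:%H%M}-{created:%S}"


# Example: build an ObjectId-like string for 2020-01-11 13:45:07 UTC.
# The trailing 16 hex characters (random value and counter) are zeroed here.
stamp = int(datetime(2020, 1, 11, 13, 45, 7, tzinfo=timezone.utc).timestamp())
example_id = format(stamp, "08x") + "0" * 16
print(short_case_id(example_id, date(2020, 1, 1)))  # 10-1345-07
```

With this layout, curators working on one day's cases could quote only the last two parts, since the leading day count would be shared.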

This assumes numerical IDs only. If we can use alphanumeric IDs, we can shorten them further by using hex, or by using one of several naming systems such as https://pypi.org/project/human-id/ that map UUIDs to a string of words. The disadvantage is that alphanumeric systems usually lack monotonicity.
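As a rough illustration of the hex option, the sketch below encodes the seconds elapsed since the outbreak start as zero-padded hex. 1000 days is 86,400,000 seconds, which fits in 7 hex digits. The name `hex_case_id` and the fixed width are assumptions; the fixed width is what keeps string sorting consistent with creation order in this particular scheme.

```python
from datetime import datetime, timezone


def hex_case_id(object_id: str, outbreak_start: datetime, width: int = 7) -> str:
    """Encode seconds since outbreak start as zero-padded hex (hypothetical scheme)."""
    if len(object_id) != 24:
        raise ValueError("expected a 24-character hex ObjectId")
    if outbreak_start.tzinfo is None:
        raise ValueError("outbreak_start must be timezone-aware")
    created = int(object_id[:8], 16)  # 4-byte timestamp, seconds since epoch
    elapsed = created - int(outbreak_start.timestamp())
    if elapsed < 0:
        raise ValueError("case was created before the outbreak start")
    return format(elapsed, f"0{width}x")


start = datetime(2020, 1, 1, tzinfo=timezone.utc)
stamp = int(datetime(2020, 1, 11, 13, 45, 7, tzinfo=timezone.utc).timestamp())
print(hex_case_id(format(stamp, "08x") + "0" * 16, start))
```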

maciej-zarzeczny commented 1 year ago

@abhidg Those are all great ideas! I think it all depends on the curators' preferences. @aimeehan1 is there any solution that works better for you than the others?