Database naming convention

jbadger3 commented 6 years ago

@afbarnard

I saw your email about the mcrf database debacle. I'm glad you chimed in; I was really confused :) What did you have in mind for a naming convention for database versions? I don't have any strong opinions. It looks like you were thinking of db_name.date_of_download.additional_info.sqlite3?

afbarnard commented 6 years ago

Yes, that's the convention I had in mind. In particular, the "data date" identifies versions of the data and connects files across various directories. Unfortunately, the data files have multiple dates, so choosing the data date is a judgment call. But for the latest data we can use the date of the zip file (07/05). Putting the date field before the comment field means the names sort by data version first, which would not happen if the two fields were reversed. This will come into play if we ever have multiple DBs representing the same data. For example, if we created a version of the data expressed as facts (pt-id, fact-id, value) and events (pt-id, date, event-id, value), we could name it something like emr.20180705.events.sqlite3 to sit next to emr.20180705.snomed.sqlite3.
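
To make the convention concrete, here is a minimal Python sketch of building and parsing names of the form db_name.data_date.additional_info.sqlite3. The helper names (make_db_name, parse_db_name), the regular expression, and the 2017 date are illustrative assumptions, not anything that exists in the repo. It also assumes the data date is written as YYYYMMDD, as in the emr.20180705 example, and that no field contains a dot.

```python
import re
from datetime import date

# Assumed pattern: db_name.YYYYMMDD.additional_info.sqlite3, no dots inside a field.
NAME_RE = re.compile(r"^(?P<db>[^.]+)\.(?P<date>\d{8})\.(?P<info>[^.]+)\.sqlite3$")


def make_db_name(db, data_date, info):
    """Build a file name from its three fields (hypothetical helper)."""
    return f"{db}.{data_date:%Y%m%d}.{info}.sqlite3"


def parse_db_name(filename):
    """Split a file name back into (db, data_date, info) (hypothetical helper)."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"not a conforming DB name: {filename!r}")
    d = m["date"]
    return m["db"], date(int(d[:4]), int(d[4:6]), int(d[6:])), m["info"]


names = [
    make_db_name("emr", date(2018, 7, 5), "snomed"),
    make_db_name("emr", date(2018, 7, 5), "events"),
    make_db_name("emr", date(2017, 11, 20), "snomed"),  # made-up earlier data date
]

# A plain lexicographic sort groups files by DB name, then data date, then comment.
for name in sorted(names):
    print(name)
```

Because the date is zero-padded and comes before the comment field, an ordinary directory listing already puts all files for one data version next to each other.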

jbadger3 commented 6 years ago

I like the additional sorting you mentioned. Should we add a doc somewhere to describe the naming convention in case anyone else is interested in adding a derived dataset to share? Or perhaps add something to the README.txt in the shared sqlite_dbs folder?

afbarnard commented 6 years ago

I would vote for mentioning these sorts of things in the READMEs.
