Airtable Subjects design

Airtable design decision needed:

At present, we have a Subjects table that holds information about subjects (e.g. sex, DOB, SES, or, for environmental sites, lat/lon, description, etc.). This table also links each subject to a Project, since subject IDs only need to be unique within a project (e.g. MARCH subject IDs are just integers starting at 1). In theory this should be fine, but it's causing some problems. In particular, because Subjects is what links samples to Projects, samples that are currently missing subject IDs (e.g. many Malawi samples) don't get linked to the correct Project. It also means that uids in the Subjects table are not actually unique, which causes problems with indexing.
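To make the indexing problem concrete, here is a minimal Python sketch. The records and the field names `project` and `uid` are made up for illustration and are not the actual Airtable schema. It only shows that bare subject IDs can collide across projects, while the (project, uid) pair stays unique.

```python
from collections import Counter

# Hypothetical subject records; field names are assumptions, not the real schema.
subjects = [
    {"project": "MARCH", "uid": "1"},
    {"project": "MARCH", "uid": "2"},
    {"project": "RESONANCE", "uid": "1"},
]


def duplicated(keys):
    """Return the sorted list of keys that occur more than once."""
    counts = Counter(keys)
    return sorted(k for k, n in counts.items() if n > 1)


# Indexing by uid alone is ambiguous ...
bare_dupes = duplicated(s["uid"] for s in subjects)
# ... while the (project, uid) pair is unique.
pair_dupes = duplicated((s["project"], s["uid"]) for s in subjects)

print(bare_dupes)  # ['1']
print(pair_dupes)  # []
```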

There are a couple of possible solutions, none of which is ideal, but what I've settled on is:

For projects (like MARCH/RESONANCE) whose subject IDs are likely to cause conflicts, add the project name to the subject ID (e.g. MARCH1, MARCH2, etc.). This duplicates some information, but avoids special-casing table indexing operations. One potential problem: IDs like BISC314-1, BISC314-2 may end up conflicting in later years. We may need to add some validation to the importer scripts to flag these.
Give Malawi samples temporary subject IDs until we learn the correct ones. Since these samples can't be linked to other metadata anyway, it's OK to remove them, not track them, etc. This avoids redesigning the database to accommodate this mistake. The main downside is that it's one more thing to keep track of, but I don't think that's so bad, as long as we never rely on these IDs for anything.
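The two decisions above could look roughly like this in an importer. This is a sketch in Python for illustration only: the function names (`namespaced_id`, `find_conflicts`, `temporary_ids`) and the `TMP` marker are hypothetical and are not part of any existing script or of the Airtable schema.

```python
import itertools


def namespaced_id(project, subject_id):
    """Prefix a within-project subject ID with its project, e.g. MARCH + 1 -> 'MARCH1'."""
    return f"{project}{subject_id}"


def find_conflicts(existing_ids, incoming_ids):
    """Return incoming IDs that already exist or that repeat within the batch."""
    seen = set(existing_ids)
    conflicts = []
    for uid in incoming_ids:
        if uid in seen:
            conflicts.append(uid)
        seen.add(uid)
    return conflicts


def temporary_ids(project, marker="TMP"):
    """Yield placeholder IDs (assumed format) for samples with no known subject."""
    for n in itertools.count(1):
        yield f"{project}{marker}{n}"


# IDs already in the Subjects table, and a new batch to import.
existing = {namespaced_id("MARCH", n) for n in (1, 2)}
incoming = [namespaced_id("MARCH", 2), namespaced_id("MARCH", 3)]
flagged = find_conflicts(existing, incoming)

# Placeholder IDs for Malawi samples that have no subject ID yet.
tmp = temporary_ids("Malawi")
first_two = [next(tmp), next(tmp)]

print(flagged)    # ['MARCH2']
print(first_two)  # ['MalawiTMP1', 'MalawiTMP2']
```

Keeping an explicit marker in the temporary IDs makes them easy to filter out later, which fits the idea that nothing should ever rely on them.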

Klepac-Ceraj-Lab / VKCComputing.jl

Airtable Subjects design #19