Airtable Revamp - Githubissues

Actually, I decided to just get the jump on it. Nothing is set in stone, but I thought it would be good to start getting some ideas down. You can take a look at https://airtable.com/appkVoztTe5ILJNT3/tblS2vOVk3PNEVBq4/. It's still very much a work in progress, but I think the basic structure is there.

There's a bit that's different from what we previously talked about, and some cool things that I got to work.

Tables

The Biospecimens table is for the initial collections
The Sites table is for collection locations, eg Rhode Island, Blantyre (Malawi), and South Africa
The CollectionBuffer table contains info about different collection types. One reason for splitting this out is that it will make adding standard values for eg SRA upload easier.
The SequencingPrep table is for downstream products from samples, eg aliquots, extractions, etc. There may be some duplication here, eg if multiple extractions are done on the same aliquot, but I think that's rare and probably fine.
The SequencingBatches table is for information about individual sequencing runs. I thought it didn't really make sense to split these into amplicon / mgx. I think Metabolomics should probably be a separate table.
The Projects table is for projects, and can reference multiple Sites
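
To make the links between tables concrete, here is a rough sketch of the structure in Python. Only the table names and the uid / name columns come from the description above; every other field name (site, buffer, extraction, aliquot, plate, well, batch) is a guess for illustration and may not match the actual Airtable columns.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Site:
    uid: str
    name: str = ""  # more descriptive label, e.g. "Blantyre (Malawi)"


@dataclass
class Project:
    uid: str
    sites: List[Site] = field(default_factory=list)  # a project can reference multiple Sites


@dataclass
class CollectionBuffer:
    uid: str


@dataclass
class Biospecimen:
    uid: str
    site: Site                 # hypothetical link to Sites
    buffer: CollectionBuffer   # hypothetical link to CollectionBuffer


@dataclass
class SequencingBatch:
    uid: str  # one row per sequencing run, amplicon and mgx together


@dataclass
class SequencingPrep:
    biospecimen: Biospecimen
    extraction: int
    aliquot: int
    plate: Optional[int] = None   # plate# lives here rather than in a Plate table
    well: Optional[str] = None    # e.g. "B4"
    batch: Optional[SequencingBatch] = None
```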

Principles

In general, I think if one field can stand in for many other fields, it should be a link to a table. That's why I made CollectionBuffer a separate table. In the Airtable GUI, it's not much harder to select a linked record than to pick something from a drop-down.
Where things can be calculated, they should be calculated rather than entered manually.
Where things are conceptually the same, they should be part of the same table. We can use Views to narrow the scope of things that are shown. Eg, we can hide the "primers" column in the "MGX" view in SequencingBatches

Changes

There are a few things that are pretty different than what we talked about. Putting here for discussion - again, nothing here is set in stone

Rather than have a separate "Plate" table, I think it makes more sense to have plate# and well# be part of the SequencingPrep. If we send the same extraction in a different sequencing run, it should get a new row.
As mentioned above, SequencingBatches should contain all sequencing runs, rather than splitting mgx and amplicon.
The main indexing column for each table is uid (for unique ID), and it should actually be unique and in snake_case. We can have more descriptive things in a name column (see Sites for examples).

Cool Things

For the SequencingPrep table, you can just enter the biospecimen, extraction number and aliquot number, and the UID will be automatically computed.
I also figured out how to automatically compute the S_well number from the plate well number. Eg B4 -> S26
It should be relatively trivial to generate IMR submission sheets directly from Airtable so Shelley doesn't have to do it by hand (or the other way around if you prefer).
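
For anyone who wants to reproduce the computed fields outside Airtable, here is a minimal Python sketch. The well conversion assumes a standard 96-well plate (rows A-H, columns 1-12) numbered column by column, which is consistent with the B4 -> S26 example but is not necessarily how the Airtable formula is written. The UID format in prep_uid is purely hypothetical, since the actual format isn't spelled out here.

```python
import re

ROWS = "ABCDEFGH"  # assumption: 96-well plate with 8 rows
N_COLS = 12        # assumption: 12 columns


def well_to_s_number(well: str) -> str:
    """Convert a plate well like 'B4' to an S number like 'S26'.

    Wells are counted down each column first (A1=S1, B1=S2, ..., A2=S9).
    """
    m = re.fullmatch(r"([A-Ha-h])(\d{1,2})", well.strip())
    if m is None:
        raise ValueError(f"not a valid well: {well!r}")
    row = ROWS.index(m.group(1).upper()) + 1
    col = int(m.group(2))
    if not 1 <= col <= N_COLS:
        raise ValueError(f"column out of range in well: {well!r}")
    return f"S{(col - 1) * len(ROWS) + row}"


def prep_uid(biospecimen: str, extraction: int, aliquot: int) -> str:
    """Build a UID from the three entered fields (hypothetical format)."""
    return f"{biospecimen}_e{extraction}_a{aliquot}"
```

For example, well_to_s_number("B4") gives "S26", matching the case above, and well_to_s_number("H12") gives "S96".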

Challenges / Known Failures

If the same sample gets plated in the same well on two different plates in the same sequencing batch, it will not have a unique UID.
I think if we run into the problem we had in the last batch, where the exact same plate gets sequenced twice, we should just increment the sequencing batch for the second run
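
One way to catch non-unique UIDs early would be to check an export of the uid column for repeats. A minimal sketch, assuming the UIDs are available as a plain list of strings (the function name is made up for illustration):

```python
from collections import Counter
from typing import Iterable, List


def duplicated_uids(uids: Iterable[str]) -> List[str]:
    """Return every UID that appears more than once, sorted alphabetically."""
    counts = Counter(uids)
    return sorted(uid for uid, n in counts.items() if n > 1)
```

If this returns anything other than an empty list, the affected rows need to be disambiguated, for instance by incrementing the sequencing batch as suggested above.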

Copy from Slack

Guilherme Fahur Bottino 25 minutes ago Two ideas:
1 (high impact) - something like "nominal timepoint" or "visit label" to record, for human samples with specific collection timeframes, which visit it was (3-month, 6-month, 9-month, 12-month, etc.)
2 (low impact) - something equivalent to "collection_tube" or "collection_rep" to record when multiple tubes were collected for the same specific collection

Kevin 24 minutes ago Yeah, I'm fine with either. Though it kind of feels like if we have 2 tubes collected, they should have different sample IDs

Kevin 23 minutes ago Even if they're the same timepoint, same subject, etc

Guilherme Fahur Bottino 23 minutes ago They would have, but in case of hash/random sample IDs, it would be hard to tell which one was the second collection, for example. And in most cases, I want to think that there is a good reason to collect "again" and we should be able to keep track of it

Kevin 22 minutes ago "visit" or some other metadata that's equivalent to the 6MO / 3 MO thing is totally fine by me

Kevin 22 minutes ago "it would be hard to tell which one was the second collection" - On the tube, sure. But why is it relevant when dealing with the tubes?

Guilherme Fahur Bottino 20 minutes ago My reasoning is that if we do not keep the information of what was the first and second collections somewhere, this makes room for confusion or loss of information on data handover.

Guilherme Fahur Bottino 20 minutes ago somewhere in the table, I mean.

Guilherme Fahur Bottino 19 minutes ago I know that the SequencingPrep keeps track of "which tube was used"

Guilherme Fahur Bottino 17 minutes ago I just want any person in the future to know that this tube was "the first" or "the second" with a clear annotation. It can be a comment, if you don't want to generate another column - I just think it should be somewhere.

Guilherme Fahur Bottino 17 minutes ago If we use the "notes" column for this, that's fine.

Guilherme Fahur Bottino 14 minutes ago In that sense, what about a "status" column on the SequencingPrep table to deal with samples that we might have to discard from analysis?

Kevin 13 minutes ago want to hop on zoom?

Guilherme Fahur Bottino 13 minutes ago Imagine that we extract twice from the same tube because the first extraction was poor. We want to be able to filter out the extraction that should not be used before joining the tube notes.

Klepac-Ceraj-Lab / VKCComputing.jl