Klepac-Ceraj-Lab / VKCComputing.jl


Airtable Revamp #9

Open kescobo opened 1 year ago

kescobo commented 1 year ago

Actually, I decided to just get the jump on it. Nothing is set in stone, but I thought it would be good to start getting some ideas down. You can take a look at https://airtable.com/appkVoztTe5ILJNT3/tblS2vOVk3PNEVBq4/. It's still very much a work in progress, but I think the basic structure is there.

There's a bit that's different from what we previously talked about, and some cool things that I got to work.

Tables

  1. The Biospecimens table is for the initial collections
  2. The Sites table is for collection locations, eg Rhode Island, Blantyre (Malawi), and South Africa
  3. The CollectionBuffer contains info about different collection types. One reason for splitting this out is that it will make adding standard values for eg SRA upload easier.
  4. The SequencingPrep table is for downstream products from samples, eg aliquots, extractions, etc. There may be some duplication here, eg if multiple extractions are done on the same aliquot, but I think that's rare and probably fine.
  5. The SequencingBatches table is for information about individual sequencing runs. I thought it didn't really make sense to split these into amplicon / mgx. I think Metabolomics should probably be a separate table.
  6. The Projects table is for projects, and can reference multiple Sites
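
To make the relationships between these tables concrete, here is a minimal Python sketch of the layout described above. It is only an illustration, not the actual Airtable schema: every field name (`uid`, `kind`, `buffer`, and so on) is hypothetical, and the links from a biospecimen to a site and buffer, and from a prep to a sequencing batch, are assumptions about how the tables reference each other.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Site:
    name: str  # e.g. "Rhode Island"


@dataclass
class Project:
    name: str
    sites: List[Site] = field(default_factory=list)  # a project can reference multiple sites


@dataclass
class CollectionBuffer:
    name: str  # hypothetical: one record per collection type


@dataclass
class Biospecimen:
    uid: str
    site: Site                # assumed link to the Sites table
    buffer: CollectionBuffer  # assumed link to the CollectionBuffer table


@dataclass
class SequencingBatch:
    uid: str
    kind: str  # assumed field: "amplicon" or "mgx" share one table


@dataclass
class SequencingPrep:
    uid: str
    biospecimen: Biospecimen                  # the sample this aliquot/extraction came from
    batch: Optional[SequencingBatch] = None   # assumed link, unset until sequenced


# Hypothetical records, only to show how the links fit together.
ri = Site("Rhode Island")
blantyre = Site("Blantyre (Malawi)")
project = Project("example-project", sites=[ri, blantyre])
specimen = Biospecimen("specimen-1", site=ri, buffer=CollectionBuffer("example-buffer"))
batch = SequencingBatch("batch-1", kind="mgx")
preps = [
    SequencingPrep("prep-1", specimen, batch),
    SequencingPrep("prep-2", specimen, batch),  # second extraction from the same specimen
]
```

In this sketch, two preps pointing at the same biospecimen is the "duplication" case mentioned in item 4.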

Principles

Changes

There are a few things that are pretty different from what we talked about. I'm putting them here for discussion. Again, nothing here is set in stone.

Cool Things

Challenges / Known Failures

kescobo commented 1 year ago

[five images attached]

kescobo commented 1 year ago

Copy from Slack

Guilherme Fahur Bottino 25 minutes ago Two ideas:

  1. (high impact) Something like "nominal timepoint" or "visit label" to record, for human samples with specific collection timeframes, which visit a sample came from (3-month, 6-month, 9-month, 12-month, etc.)
  2. (low impact) Something equivalent to "collection_tube" or "collection_rep" to record when multiple tubes were collected for the same specific collection

Kevin 24 minutes ago Yeah, I'm fine with either. Though it kind of feels like if we have 2 tubes collected, they should have different sample IDs

Kevin 23 minutes ago Even if they're the same timepoint, same subject, etc

Guilherme Fahur Bottino 23 minutes ago They would have, but in the case of hash/random sample IDs, it would be hard to tell which one was the second collection, for example. And in most cases, I want to think that there is a good reason to collect "again" and we should be able to keep track of it

Kevin 22 minutes ago "visit" or some other metadata that's equivalent to the 6MO / 3 MO thing is totally fine by me

Kevin 22 minutes ago "it would be hard to tell which one was the second collection": On the tube, sure. But why is it relevant when dealing with the tubes?

Guilherme Fahur Bottino 20 minutes ago My reasoning is that if we do not keep the information of what was the first and second collections somewhere, this makes room for confusion or loss of information on data handover.

Guilherme Fahur Bottino 20 minutes ago somewhere in the table, I mean.

Guilherme Fahur Bottino 19 minutes ago I know that the SequencingPrep keeps track of "which tube was used"

Guilherme Fahur Bottino 17 minutes ago I just want any person in the future to know that this tube was "the first" or "the second" with a clear annotation. It can be a comment, if you don't want to generate another column - I just think it should be somewhere.

Guilherme Fahur Bottino 17 minutes ago If we use the "notes" column for this, that's fine.

Guilherme Fahur Bottino 14 minutes ago In that sense, what about a "status" column on the SequencingPrep table to deal with samples that we might have to discard from analysis?

Kevin 13 minutes ago want to hop on zoom?

Guilherme Fahur Bottino 13 minutes ago Imagine that we extract twice from the same tube because the first extraction was poor. We want to be able to filter out the extraction that should not be used before joining the tube notes.
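
The filtering idea in this last message could look roughly like the Python sketch below. It is a hedged illustration only: the `status` values (`"ok"`, `"discard"`), the `visit` and `collection_rep` fields, and all record IDs are made up for the example and are not taken from the actual Airtable base.

```python
# Hypothetical Biospecimens records: two tubes from the same visit.
biospecimens = [
    {"uid": "tube-1", "visit": "3-month", "collection_rep": 1, "notes": "first collection"},
    {"uid": "tube-2", "visit": "3-month", "collection_rep": 2, "notes": "second collection"},
]

# Hypothetical SequencingPrep records: tube-1 was extracted twice
# because the first extraction was poor.
preps = [
    {"uid": "ext-1", "biospecimen": "tube-1", "status": "discard"},
    {"uid": "ext-2", "biospecimen": "tube-1", "status": "ok"},
    {"uid": "ext-3", "biospecimen": "tube-2", "status": "ok"},
]


def usable_preps_with_notes(preps, biospecimens):
    """Drop preps that should not be used, then join the tube metadata."""
    by_uid = {b["uid"]: b for b in biospecimens}
    joined = []
    for prep in preps:
        if prep["status"] != "ok":
            continue  # filter before the join, so discarded extractions never reach analysis
        tube = by_uid[prep["biospecimen"]]
        joined.append({
            **prep,
            "visit": tube["visit"],
            "collection_rep": tube["collection_rep"],
            "notes": tube["notes"],
        })
    return joined


result = usable_preps_with_notes(preps, biospecimens)
```

With a `status` column, the poor extraction (`ext-1`) is excluded without deleting its record, and the `collection_rep` and `notes` values keep it clear which tube was collected first.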