dhimmel / fratjuice

Uncovering the microbes of fraternity basements
Creative Commons Zero v1.0 Universal

Registering samples on the uBiome Explorer #5

Open dhimmel opened 6 years ago

dhimmel commented 6 years ago

I messaged Elies regarding the status of our samples. It turns out, according to Elies, uBiome processed the samples on 2017-10-11! She wrote:

Let me know if you can access the data from the Explorer website or not. There was no user associated with the samples in the database, so maybe you never registered them?

It didn't occur to me to register these kits in the Explorer, which is why I didn't notice the processing was complete. So I just registered the samples in the Explorer now on 2017-12-13.

Methods

I clicked on "Activate Kit" and entered the kit info from kits.tsv. For each sample (kit-tube pair), I entered the date as 2017-07-20 (the day @bemert transferred the samples into the uBiome tubes, see https://github.com/dhimmel/fratjuice/issues/2#issuecomment-316837619). Then I changed the sample type from gut/spare to custom. Custom is documented as:

Custom samples are a special category for non-standard sampling, e.g. sampling your pet. Is this a custom sample?

In the notes section, I added the sample information. This was a manual process, so I was careful not to make errors. Elies sent me some data, which we can use in the future to confirm that our sample-to-tube assignments are in concordance.

Here is a screenshot of the 10 samples on the uBiome Explorer:

[images: ubiome-explorer-1, ubiome-explorer-2]

Note that we have data on both the concentrated and unconcentrated preparations, which hopefully will provide some comforting redundancy.

dhimmel commented 6 years ago

@eliesbik I'm wondering what identifier system uBiome uses for tubes. Knowing this would let us more easily translate between our kit/tube information and specific samples.

From downloading the data in the explorer (see #6), it looks like our ten tubes have the following IDs:

300909
300921
300945
300948
300960
300975
300987
301026
301029
301032

When I export taxonomy data as JSON, I see a line like:

  "sequencing_revision": "300987",

So perhaps these identifiers are called sequencing_revision IDs? Anyways, is this ID also the ID on each tube (if you were to read the QR code)? It would be helpful if we could add a sequencing_revision column to kits.tsv and add any other IDs to that table that uBiome uses.
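
For pulling this field out of each export, here is a minimal sketch. It assumes nothing about the export beyond the presence of a `sequencing_revision` key somewhere in the JSON, so it searches recursively. The helper name `find_values` and the `records` field in the example are hypothetical.

```python
import json


def find_values(obj, key):
    """Recursively collect every value stored under `key` in parsed JSON."""
    found = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                found.append(v)
            # Keep descending in case the key also appears deeper down.
            found.extend(find_values(v, key))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_values(item, key))
    return found


# Hypothetical stand-in for an export. Only the sequencing_revision line
# comes from the real data; the "records" field is made up.
example_export = json.loads('{"sequencing_revision": "300987", "records": []}')
revisions = find_values(example_export, "sequencing_revision")
```

For a real export, `json.load` on the downloaded file would replace the inline string.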

Right now kits.tsv doesn't have a column that uniquely identifies each sequencing dataset (uniqueness comes from combining sample_id and sample_type). The goal here is to:

  1. get the right identifiers to identify the sequencing data
  2. get any other IDs that will help bridge analyses (e.g. what the tube QR code contains)
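
A rough way to confirm which columns do or don't uniquely identify a row, sketched with the standard library. The inline rows are a hypothetical excerpt; the real kits.tsv may have other columns and values, and `duplicate_keys` is just an illustrative helper name.

```python
import csv
import io
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return the key tuples that occur more than once across rows."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical excerpt, not the actual contents of kits.tsv.
example_tsv = (
    "sample_id\tsample_type\n"
    "FJ1\tgut\n"
    "FJ1\tspare\n"
    "FJ2\tgut\n"
    "FJ2\tspare\n"
)
rows = list(csv.DictReader(io.StringIO(example_tsv), delimiter="\t"))

# sample_id alone repeats, while the (sample_id, sample_type) pair does not.
repeats_by_sample = duplicate_keys(rows, ["sample_id"])
repeats_by_pair = duplicate_keys(rows, ["sample_id", "sample_type"])
```

Once a sequencing_revision column exists, the same check with `["sequencing_revision"]` should come back empty.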

dhimmel commented 6 years ago

Regarding my questions above about identifiers, the spreadsheet provided by @eliesbik in #7 appears to contain the relevant information (and more)! Specifically, the Mapping sheet of results.xlsx contains these columns (showing only the first four rows):

| Sample_ID | Sample_type | SeqID | tubeId | barcode | order_id |
|---|---|---|---|---|---|
| FJ5 - gut | concentrated | 300909 | NA0012382439 | 699179835 | 177574 |
| FJ1 - spare | unconcentrated | 300921 | NA0013010984 | 592185324 | 183059 |
| FJ2 - gut | concentrated | 300945 | NA0008969520 | 606185447 | 183182 |
| FJ2 - spare | unconcentrated | 300948 | NA0013010993 | 606185447 | 183182 |

A few questions @eliesbik to make sure I understand all the identifiers:

  1. tubeId is the unique identifier for each tube, which is encoded by the QR code? We probably will never need this, but it's good to know which tube at uBiome corresponds to which sample.
  2. SeqID, the "unique sequence id", refers to the output from a sequencing run? In our case, each tube was only sequenced once, so the tubeId-to-SeqID mapping is one-to-one.
  3. barcode is the dehyphenated kit identifier.
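
Two of these assumptions are easy to check in code. The sketch below uses the four tubeId/SeqID pairs shown above; the hyphen positions in the example kit identifier are a guess, and both helper names are hypothetical.

```python
def dehyphenate(kit_id):
    """Strip hyphens from a kit identifier to get the barcode form."""
    return kit_id.replace("-", "")


def is_one_to_one(pairs):
    """True when no left value maps to two right values, and vice versa."""
    pairs = set(pairs)
    lefts = {left for left, _ in pairs}
    rights = {right for _, right in pairs}
    return len(lefts) == len(pairs) == len(rights)


# (tubeId, SeqID) pairs from the four Mapping rows shown above.
tube_to_seq = [
    ("NA0012382439", "300909"),
    ("NA0013010984", "300921"),
    ("NA0008969520", "300945"),
    ("NA0013010993", "300948"),
]

mapping_ok = is_one_to_one(tube_to_seq)
# The hyphen placement here is assumed, not taken from a real kit label.
barcode = dehyphenate("606-185-447")
```

Running `is_one_to_one` over all ten rows of the Mapping sheet would confirm the claim for the whole dataset.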

eliesbik commented 6 years ago

Happy to answer your questions @dhimmel !

  1. tubeID is the QR code, and refers to the physical tube that holds each sample.
  2. SeqID is indeed the output of a sequencing run. One tubeID can have multiple SeqIDs, for example when a sequencing run did not meet our quality controls, or when a particular sample did not yield enough reads. In both of these cases, we will redo the PCR and sequencing of that sample, or all samples in that run, and a sample will then have two SeqIDs. This is the identifier that will be linked to your raw data.
  3. Correct.
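
Since one tube can end up with several SeqIDs, a mapping table is probably best read as tube-to-list rather than tube-to-value. A small sketch of that grouping follows; the third record is a made-up rerun for illustration only, and `seq_ids_by_tube` is a hypothetical helper name.

```python
from collections import defaultdict


def seq_ids_by_tube(records):
    """Group SeqIDs under their tubeId, preserving input order."""
    grouped = defaultdict(list)
    for tube_id, seq_id in records:
        grouped[tube_id].append(seq_id)
    return dict(grouped)


records = [
    ("NA0012382439", "300909"),  # from the Mapping sheet
    ("NA0013010984", "300921"),  # from the Mapping sheet
    ("NA0013010984", "999999"),  # hypothetical rerun, not real data
]

# Tubes that were sequenced more than once.
resequenced = {
    tube: seqs for tube, seqs in seq_ids_by_tube(records).items() if len(seqs) > 1
}
```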

Regarding when the samples were processed, 10-11-2017 refers to October 11, not November 10?

Yes