Registering samples on the uBiome Explorer

dhimmel commented 6 years ago

I messaged Elies regarding the status of our samples. According to Elies, uBiome processed the samples on 2017-10-11! She wrote:

Let me know if you can access the data from the Explorer website or not. There was no user associated with the samples in the database, so maybe you never registered them?

It didn't occur to me to register these kits in the Explorer, which is why I didn't notice the processing was complete. I have now registered the samples in the Explorer, on 2017-12-13.

Methods

I clicked on "Activate Kit" and entered the kit info from kits.tsv. For each sample (kit-tube pair), I entered the date as 2017-07-20, the day @bemert transferred the samples into the uBiome tubes (see https://github.com/dhimmel/fratjuice/issues/2#issuecomment-316837619). Then I changed the sample type from gut/spare to custom. Custom is documented as:

Custom samples are a special category for non-standard sampling, e.g. sampling your pet. Is this a custom sample?

In the notes section, I added the sample information. This was a manual process, so I was careful not to make errors. Elies sent me some data, which we can use in the future to confirm that our sample-to-tube assignments are in concordance.

Here is a screenshot of the 10 samples on the uBiome Explorer:

[Screenshots: ubiome-explorer-1, ubiome-explorer-2]

Note that we have data on both the concentrated and unconcentrated preparations, which hopefully will provide some comforting redundancy.

dhimmel commented 6 years ago

@eliesbik I'm wondering what identifier system uBiome uses for tubes. This will allow us to more easily translate between our kit/tube information and specific samples.

From downloading the data in the explorer (see #6), it looks like our ten tubes have the following IDs:

When I export taxonomy data as JSON, I see a line like:

  "sequencing_revision": "300987",

So perhaps these identifiers are called sequencing_revision IDs? Anyways, is this ID also the ID on each tube (if you were to read the QR code)? It would be helpful if we could add a sequencing_revision column to kits.tsv and add any other IDs to that table that uBiome uses.
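As a rough sketch of how the sequencing_revision value could be pulled out of an exported file: the snippet below searches parsed JSON for that key wherever it occurs, since the overall layout of the export is not shown in this thread. The sample string is a hypothetical, heavily trimmed stand-in. Only the "sequencing_revision" line comes from the real export, and the "records" key is invented for illustration.

```python
import json


def find_key(node, key):
    """Recursively collect every value stored under `key` in parsed JSON."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            found.extend(find_key(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_key(item, key))
    return found


# Hypothetical stand-in for an exported taxonomy file. The "records" key is
# an assumption; only "sequencing_revision" is taken from the real export.
example_export = '{"sequencing_revision": "300987", "records": []}'

revisions = find_key(json.loads(example_export), "sequencing_revision")
print(revisions)  # ['300987']
```

Searching recursively avoids assuming whether the key sits at the top level or inside a nested object.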

Right now kits.tsv doesn't have a column that uniquely identifies each sequencing dataset (uniqueness comes from combining sample_id and sample_type). The goal here is to:

1. get the right identifiers to identify the sequencing data
2. get any other IDs that will help bridge analyses (e.g. what the tube QR code contains)
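To illustrate the uniqueness point, here is a minimal sketch that checks whether a set of columns works as a key. The column names sample_id and sample_type come from the description above, but the rows are invented for illustration and are not the real contents of kits.tsv.

```python
import csv
import io
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return the key tuples that occur in more than one row."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Invented rows: the values below are assumptions, not the real kits.tsv.
example_tsv = (
    "sample_id\tsample_type\n"
    "FJ1\tconcentrated\n"
    "FJ1\tunconcentrated\n"
    "FJ2\tconcentrated\n"
    "FJ2\tunconcentrated\n"
)
rows = list(csv.DictReader(io.StringIO(example_tsv), delimiter="\t"))

# sample_id alone repeats, so it cannot identify a sequencing dataset.
single_dupes = duplicate_keys(rows, ["sample_id"])
# The (sample_id, sample_type) pair is unique in this example.
composite_dupes = duplicate_keys(rows, ["sample_id", "sample_type"])

print(single_dupes)     # [('FJ1',), ('FJ2',)]
print(composite_dupes)  # []
```

A dedicated per-dataset identifier column would make the same check pass with a single column.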

dhimmel commented 6 years ago

Regarding my above questions about identifiers, the spreadsheet provided by @eliesbik in #7 appears to contain the relevant information (and more)! Specifically, the Mapping sheet of results.xlsx contains these columns (showing only the first four rows):

Sample_ID	Sample_type	SeqID	tubeId	barcode	order_id
FJ5 - gut	concentrated	300909	NA0012382439	699179835	177574
FJ1 - spare	unconcentrated	300921	NA0013010984	592185324	183059
FJ2 - gut	concentrated	300945	NA0008969520	606185447	183182
FJ2 - spare	unconcentrated	300948	NA0013010993	606185447	183182

A few questions @eliesbik to make sure I understand all the identifiers:

1. tubeId is the unique identifier for each tube, which is encoded by the QR code? We probably will never need this, but it's good so we know which tube at uBiome corresponds to which sample.
2. SeqID, the "unique sequence id", refers to the output from a sequencing run? In our case, each tube was only sequenced once, so the tubeId-to-SeqID mapping is one-to-one.
3. barcode is the dehyphenated kit identifier.
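A small sketch of how the barcode column could bridge back to a kit: the four rows are the ones quoted from the Mapping sheet above, while the hyphenated kit identifier is hypothetical, because the thread does not show where the hyphens fall. In these rows the two FJ2 tubes share one barcode, which suggests that barcode identifies the kit rather than the individual tube.

```python
import csv
import io
from collections import defaultdict

# The four Mapping rows quoted above, tab-separated as in the thread.
MAPPING_TSV = (
    "Sample_ID\tSample_type\tSeqID\ttubeId\tbarcode\torder_id\n"
    "FJ5 - gut\tconcentrated\t300909\tNA0012382439\t699179835\t177574\n"
    "FJ1 - spare\tunconcentrated\t300921\tNA0013010984\t592185324\t183059\n"
    "FJ2 - gut\tconcentrated\t300945\tNA0008969520\t606185447\t183182\n"
    "FJ2 - spare\tunconcentrated\t300948\tNA0013010993\t606185447\t183182\n"
)


def dehyphenate(kit_id):
    """Strip hyphens so a printed kit identifier matches the barcode column."""
    return kit_id.replace("-", "")


mapping_rows = list(csv.DictReader(io.StringIO(MAPPING_TSV), delimiter="\t"))

# Group tube identifiers under the barcode of the kit they came from.
tubes_by_barcode = defaultdict(list)
for row in mapping_rows:
    tubes_by_barcode[row["barcode"]].append(row["tubeId"])

# Hypothetical hyphen placement; the real kit label format is an assumption.
hypothetical_kit_id = "606-185-447"
print(tubes_by_barcode[dehyphenate(hypothetical_kit_id)])
# ['NA0008969520', 'NA0013010993']
```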

eliesbik commented 6 years ago

Happy to answer your questions @dhimmel !

1. tubeID is the QR code, and refers to the physical tube that holds each sample.
2. SeqID is indeed the output of a sequencing run. One tubeID can have multiple SeqIDs, for example when a sequencing run did not meet our quality controls, or when a particular sample did not yield enough reads. In both of these cases, we will redo the PCR and sequencing of that sample, or all samples in that run, and a sample will then have two SeqIDs. The SeqID is the identifier that will be linked to your raw data.
3. Correct.
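The tube-to-SeqID relationship described here can be sketched as a grouping step. The first four pairs are taken from the Mapping rows quoted earlier in the thread. The extra pair with SeqID 999999 is a made-up re-sequencing event, added only to show what a one-to-many case would look like.

```python
from collections import defaultdict


def seq_ids_by_tube(pairs):
    """Group SeqIDs under their tube, keeping first-seen order."""
    grouped = defaultdict(list)
    for tube_id, seq_id in pairs:
        if seq_id not in grouped[tube_id]:
            grouped[tube_id].append(seq_id)
    return dict(grouped)


def is_one_to_one(pairs):
    """True when every tube has exactly one SeqID and vice versa."""
    unique_pairs = set(pairs)
    tubes = {tube for tube, _ in unique_pairs}
    seqs = {seq for _, seq in unique_pairs}
    return len(tubes) == len(unique_pairs) == len(seqs)


# (tubeId, SeqID) pairs from the four Mapping rows shown in the thread.
observed = [
    ("NA0012382439", "300909"),
    ("NA0013010984", "300921"),
    ("NA0008969520", "300945"),
    ("NA0013010993", "300948"),
]
print(is_one_to_one(observed))  # True

# Hypothetical rerun: SeqID 999999 is invented for illustration.
with_rerun = observed + [("NA0012382439", "999999")]
resequenced = {
    tube: seqs for tube, seqs in seq_ids_by_tube(with_rerun).items() if len(seqs) > 1
}
print(is_one_to_one(with_rerun))  # False
print(resequenced)  # {'NA0012382439': ['300909', '999999']}
```

Keying downstream tables on SeqID rather than tubeId keeps them valid even if a tube is sequenced again.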

Regarding when the samples were processed, 10-11-2017 refers to October 11, not November 10?

Yes
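For reference, a short sketch of why the date needed confirming: the same string parses to two different days depending on whether the month or the day is assumed to come first. Converting it to ISO 8601 removes the ambiguity.

```python
from datetime import datetime

raw = "10-11-2017"

# Month-first reading, the one confirmed above (October 11).
month_first = datetime.strptime(raw, "%m-%d-%Y").date()
# Day-first reading, the one that was ruled out (November 10).
day_first = datetime.strptime(raw, "%d-%m-%Y").date()

print(month_first.isoformat())  # 2017-10-11
print(day_first.isoformat())    # 2017-11-10
```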

