mshadbolt opened 5 years ago
Input: dissociated cells as cell suspensions
Output: >=2 library preparations from a single cell suspension.
The cell-barcoded cDNA from the cells in the suspension is split between the library preparations after amplification. One library is a standard 10X 5' gene expression (GEX) library. The other libraries come from a PCR enrichment step using TCR- or Ig-specific primers. There is an enrichment kit for either B cells or T cells, and you can enrich for both from the same cell suspension. This results in three different libraries: 5' tag-based gene expression, paired-end T cell V(D)J sequences, and paired-end B cell V(D)J sequences. The V(D)J libraries can be sequenced with the same read lengths as the GEX library or as paired-end 150 bp reads. The cell barcode and UMI layout is the same in all libraries and occurs at the start of read 1.
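Because every library carries the same barcode/UMI layout at the start of read 1, cells can in principle be matched across the GEX and V(D)J libraries by cell barcode. Below is a minimal Python sketch of that idea. It assumes the 10x v2 layout of a 16 bp cell barcode followed by a 10 bp UMI (check the kit's user guide), and the helper names and reads are hypothetical.

```python
from typing import NamedTuple

# ASSUMPTION: 10x v2 read-1 layout, a 16 bp cell barcode followed by a
# 10 bp UMI. Confirm against the kit's user guide before relying on it.
BARCODE_LEN = 16
UMI_LEN = 10


class Read1Tags(NamedTuple):
    barcode: str
    umi: str


def parse_read1(seq: str) -> Read1Tags:
    """Split the leading cell barcode and UMI off a read-1 sequence."""
    tag_len = BARCODE_LEN + UMI_LEN
    if len(seq) < tag_len:
        raise ValueError(f"read 1 is shorter than {tag_len} bp: {seq!r}")
    return Read1Tags(barcode=seq[:BARCODE_LEN], umi=seq[BARCODE_LEN:tag_len])


def shared_barcodes(*libraries: list[str]) -> set[str]:
    """Return the cell barcodes present in every library (e.g. GEX, TCR, Ig)."""
    barcode_sets = [{parse_read1(read).barcode for read in reads} for reads in libraries]
    if not barcode_sets:
        return set()
    return set.intersection(*barcode_sets)


# Hypothetical read-1 sequences for two cells.
cell_a, cell_b, umi = "A" * 16, "C" * 16, "G" * 10
gex_reads = [cell_a + umi + "TTTT", cell_b + umi + "TTTT"]
tcr_reads = [cell_a + umi + "TTTT"]
print(shared_barcodes(gex_reads, tcr_reads))  # only cell_a is in both
```

The shared layout is what makes this join possible at all; the actual 10x software handles barcode correction and whitelisting, which this sketch ignores.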
Reads from each VDJ enrichment library are assembled or aligned to known VDJ sequences.
The current dataset that I am wrangling (Kylie James Colon Immune cells) has the three libraries as described above.
Should there be a single library prep protocol for all libraries or a separate one for each library type?
The way I have modelled it in my experiment is to have separate protocols for each library, e.g.
10x_v2_5p_gex_library_prep_protocol
10x_v2_5p_vdj_TCR_library_prep_protocol
10x_v2_5p_vdj_Ig_library_prep_protocol
This would let users see which library each sequence file derived from. I'm open to suggestions on whether this is a good idea.
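The effect of having one protocol per library type can be sketched as a simple file-to-protocol link. This is a minimal Python sketch, not the metadata schema itself: the protocol IDs are the ones proposed above, while the `SequenceFile` class and the file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceFile:
    file_name: str
    library_prep_protocol_id: str


# Protocol IDs as proposed in this ticket.
GEX = "10x_v2_5p_gex_library_prep_protocol"
TCR = "10x_v2_5p_vdj_TCR_library_prep_protocol"
IG = "10x_v2_5p_vdj_Ig_library_prep_protocol"

# Hypothetical file names; one read-1/read-2 pair per library.
FILES = [
    SequenceFile("colon_gex_R1.fastq.gz", GEX),
    SequenceFile("colon_gex_R2.fastq.gz", GEX),
    SequenceFile("colon_tcr_R1.fastq.gz", TCR),
    SequenceFile("colon_tcr_R2.fastq.gz", TCR),
    SequenceFile("colon_ig_R1.fastq.gz", IG),
    SequenceFile("colon_ig_R2.fastq.gz", IG),
]


def files_by_protocol(files: list[SequenceFile]) -> dict[str, list[str]]:
    """Group sequence file names by the library prep protocol they link to."""
    grouped: dict[str, list[str]] = {}
    for f in files:
        grouped.setdefault(f.library_prep_protocol_id, []).append(f.file_name)
    return grouped


for protocol_id, names in files_by_protocol(FILES).items():
    print(protocol_id, names)
```

If all three libraries shared a single protocol, the grouping would collapse to one key and the library type would have to be recovered some other way, for example from file names.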
Should we add a field to the library_preparation_protocol module that would capture the 'enrichment' primers?
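One possible shape for such a field is an optional enrichment entry that is left empty for GEX libraries and filled in for the TCR and Ig libraries. The Python sketch below is purely illustrative: the `enrichment` field, its sub-fields, and the allowed target values are hypothetical and do not exist in the current schema.

```python
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL: neither the "enrichment" field nor its sub-fields exist in the
# schema; they only illustrate one possible shape for the new field.
ALLOWED_TARGETS = {"TCR", "Ig"}


@dataclass
class Enrichment:
    target: str              # which receptor the primers enrich for
    primer_description: str  # free text, e.g. the kit and primer set used


@dataclass
class LibraryPrepProtocol:
    protocol_id: str
    enrichment: Optional[Enrichment] = None  # left empty for GEX libraries

    def is_enriched(self) -> bool:
        return self.enrichment is not None


def check_enrichment(protocol: LibraryPrepProtocol) -> list[str]:
    """Return a list of problems with the enrichment field (empty if none)."""
    if protocol.enrichment is None:
        return []
    if protocol.enrichment.target not in ALLOWED_TARGETS:
        return [f"{protocol.protocol_id}: unknown enrichment target "
                f"{protocol.enrichment.target!r}"]
    return []


gex = LibraryPrepProtocol("10x_v2_5p_gex_library_prep_protocol")
tcr = LibraryPrepProtocol(
    "10x_v2_5p_vdj_TCR_library_prep_protocol",
    Enrichment(target="TCR", primer_description="T cell enrichment primers"),
)
print(gex.is_enriched(), tcr.is_enriched())
```

Keeping the field optional means existing GEX-only protocols would stay valid without changes.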
Does the analysis team need any specific fields added that would need to be captured to enable analysis? (Perhaps too early to tell if pipelines for this data are a long way off)
To populate library_preparation_protocol.library_construction_method.ontology we will need a new V(D)J-specific term.
Should the term be something like 10X 5' v2 V(D)J sequencing?
Should this sit underneath 10X 5' v2 sequencing?
Should the gex libraries also get this term or only the vdj specific libraries?
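The practical consequence of placing the new term underneath 10X 5' v2 sequencing is that a search on the parent term would also return the V(D)J libraries. A minimal Python sketch of that subsumption logic, assuming the GEX library keeps the parent term and only the V(D)J libraries get the new child term. The term labels are the proposals from this ticket; no ontology IDs are assumed, and the hard-coded dict stands in for a real ontology lookup.

```python
from typing import Optional

# Proposed term labels from this ticket; no ontology IDs are assumed.
# HYPOTHETICAL: a real lookup would query the ontology, not a hard-coded dict.
PARENT = {
    "10X 5' v2 V(D)J sequencing": "10X 5' v2 sequencing",
}

# Which term each library would be annotated with under this proposal.
LIBRARY_TERMS = {
    "gex": "10X 5' v2 sequencing",
    "tcr": "10X 5' v2 V(D)J sequencing",
    "ig": "10X 5' v2 V(D)J sequencing",
}


def is_a(term: str, query: str) -> bool:
    """True if term equals query or sits underneath it in the hierarchy."""
    current: Optional[str] = term
    while current is not None:
        if current == query:
            return True
        current = PARENT.get(current)
    return False


def libraries_matching(query: str) -> list[str]:
    """Libraries whose annotated term is the query term or one of its children."""
    return sorted(lib for lib, term in LIBRARY_TERMS.items() if is_a(term, query))


print(libraries_matching("10X 5' v2 sequencing"))
print(libraries_matching("10X 5' v2 V(D)J sequencing"))
```

Under this placement, querying the parent term returns all three libraries while querying the child returns only the V(D)J ones, so the GEX library would remain findable without carrying the V(D)J term.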
So far there seem to be v1 and v1.1 versions of the V(D)J libraries, but I'm not sure what the difference is, so I don't know whether we need both ontology terms.
To populate library_preparation_protocol.input_nucleic_acid_molecule.ontology, should we request a more specific term to indicate polyA RNA from TCR (T cells) or Ig (B cells)?
Chromium Single Cell V(D)J Reagent Kits - User Guide
10x-pert Workshop | Characterization of the Tumor Microenvironment with the Chromium Single Cell Imm
Sequencing Requirements for Single Cell V(D)J
Experimental design for V(D)J libraries
Chromium VDJ presentation with nice diagrams
I also have a question about how to describe the sequencing method for this kind of assay.
The V(D)J libraries are enriched for either T cells or B cells, and in my case they were sequenced paired-end 150 bp on the HiSeq 4000. Would this be considered 'tag-based single cell RNA sequencing' or 'full length single cell RNA sequencing'?
This would ideally be resolved during this sprint so that I have a timeline for when I can ingest Kylie's dataset.
Can you propose which schemas need to change/which need to be added?
We are going to struggle to get feedback from the US this week given Thanksgiving, and any schema changes will take three weeks to make it to production.
I outlined my understanding and a potential way of modelling this type of experiment in the ticket above, but you keep asking for a 'proposal'. Is there a specific way you want me to do this?
If you know which schemas need to be edited, make the PR with those edits and we can ask relevant people to review the PR
If you don't know what edits are needed, do you have a plan to get answers to the questions so the edits can be made?
I'll leave this ticket open but we need to include the vdj information in an SOP.
We should also talk with SCEA when we do this to ensure we model this the same way.
Description
As a data wrangler working on a project with 10X 5' V(D)J data, I need to determine the correct way of defining the experiment in our spreadsheet and whether any additional fields or ontology terms are needed. First, I need a better understanding of what V(D)J sequencing is and how it differs from 10X 5' sequencing. I heard that @ambrosejcarr and perhaps @TimothyTickle had started thinking about this, so please share any opinions you have. I'm not sure where this lies on the analysis pipelines roadmap, or what information would need to be captured above and beyond what we capture for a standard 10X 5' sequencing library.
Acceptance Criteria, e.g.: