Open torstees opened 4 years ago
Test comment
Based on a quick look at the data, it seems safe to key each sequencing object on its filename in order to establish uniqueness.
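A minimal sketch of the filename-based key described above. It assumes the `urn:ncpi:unique-string` system and the `Task|<filename>` value format that appear in the example payload later in this thread. The helper name `task_identifier` is hypothetical, and using only the base name of the path is an assumption rather than an agreed rule.

```python
import os


def task_identifier(seq_filename: str) -> dict:
    """Build a FHIR-style identifier keyed on the sequencing file's name."""
    # Assumption: only the base name matters, so the same file reached
    # through different directories maps to the same key.
    name = os.path.basename(seq_filename)
    return {
        "system": "urn:ncpi:unique-string",
        "value": f"Task|{name}",
    }
```

If two distinct files can ever share a base name, this key would collide, so that is worth checking against the real data before relying on it.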
The first pass is complete, with a handful of general assignments borrowed primarily from the KF docs for similar data.
resourceType: Task
owner => Sequencing Center (Organization)
authoredOn => date_data_generation
The majority of the details can be found in either the input or the output array. Currently, most of these are simple strings, but they can be switched to codes once we have a clear terminology to use. Which variables go into input versus output is largely arbitrary, but I believe the KF team were thinking in terms of what goes into the actual genotyping device, rather than treating the whole process as a black box in which samples go in and final products come out. As a result, the inputs to the actual pipelines are currently sitting in the output array.
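Inputs: Sample, Analyte Type, Library Prep Kit, Exome Capture Platform, Capture Region Bed File
Outputs: Reference Genome Build, Alignment Method, Data Processing Pipeline, Functional Equivalence Standard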
```json
{
  "host": "http://localhost:8000",
  "type": "sequencing_data",
  "body": {
    "resourceType": "Task",
    "id": "38924.merged.matefixed.sorted.markeddups.recal.bam",
    "status": "completed",
    "description": "Generate sequence data for use by researchers",
    "owner": {
      "reference": "Organization/FD",
      "display": "FD"
    },
    "meta": {
      "profile": [
        "http://hl7.org/fhir/StructureDefinition/Task"
      ]
    },
    "identifier": [
      {
        "system": "urn:ncpi:unique-string",
        "value": "Task|38924.merged.matefixed.sorted.markeddups.recal.bam"
      }
    ],
    "output": [
      {
        "type": {
          "text": "Reference Genome Build"
        },
        "valueString": "GRCh38DH"
      },
      {
        "type": {
          "text": "Alignment Method"
        },
        "valueString": "bwa-0.7.15"
      },
      {
        "type": {
          "text": "Data Processing Pipeline"
        },
        "valueString": "3.0_DNA_Pipeline"
      },
      {
        "type": {
          "text": "Functional Equivalence Standard"
        },
        "valueBoolean": false
      }
    ],
    "input": [
      {
        "type": {
          "text": "Sample"
        },
        "valueReference": {
          "reference": "Specimen/4774"
        }
      },
      {
        "type": {
          "text": "Analyte Type"
        },
        "valueString": "DNA"
      },
      {
        "type": {
          "text": "Library Prep Kit"
        },
        "valueString": "DNA_3.0_library_prep"
      },
      {
        "type": {
          "text": "Exome Capture Platform"
        },
        "valueString": "nimblegen_solution_bigexome_2011"
      },
      {
        "type": {
          "text": "Capture Region Bed File"
        },
        "valueString": "nimblegen_solution_bigexome_2011.hg19.list.bed"
      }
    ],
    "authoredOn": "2016-05-26"
  }
}
```
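As a rough illustration of how a Task body like the one above could be assembled from one row of sequencing metadata, here is a hedged Python sketch. The row keys reuse the field names listed elsewhere in this thread. The `specimen_id` and `sequencing_center` keys and the helper names `labeled` and `build_sequencing_task` are hypothetical, and the input/output split simply mirrors the example rather than any settled rule.

```python
def labeled(label: str, value) -> dict:
    """One input/output entry: a text label plus a typed FHIR value[x]."""
    if isinstance(value, bool):
        # Booleans must stay JSON booleans, not the string "false".
        return {"type": {"text": label}, "valueBoolean": value}
    return {"type": {"text": label}, "valueString": str(value)}


def build_sequencing_task(row: dict) -> dict:
    """Assemble a Task-shaped dict from one row of sequencing metadata."""
    filename = row["seq_filename"]
    return {
        "resourceType": "Task",
        "id": filename,
        "status": "completed",
        # Hypothetical key: the sequencing center's Organization id.
        "owner": {"reference": f"Organization/{row['sequencing_center']}"},
        "identifier": [
            {"system": "urn:ncpi:unique-string", "value": f"Task|{filename}"}
        ],
        "input": [
            {
                "type": {"text": "Sample"},
                # Hypothetical key: the id of the Specimen resource.
                "valueReference": {"reference": f"Specimen/{row['specimen_id']}"},
            },
            labeled("Analyte Type", row["analyte_type"]),
            labeled("Library Prep Kit", row["library_prep_kit_method"]),
        ],
        "output": [
            labeled("Reference Genome Build", row["reference_genome_build"]),
            labeled("Alignment Method", row["alignment_method"]),
            labeled("Data Processing Pipeline", row["data_processing_pipeline"]),
            labeled(
                "Functional Equivalence Standard",
                row["functional_equivalence_standard"],
            ),
        ],
        "authoredOn": row["date_data_generation"],
    }
```

Serializing the result with `json.dumps` keeps `functional_equivalence_standard` as a real boolean, which is what `valueBoolean` expects.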
Closed by accident
Things have changed since this was originally described, largely as a result of further discussions with the folks from the KF team. For our current use, given the small number of attributes, all of the inputs still reasonably apply to the Sequencing Task itself. The output, however, has been stripped down to the Document Reference alone, which represents the actual byproduct of the sequencing process. We then attach an Observation to that Document Reference; its components describe the contents of the document, such as the Reference Sequence, Alignment Method, etc.
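To make the revised layout concrete, here is a hedged Python sketch of the three resources and how they might point at each other. The linkage via `Observation.focus`, the label texts, the reuse of the filename as the DocumentReference id, and the function name `build_revised_resources` are all assumptions for illustration, not the project's actual mapping.

```python
def build_revised_resources(filename: str, specimen_id: str, details: dict) -> dict:
    """Sketch: Task -> DocumentReference <- Observation (with components)."""
    doc_ref = {
        "resourceType": "DocumentReference",
        "id": filename,  # assumption: the document is keyed on the filename too
        "status": "current",
        "content": [{"attachment": {"title": filename}}],
    }
    doc_ref_link = {"reference": f"DocumentReference/{doc_ref['id']}"}

    task = {
        "resourceType": "Task",
        "id": filename,
        "status": "completed",
        "input": [
            {
                "type": {"text": "Sample"},
                "valueReference": {"reference": f"Specimen/{specimen_id}"},
            }
        ],
        # The only output is the document produced by sequencing.
        "output": [
            {"type": {"text": "Sequencing Output"}, "valueReference": doc_ref_link}
        ],
    }

    observation = {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": "Sequencing file details"},  # placeholder text, not a real code
        "focus": [doc_ref_link],  # assumption: focus is how the link is made
        "component": [
            {"code": {"text": label}, "valueString": value}
            for label, value in details.items()
        ],
    }
    return {"document": doc_ref, "task": task, "observation": observation}
```

For example, passing `{"Reference Genome Build": "GRCh38DH", "Alignment Method": "bwa-0.7.15"}` as `details` yields an Observation with two components, while the Task's `output` holds only the reference to the document.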
For the first round of integration testing with Kids First, we need the priority 1 fields from sequencing (and possibly priority 2). These fields include:
Priority 1
seq_filename, analyte_type, sequencing_assay, library_prep_kit_method, reference_genome_build, alignment_method, data_processing_pipeline, functional_equivalence_standard, date_data_generation
Priority 2
exome_capture_platform, capture_region_bed_file
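A small sketch of how the two priority tiers could be checked before a row is sent to integration testing. The field names come from the lists above. The function name `missing_fields` and the rule that `None` or an empty string counts as missing are assumptions.

```python
PRIORITY_1 = (
    "seq_filename",
    "analyte_type",
    "sequencing_assay",
    "library_prep_kit_method",
    "reference_genome_build",
    "alignment_method",
    "data_processing_pipeline",
    "functional_equivalence_standard",
    "date_data_generation",
)
PRIORITY_2 = (
    "exome_capture_platform",
    "capture_region_bed_file",
)


def missing_fields(row: dict, fields=PRIORITY_1) -> list:
    """Return the names in `fields` that are absent or empty in `row`."""
    # A literal False (e.g. functional_equivalence_standard) counts as present;
    # only None and the empty string are treated as missing.
    return [
        name
        for name in fields
        if row.get(name) is None or row.get(name) == ""
    ]
```

Calling `missing_fields(row)` checks priority 1 only; `missing_fields(row, PRIORITY_1 + PRIORITY_2)` checks both tiers.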
Our solution will likely borrow heavily from the Kids First team's discussion of kidsfirst-sequence-experiment.