cfe-lab / MiCall

Pipeline for processing FASTQ data from an Illumina MiSeq to genotype human RNA viruses like HIV and hepatitis C
https://cfe-lab.github.io/MiCall
GNU Affero General Public License v3.0

Include review information in FASTA headers #132

Closed by donkirkby 9 years ago

donkirkby commented 10 years ago

Richard H. asked for the pipeline score, minimum coverage, and minimum coverage position to be included in the headers when we download a FASTA file from the MiCall page. Let's work on this after the review process is done.

ArtPoon commented 9 years ago

Specifically, for the FASTA containing consensus sequences.

donkirkby commented 9 years ago

What should we do when there are multiple review decisions based on the same consensus sequence? For example, HLA will map one region that contains two exons, and then produce a separate decision for each exon.

The simplest way to join the decision records to the consensus sequence is through the run id, the sample number, and the seed name. Here's an example query:

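-- Consensus row 24959 with any review decisions for its seed region.
-- The right join keeps the consensus row even when no decision matches.
select seed.name, dec.score, cons.conseq_cutoff, dec.min_coverage
from lab_miseq_review_decisions dec
    join lab_miseq_regions seed
        on dec.seed_region_id = seed.id
    right join lab_miseq_conseq cons
        on dec.review_id = cons.review_id
        -- sample_name ends with '_' followed by the sample number
        and dec.sample_name like ('%\_' || cons.snum) escape '\'
        and seed.name = cons.region
where cons.id = 24959;

donkirkby commented 9 years ago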

Richard asked to have all the decisions concatenated together. Here's an example header in that format:

>02-Feb-2015|HLA-B-seed|40352A_HLA-B|S14|15|0.020|HLA-B/HLA-B-exon2|4|134931|73|HLA-B/HLA-B-exon3|4|319361|3
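As a rough sketch of how a header like that could be assembled, the Python below appends each decision's fields to the fixed fields for the consensus sequence. The `Decision` type, the `build_header` function, and the field meanings (region, score, minimum coverage, minimum coverage position) are assumptions read off the example header, not MiCall's actual code.

```python
from typing import NamedTuple, Sequence


class Decision(NamedTuple):
    """One review decision; the field meanings are assumed from the example."""
    region: str            # e.g. "HLA-B/HLA-B-exon2"
    score: int             # pipeline score
    min_coverage: int      # minimum coverage
    min_coverage_pos: int  # position of the minimum coverage


def build_header(base_fields: Sequence[str],
                 decisions: Sequence[Decision]) -> str:
    """Join the consensus fields and every decision's fields with '|'."""
    fields = list(base_fields)
    for decision in decisions:
        # Each decision contributes four fields, in declaration order.
        fields.extend(str(value) for value in decision)
    return ">" + "|".join(fields)


# Values copied from the example header above.
base = ["02-Feb-2015", "HLA-B-seed", "40352A_HLA-B", "S14", "15", "0.020"]
decisions = [
    Decision("HLA-B/HLA-B-exon2", 4, 134931, 73),
    Decision("HLA-B/HLA-B-exon3", 4, 319361, 3),
]
header = build_header(base, decisions)
print(header)
```

With this layout, a consensus sequence that has no decisions just gets the six base fields, and a reader can recover the decisions by taking the remaining fields in groups of four.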