UCSF-Costello-Lab / LG3_Pipeline

The original LG3 pipeline
https://github.com/UCSF-Costello-Lab/LG3_Pipeline

Allow for updating the genome (FASTA) reference #59

Open HenrikBengtsson opened 5 years ago

HenrikBengtsson commented 5 years ago

Background

The Ziv lab needs to switch to a different genome reference file. The first place the reference enters the pipeline is the BWA alignment step.

Task(s)

Make it possible to change the genome reference FASTA file for the alignment step. After that, look at the remaining steps and identify which other reference files need to be updated as well.

ivan108 commented 5 years ago

What genome do they need?

HenrikBengtsson commented 5 years ago

A GRCh37 reference (i.e. the chromosome/sequence names do not have the chr prefix), e.g.

$ head -2 /home/shuntsman/ref/broad/Homo_sapiens_assembly19.fasta
>1 dna:chromosome chromosome:GRCh37:1:1:249250621:1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

(the filename does not reflect GRCh37 but the file content does)
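Since the filename alone does not say which naming convention a reference uses, a quick check of the header lines can help. The following is a minimal sketch, not part of the pipeline; the function names `sequence_names` and `naming_style` are hypothetical, and it assumes the sequence name is the first whitespace-delimited token after `>`.

```python
from typing import Iterable, List


def sequence_names(fasta_lines: Iterable[str]) -> List[str]:
    """Return the sequence name (first token) of each '>' header line."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    return names


def naming_style(names: Iterable[str]) -> str:
    """Classify names as 'chr-prefixed', 'unprefixed', 'mixed', or 'empty'."""
    flags = {name.startswith("chr") for name in names}
    if not flags:
        return "empty"
    if flags == {True}:
        return "chr-prefixed"
    if flags == {False}:
        return "unprefixed"
    return "mixed"


if __name__ == "__main__":
    # Header line taken from the example above.
    example = [">1 dna:chromosome chromosome:GRCh37:1:1:249250621:1", "NNNN"]
    print(naming_style(sequence_names(example)))
```

For a real file, the lines could come from `open(path)`; only header lines are inspected, so the sequence data itself is skipped.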

ivan108 commented 5 years ago

I am glad it is not GRCh38... :) The easiest way is to rename the chromosomes in the reference FASTA file and rebuild the BWA indexes. Otherwise we will have to rename the chromosomes in all annotations.
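The renaming idea could be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline's code: the helper names `to_chr_name` and `rename_fasta_headers` are hypothetical, only chromosomes 1-22, X and Y get the `chr` prefix, `MT` is mapped to `chrM`, and all other sequence names (e.g. unplaced contigs) are left unchanged.

```python
import re
from typing import Iterable, Iterator

# Assumption: only these primary chromosomes receive the "chr" prefix.
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def to_chr_name(name: str) -> str:
    """Map a GRCh37-style name to an hg19-style name (assumed mapping)."""
    if name in PRIMARY:
        return "chr" + name
    if name == "MT":
        return "chrM"
    return name  # contigs and anything unrecognized are left as-is


def rename_fasta_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with the sequence name in each header renamed."""
    for line in lines:
        if not line.startswith(">"):
            yield line  # sequence data passes through untouched
            continue
        body = line[1:].rstrip("\r\n")
        ending = line[1 + len(body):]  # preserve the original line ending
        # Split into the name (first token) and the rest of the description.
        match = re.match(r"(\S*)(.*)", body)
        name, rest = match.group(1), match.group(2)
        yield ">" + to_chr_name(name) + rest + ending


if __name__ == "__main__":
    demo = [">1 dna:chromosome chromosome:GRCh37:1:1:249250621:1\n", "NNNN\n"]
    for out_line in rename_fasta_headers(demo):
        print(out_line, end="")
```

After writing the renamed lines to a new FASTA file, the index would need to be rebuilt, e.g. with `bwa index` on that file. One caveat: as far as I know, the hg19 `chrM` and the GRCh37 `MT` are different mitochondrial sequences, so renaming alone may not make the two references equivalent there.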

HenrikBengtsson commented 5 years ago

OK, thanks. We decided to stick with the current hg19: there are too many unknowns in this pipeline and the payoff might be zero, so the rationale for switching to GRCh37 is not really there.