AG-Boerries / CAST-Seq

CAST-Seq Bioinformatic pipeline
GNU Affero General Public License v3.0

Databases from ANNOVAR #1

Closed by lea-lenhardtackovic 2 years ago

lea-lenhardtackovic commented 2 years ago

Hi, I am looking into implementing your pipeline. However, from the README and the code, it is unclear where the ANNOVAR databases are used. They are listed in the requirements section (along with bowtie2Index and the genome FASTA). Could you clarify where and how the ANNOVAR databases are used, and which specific ANNOVAR database should be prepared or downloaded? Am I correct in assuming that the bowtie2Index files are used only for the bowtie2 alignment and that the indices are created from the reference genome file? Is that reference genome file perhaps ANNOVAR-specific in some way, or are we missing something?

Thank you.

Best, Lea

gandrigit commented 2 years ago

Hi Lea, thank you for pointing this out. It is actually a mistake in the README file (it will be corrected in the next update). No ANNOVAR databases are needed anywhere in the pipeline. You correctly assumed that the bowtie2 index files are only used during the alignment. So all you need is a "bowtie2Index" folder (in /annotations/human/) with a genome.fa and the corresponding bowtie2 index files (genome.1.bt2, genome.2.bt2...). If you need more details about the bowtie2 alignment, you can look at the fastq_aln.sh script.
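
As a quick sanity check of the folder described above, here is a minimal sketch (not part of the CAST-Seq pipeline) that lists which expected files are absent. The helper name `missing_index_files` is hypothetical, and the suffix list assumes a standard small bowtie2 index as written by `bowtie2-build` (six `.bt2` files); a large index would use `.bt2l` instead.

```python
from pathlib import Path

# Suffixes that bowtie2-build writes for a small (non-large) index.
# Assumption: the index prefix is "genome", matching genome.1.bt2 etc.
BT2_SUFFIXES = (".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2")


def missing_index_files(index_dir, prefix="genome"):
    """Return the expected file names that are not present in index_dir.

    Checks for <prefix>.fa plus the six small-index .bt2 files.
    Hypothetical helper for illustration only.
    """
    index_dir = Path(index_dir)
    expected = [f"{prefix}.fa"] + [prefix + suffix for suffix in BT2_SUFFIXES]
    return [name for name in expected if not (index_dir / name).is_file()]
```

For example, `missing_index_files("annotations/human/bowtie2Index")` returns an empty list when the folder is complete; the relative path here is an assumption about where the annotations folder sits on your system. If index files are missing, they can be generated from the FASTA with `bowtie2-build genome.fa genome` run inside that folder.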

Kind regards, Geoffroy

lea-lenhardtackovic commented 2 years ago

Hi Geoffroy, thank you for the response and clarification. I will close the issue.

Best, Lea