bmansfeld / QTLseqr

QTLseqr is an R package for QTL mapping using NGS Bulk Segregant Analysis

How to deal with multiple samples? #39

Closed yuanlizhanshi closed 3 years ago

yuanlizhanshi commented 3 years ago

I know QTLseqr is based on BSA-seq and can calculate the SNP-index for two F2 bulk samples. But usually we have four or more samples: the two parents (P1 and P2) and two F2 bulks. How can I filter out the SNPs that P1, P2, and the F2 bulks contain, and then calculate the SNP-index? I would appreciate it if you could consider my question.

bmansfeld commented 3 years ago

Hi, if I understand your question correctly, you have sequenced the parental lines (neither of which is the reference genome genotype) as well as the two F2 bulks. You now want to filter out SNPs in the parental lines that are not in the reference genome.

I recommend the following pipeline:

1) Align one or both parents to the reference genome.
2) Call variants for that parent against the reference genome.
3) Extract only the SNPs and exclude any INDELs (indels would shift your sequence positions). Make sure to keep only the highest-quality, highest-confidence SNPs.
4) Use the FastaAlternateReferenceMaker tool from GATK, or another similar tool, to apply the SNPs to the reference genome and produce an alternate FASTA file.
5) Align your F2 bulks to the new FASTA you've just created and call SNPs against it.
6) Proceed with the analysis as usual.
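The steps above can be sketched as a dry run that only builds and prints the shell commands, so nothing is executed. This is one possible layout, not a tested workflow: the file names (`reference.fa`, `P1_R1.fq.gz`, `bulk_high`, ...), the read-group fields, the `QUAL > 30` cutoff, and the choice of bwa and samtools for alignment are all assumptions made for illustration. Only FastaAlternateReferenceMaker is named in the answer itself.

```python
# Dry-run sketch of the six steps: builds shell commands and prints them.
# All file names, the QUAL cutoff and the read-group fields are placeholders.

def prepare_reference(fasta):
    """Index commands bwa and GATK need before a FASTA can be used."""
    return [
        f"bwa index {fasta}",
        f"samtools faidx {fasta}",
        f"gatk CreateSequenceDictionary -R {fasta}",
    ]

def align(fasta, sample):
    """Align paired-end reads for one sample; GATK needs a read group."""
    read_group = rf"@RG\tID:{sample}\tSM:{sample}\tPL:ILLUMINA"
    return [
        f"bwa mem -R '{read_group}' {fasta} {sample}_R1.fq.gz {sample}_R2.fq.gz"
        f" | samtools sort -o {sample}.bam -",
        f"samtools index {sample}.bam",
    ]

def call_variants(fasta, samples, out_vcf):
    """Call variants for one or more BAM files against the given FASTA."""
    inputs = " ".join(f"-I {s}.bam" for s in samples)
    return f"gatk HaplotypeCaller -R {fasta} {inputs} -O {out_vcf}"

def build_pipeline(ref="reference.fa", parent="P1",
                   bulks=("bulk_high", "bulk_low"), min_qual=30.0):
    alt_ref = f"{parent}_ref.fa"
    cmds = prepare_reference(ref)
    cmds += align(ref, parent)                                      # step 1
    cmds.append(call_variants(ref, [parent], f"{parent}.vcf.gz"))   # step 2
    cmds.append(                                                    # step 3
        f"gatk SelectVariants -R {ref} -V {parent}.vcf.gz"
        f" --select-type-to-include SNP --restrict-alleles-to BIALLELIC"
        f" -select 'QUAL > {min_qual}' -O {parent}.snps.vcf.gz"
    )
    cmds.append(                                                    # step 4
        f"gatk FastaAlternateReferenceMaker -R {ref}"
        f" -V {parent}.snps.vcf.gz -O {alt_ref}"
    )
    # The new FASTA needs a bwa index; create .fai/.dict too if GATK
    # did not already write them next to the output.
    cmds.append(f"bwa index {alt_ref}")
    for bulk in bulks:                                              # step 5
        cmds += align(alt_ref, bulk)
    cmds.append(call_variants(alt_ref, bulks, "bulks.vcf.gz"))
    return cmds

commands = build_pipeline()
for cmd in commands:
    print(cmd)
```

Before step 5 it is worth inspecting the contig names in the new FASTA: depending on the GATK version, FastaAlternateReferenceMaker may rewrite the sequence headers, and they need to match the chromosome names you later pass to QTLseqr. Step 6 then continues from `bulks.vcf.gz` as in a standard QTLseqr analysis.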

Hope this helps, Ben

yuanlizhanshi commented 3 years ago

Got it, thank you very much.