bmansfeld / QTLseqr

QTLseqr is an R package for QTL mapping using NGS Bulk Segregant Analysis

Yang2013data #8

Closed TinaNeik closed 6 years ago

TinaNeik commented 6 years ago

Hi,

I have done NGS-BSA on 8 bulks of samples. SNPs have been called and filtered using VCF tools. The VCF file is now ready to be used with QTLseqr.

Referring to the QTLseqr workflow:

load the package

library("QTLseqr")

Set sample and file names

HighBulk <- "SRR834931"
LowBulk <- "SRR834927"
file <- "SNPs_from_GATK.table"

Choose which chromosomes will be included in the analysis (i.e. exclude smaller contigs)

Chroms <- paste0(rep("Chr", 12), 1:12)

Import SNP data from file

df <- importFromGATK(
    file = file,
    highBulk = HighBulk,
    lowBulk = LowBulk,
    chromList = Chroms
)

How do I convert my VCF file into the "SNPs_from_GATK.table" format?

I checked the Yang2013data paper, but it only has Tables S1-16. What does the raw data look like?

Also, how do I define "HighBulk" and "LowBulk" since I have 8 bulks?

Appreciate your help.

Many thanks.

bmansfeld commented 6 years ago

Hi Tina,

As of now, QTLseqr is designed to work with files output by GATK's VariantsToTable function. This is a VCF parser that translates the VCF format into a table containing the information needed for the analysis. VariantsToTable accepts VCFs from any SNP caller; however, I'm not sure about the names of the genotype fields used by yours. Unfortunately, these are not uniform among SNP callers. Please read the vignette under Input Data to see how the files and columns should be formatted. It is all explained there.
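For readers following along, here is a rough sketch of how the VariantsToTable call could be assembled from R. It assumes GATK4-style syntax with a `gatk` launcher on the PATH, and it assumes the genotype fields are named AD, DP, GQ and PL (the GATK convention; other SNP callers may use different names). The file names are placeholders, not part of the original workflow.

```r
# Placeholder file names (assumptions); replace with your own.
vcf_in    <- "my_snps.vcf"
table_out <- "SNPs_from_GATK.table"

# Site-level fields are passed with -F, per-sample genotype fields with -GF.
site_fields     <- c("CHROM", "POS", "REF", "ALT")
genotype_fields <- c("AD", "DP", "GQ", "PL")  # assumed GATK-style field names

# Interleave each flag with its field name: -F CHROM -F POS ...
args <- c(
  "VariantsToTable",
  "-V", vcf_in,
  as.vector(rbind("-F", site_fields)),
  as.vector(rbind("-GF", genotype_fields)),
  "-O", table_out
)

# Show the full command so it can be checked or pasted into a shell.
cmd <- paste(c("gatk", args), collapse = " ")
cat(cmd, "\n")

# Run it only if the assumed `gatk` launcher and the input file are present.
if (nzchar(Sys.which("gatk")) && file.exists(vcf_in)) {
  system2("gatk", args)
}
```

If your caller uses other field names, the `genotype_fields` vector is the place to change them, and the resulting column names should then be compared against the vignette's Input Data section.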

I am currently working on an update that should be released this week (maybe even this evening EST, if I get around to it...), and it will include a function to import directly from a CSV of allele frequencies. So if VariantsToTable doesn't work for you, you will be able to just read the allele depths from your VCF and set up a CSV file with the necessary data.
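As an illustration of reading allele depths from a VCF and writing them to a CSV, here is a minimal base R sketch. It assumes biallelic sites, an AD entry in the FORMAT column, and output columns named `AD_REF.<sample>` and `AD_ALT.<sample>`; that naming is an assumption, and the vignette is the authority on the required layout. The sample names `bulkA` and `bulkB` and the toy records are made up.

```r
# Toy VCF body (the "##" meta lines are assumed to be stripped already).
vcf_text <- c(
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tbulkA\tbulkB",
  "Chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:12,8:20\t0/1:5,15:20",
  "Chr1\t250\t.\tC\tT\t60\tPASS\t.\tGT:AD:DP\t0/1:9,11:20\t0/1:14,6:20"
)

# Read everything as character so alleles such as "T" are not parsed as TRUE.
vcf <- read.delim(text = vcf_text, comment.char = "", check.names = FALSE,
                  colClasses = "character")
names(vcf)[1] <- "CHROM"

# Pull the reference and alternate depths out of the AD sub-field.
extract_ad <- function(format, sample) {
  keys   <- strsplit(format, ":", fixed = TRUE)
  values <- strsplit(sample, ":", fixed = TRUE)
  ad     <- mapply(function(k, v) v[match("AD", k)], keys, values)
  depths <- strsplit(ad, ",", fixed = TRUE)
  data.frame(
    ref = as.integer(vapply(depths, `[`, character(1), 1)),
    alt = as.integer(vapply(depths, `[`, character(1), 2))
  )
}

samples <- c("bulkA", "bulkB")  # hypothetical sample names
out <- data.frame(CHROM = vcf$CHROM, POS = as.integer(vcf$POS),
                  REF = vcf$REF, ALT = vcf$ALT, stringsAsFactors = FALSE)
for (s in samples) {
  ad <- extract_ad(vcf$FORMAT, vcf[[s]])
  out[[paste0("AD_REF.", s)]] <- ad$ref  # assumed column naming
  out[[paste0("AD_ALT.", s)]] <- ad$alt
}

# Write the table to a temporary CSV file.
csv_path <- file.path(tempdir(), "allele_depths.csv")
write.csv(out, csv_path, row.names = FALSE)
print(out)
```

Multi-allelic sites would need extra handling, since their AD entry carries more than two comma-separated values.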

In regard to the multiple bulks: if I understand correctly, these are derived from multiple populations, or from multiple traits on the same population? Currently QTLseqr is only made to work with one population, i.e., two bulks at a time. At the moment, I don't have plans to add the ability to manage multiple populations at once. What I recommend is splitting the VCF into multiple files, one per population, so that each has two bulks in it. This could be done either before or after you make a CSV file, if that is the way you choose to go.
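One way the split could look when done after conversion to a table is sketched below in base R. It assumes a VariantsToTable-style layout in which genotype columns are named `<sample>.<field>`; the sample names, the pairings, and the toy values are all hypothetical.

```r
# Toy table with four bulks in <sample>.<field> columns (made-up values).
snps <- data.frame(
  CHROM = c("Chr1", "Chr1"),
  POS   = c(100L, 250L),
  REF   = c("A", "C"),
  ALT   = c("G", "T"),
  check.names = FALSE, stringsAsFactors = FALSE
)
for (s in c("pop1_R", "pop1_S", "pop2_R", "pop2_S")) {
  snps[[paste0(s, ".AD")]] <- c("10,10", "12,8")
  snps[[paste0(s, ".DP")]] <- c(20L, 20L)
}

# Hypothetical pairings: one high bulk and one low bulk per population.
pairs <- list(
  pop1 = c(high = "pop1_R", low = "pop1_S"),
  pop2 = c(high = "pop2_R", low = "pop2_S")
)

site_cols <- c("CHROM", "POS", "REF", "ALT")

# Keep the site columns plus every column belonging to the two bulks.
split_pair <- function(tbl, pair) {
  owner <- sub("\\.[^.]+$", "", names(tbl))  # strip the ".<field>" suffix
  tbl[, c(site_cols, names(tbl)[owner %in% pair]), drop = FALSE]
}

per_pair <- lapply(pairs, function(p) split_pair(snps, p))

# Write one tab-delimited file per population to a temporary directory.
for (nm in names(per_pair)) {
  write.table(per_pair[[nm]], file.path(tempdir(), paste0(nm, ".table")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}
```

Each resulting file then holds exactly two bulks, which matches the one-population-at-a-time design described above.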

I'll update here when v7.0.0 comes out so you get an email and you can update. -Ben

TinaNeik commented 6 years ago

Many thanks, Ben.

Meanwhile, I will do the GATK VariantsToTable conversion.

Yes, I have two traits: resistant and susceptible. For each trait, I have two populations. That's 4 bulks. I'm also looking at combining the two populations for each trait. That's 6 bulks. And lastly, the two parental genotypes. So in total, 8 bulks.

Looks like I have to do 2 bulks at a time for QTLseqr, no problem.

Please let me know once you are ready with v7.0.0.

Thank you.

Cheers, Tina

bmansfeld commented 6 years ago

The option to import from a CSV or other delimited file is now available in v7.0.0. Please read the vignette on how to do this.
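A hedged sketch of what the import might look like follows. The function name `importFromTable`, its arguments (`file`, `highBulk`, `lowBulk`, `chromList`, `sep`), and the `AD_REF.<sample>` / `AD_ALT.<sample>` column naming are assumptions based on my reading of the package and should be checked against the vignette; the sample names and depths are made up.

```r
# Tiny made-up CSV in the assumed layout, written to a temporary file.
toy <- data.frame(
  CHROM = c("Chr1", "Chr1"),
  POS   = c(100L, 250L),
  REF   = c("A", "C"),
  ALT   = c("G", "T"),
  AD_REF.bulkA = c(12L, 9L),
  AD_ALT.bulkA = c(8L, 11L),
  AD_REF.bulkB = c(5L, 14L),
  AD_ALT.bulkB = c(15L, 6L),
  stringsAsFactors = FALSE
)
csv_path <- file.path(tempdir(), "toy_bulks.csv")
write.csv(toy, csv_path, row.names = FALSE)

# Call the import only if QTLseqr is installed; the signature is an assumption.
if (requireNamespace("QTLseqr", quietly = TRUE)) {
  df <- QTLseqr::importFromTable(
    file      = csv_path,
    highBulk  = "bulkA",  # hypothetical sample names
    lowBulk   = "bulkB",
    chromList = "Chr1",
    sep       = ","
  )
  print(head(df))
}
```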