dpwickland / GB-eaSy

Bioinformatics pipeline to process genotyping-by-sequencing (GBS) data
MIT License

sample_barcodes.txt design #2

Open mictadlo opened 5 years ago

mictadlo commented 5 years ago

Hi, my files have the following naming pattern:

10_S0_L001_R1_001.fastq.gz
10_S0_L001_R2_001.fastq.gz
11_S0_L001_R1_001.fastq.gz
11_S0_L001_R2_001.fastq.gz
12_S0_L001_R1_001.fastq.gz
12_S0_L001_R2_001.fastq.gz

How should I change sample_barcodes.txt to reflect the files above?

10_1    TGACGCCC    HindIII
10_2    CAGATC      HindIII
10_3    GAAGTG      HindIII
10_4    TAGCGGAT    HindIII

Thank you in advance,

Michal

dpwickland commented 5 years ago

The barcodes file is used to assign sample names to the corresponding barcoded samples within a single GBS library. I'm guessing the fastq files you listed represent three GBS libraries. Please note that GB-eaSy can process only one GBS library at a time.
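If the files do turn out to be separate libraries, each R1/R2 pair would be run through the pipeline on its own. As a rough sketch, the Python snippet below groups file names like the ones listed above by their leading library prefix. The regular expression and the `group_by_library` helper are illustrative assumptions based only on the naming pattern shown in this thread; they are not part of GB-eaSy.

```python
import re
from collections import defaultdict

# Assumed naming pattern (taken from the file names in this thread):
#   <library>_S<sample>_L<lane>_R<1|2>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<library>.+)_S\d+_L\d{3}_R(?P<read>[12])_001\.fastq\.gz$"
)


def group_by_library(filenames):
    """Map each library prefix to its R1/R2 file names (hypothetical helper)."""
    libraries = defaultdict(dict)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        libraries[match["library"]]["R" + match["read"]] = name
    return dict(libraries)


files = [
    "10_S0_L001_R1_001.fastq.gz",
    "10_S0_L001_R2_001.fastq.gz",
    "11_S0_L001_R1_001.fastq.gz",
    "11_S0_L001_R2_001.fastq.gz",
    "12_S0_L001_R1_001.fastq.gz",
    "12_S0_L001_R2_001.fastq.gz",
]

pairs = group_by_library(files)
for library in sorted(pairs):
    # Each pair would be handled as a separate GB-eaSy run.
    print(library, pairs[library]["R1"], pairs[library]["R2"])
```

Each library would then need its own barcodes file listing the samples pooled in that library.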

For example, if example_sample_1 was barcoded with CAGATC during library prep and example_sample_2 was barcoded with TAGCGGAT, then the two rows representing those samples in the barcodes file would be

example_sample_1    CAGATC      HindIII
example_sample_2    TAGCGGAT    HindIII

You would replace "example_sample_1" with whatever name you want to assign to that sample, replace CAGATC with the barcode used for that sample during library prep, and replace HindIII with the enzyme used during library prep. For each row in the barcodes file, GB-eaSy will associate the sample name with all reads containing that row's barcode.

The sample names will also be used as the column titles in the final VCF file produced by the pipeline.
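
To make the row layout concrete, here is a minimal Python sketch that builds the text of a barcodes file from (sample name, barcode, enzyme) tuples. It assumes the three columns are tab-separated, which is a guess based on the examples in this thread. The `format_barcodes` helper and its checks (no whitespace in names, barcodes made only of A/C/G/T, no barcode listed twice) are illustrative assumptions, not rules taken from GB-eaSy itself.

```python
import csv
import io

VALID_BASES = set("ACGT")


def format_barcodes(rows):
    """Return barcodes-file text for (sample_name, barcode, enzyme) rows.

    Hypothetical helper: tab separation and the checks below are assumptions.
    """
    seen_barcodes = set()
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for name, barcode, enzyme in rows:
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"sample name must not contain whitespace: {name!r}")
        if not barcode or set(barcode) - VALID_BASES:
            raise ValueError(f"barcode must contain only A, C, G, T: {barcode!r}")
        if barcode in seen_barcodes:
            raise ValueError(f"barcode listed more than once: {barcode}")
        seen_barcodes.add(barcode)
        writer.writerow([name, barcode, enzyme])
    return out.getvalue()


rows = [
    ("example_sample_1", "CAGATC", "HindIII"),
    ("example_sample_2", "TAGCGGAT", "HindIII"),
]

text = format_barcodes(rows)
print(text, end="")
```

The first column of each row is the name that would appear as a column title in the final VCF, so it is worth choosing names you will recognize later.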