dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

ddrad paired-end data #501

Closed aureliendejode closed 1 year ago

aureliendejode commented 1 year ago

Hello,

We are analyzing paired-end ddRAD data with ipyrad. Is there a tutorial dedicated to this type of data? We have forward (F) and reverse (R) FASTQ files that need to be demultiplexed. We have the barcodes, identified the overhang for the forward reads, and managed to demultiplex those, but it is not clear how to proceed with the reverse reads. The overhang for those reads also does not seem easy to identify.

Any insights?

Best

Aurélien

isaacovercast commented 1 year ago

Hello! Sorry for the delayed response. For paired-end data you do not need to demultiplex your data by hand; ipyrad handles this for you. You will need to rename your F and R raw data files so that the forward reads include the string R1 and the reverse reads include the string R2. Then you can set the params in the params file like this:

pairddrad_example_R*_.fastq.gz    ## [2] [raw_fastq_path]
pairddrad                         ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
TGCAG,CGG                         ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)

Of course you will need to provide the path to the barcodes file in parameter 3 (barcodes_path) and you will need to change the restriction_overhang sequences to be the ones you actually used.
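Before running the assembly, it can help to confirm that every forward file has a matching reverse file after renaming. The following is only a minimal sketch: it assumes the `_R1_` / `_R2_` naming pattern implied by the example glob above, and the function name `pair_read_files` and the file names in the usage comment are hypothetical, not part of ipyrad.

```python
def pair_read_files(filenames):
    """Group file names into (R1, R2) pairs.

    Assumes forward files contain '_R1_' and reverse files contain '_R2_'
    (an assumption based on the example glob, not ipyrad's own code).
    Returns (pairs, unmatched).
    """
    r1, r2, other = {}, {}, []
    for name in filenames:
        if "_R1_" in name:
            # Key on the name with the read number masked out.
            r1[name.replace("_R1_", "_R*_", 1)] = name
        elif "_R2_" in name:
            r2[name.replace("_R2_", "_R*_", 1)] = name
        else:
            other.append(name)

    pairs = [(r1[key], r2[key]) for key in sorted(r1.keys() & r2.keys())]
    unmatched = sorted(
        [r1[key] for key in r1.keys() - r2.keys()]
        + [r2[key] for key in r2.keys() - r1.keys()]
        + other
    )
    return pairs, unmatched


# Hypothetical usage with names on disk:
#   from pathlib import Path
#   pairs, unmatched = pair_read_files(p.name for p in Path(".").glob("*.fastq.gz"))
```

Any file reported as unmatched is worth renaming or inspecting before pointing raw_fastq_path at the directory.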

The R2 overhang sequence is actually pretty easy to find because it is the first few bases of each read in the R2 file. Here is the example R2 file, which uses CGG. Your R2 data should look similar, though the overhang sequence may differ:

@lane1_locus0_2G_0_0 2:N:0:
CGGGGTTAAGAGGCCAGTTAACTGCAGCGGGATCGCGCACCATAGCGGCCGTGCCTACGAGTCAGATGTCACTTTTCAGACGCTCATGGAAGTGAGTGCA
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@lane1_locus0_2G_0_1 2:N:0:
CGGGGTTAAGAGGCCAGTTAACTGCAGCGGGATCGCGCACCATAGCGGCCGTGCCTACGAGTCAGATGTCACTTTTCAGACGCTCATGGAAGTGAGTGCA
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@lane1_locus0_2G_0_2 2:N:0:
CGGGGTTAAGAGGCCAGGTAACTGCAGCGGGATCGCGCACCATAGCGGCCGTGCCTACGAGTCAGATGTCACTTTTCAGACGCTCATGGAAGTGAGTGCA
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
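If the overhang is hard to spot by eye, one option is to tally the first few bases across many R2 reads and see which prefix dominates. This is a rough sketch rather than an ipyrad feature: the function names, the prefix length `k`, and the file name in the usage comment are all assumptions for illustration, and the true overhang may be shorter or longer than the `k` you try.

```python
from collections import Counter
from itertools import islice
import io


def read_sequences(handle, max_reads=10000):
    """Yield the sequence line (2nd of every 4 lines) from a FASTQ text handle."""
    for i, line in enumerate(islice(handle, max_reads * 4)):
        if i % 4 == 1:
            yield line.strip()


def common_prefixes(handle, k=5, max_reads=10000, top=3):
    """Return the most frequent k-base read prefixes as (prefix, count) pairs."""
    counts = Counter(seq[:k] for seq in read_sequences(handle, max_reads))
    return counts.most_common(top)


# In-memory demo with two reads truncated from the example R2 file above.
demo = io.StringIO(
    "@lane1_locus0_2G_0_0 2:N:0:\nCGGGGTTAAGAGGCCAGTTAACTGCAG\n+\n"
    "BBBBBBBBBBBBBBBBBBBBBBBBBBB\n"
    "@lane1_locus0_2G_0_2 2:N:0:\nCGGGGTTAAGAGGCCAGGTAACTGCAG\n+\n"
    "BBBBBBBBBBBBBBBBBBBBBBBBBBB\n"
)
print(common_prefixes(demo, k=3))  # prints [('CGG', 2)]

# Hypothetical usage on a real gzipped file:
#   import gzip
#   with gzip.open("my_library_R2_.fastq.gz", "rt") as fh:
#       print(common_prefixes(fh, k=5))
```

Trying a few values of `k` and looking for the longest prefix shared by nearly all reads gives a reasonable candidate for the second value of restriction_overhang, which you can then check against the enzyme you actually used.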

Give that a try and let us know how it goes.

isaacovercast commented 1 year ago

Because this is more of a support question than an 'issue' with ipyrad I am going to close this ticket, but I would encourage you to continue asking questions about how to do your assembly on our ipyrad gitter channel, which is much better for support requests like this:

https://app.gitter.im/#/room/#dereneaton_ipyrad:gitter.im

All the best, -isaac