GATB / gatb-minia-pipeline

GATB Minia assembly pipeline

Combining multiple libraries #20

Closed harish0201 closed 4 years ago

harish0201 commented 4 years ago

Hi!

I have a couple of older Illumina datasets (both PE and MP) split across multiple insert sizes and libraries.

Is it possible to pass them as a single argument? I think that would make life easier.

Would a FIFO sort of approach work? Or should I give a file of files (FOF)? I believe that the FOF approach works only for SE reads rather than PE.

If needed I can pass MP libraries later to scaffold only.

What would you suggest?

rchikhi commented 4 years ago

Hi! Apologies for the delayed answer. Indeed, the FOF strategy doesn't work as is. You'll need to:

1) if applicable, for each insert size, concatenate all the left (resp. right) files from that insert size into a single left (resp. right) file;
2) pass each insert-size library as an argument, in order from smaller to larger insert size, e.g. -1 left_insertA.fq.gz -2 right_insertA.fq.gz --mp-1 left_insertB.fq.gz --mp-2 right_insertB.fq.gz (in this example, insert size A is smaller than B).
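The two steps above can be sketched roughly as follows. This is only an illustration: the file names, the `libraries` layout and the helper names `concatenate` and `build_args` are hypothetical, and repeating a flag pair when there are several libraries of the same kind is an assumption, not something confirmed in this thread. Concatenating `.gz` files byte-wise relies on the gzip format allowing several members in one file.

```python
import shutil


def concatenate(parts, merged):
    """Byte-concatenate read files into one file.

    Also valid for .gz inputs, since a gzip file may hold several members.
    """
    with open(merged, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return merged


def build_args(libraries):
    """Build the library arguments, ordered from smaller to larger insert size.

    Each library is a dict with keys: insert_size, kind ("pe" or "mp"),
    left, right. Only the flags quoted in this thread are used.
    """
    flags = {"pe": ("-1", "-2"), "mp": ("--mp-1", "--mp-2")}
    args = []
    for lib in sorted(libraries, key=lambda entry: entry["insert_size"]):
        left_flag, right_flag = flags[lib["kind"]]
        args += [left_flag, lib["left"], right_flag, lib["right"]]
    return args


if __name__ == "__main__":
    # Hypothetical libraries; insert sizes are in bp and only used for ordering.
    libraries = [
        {"insert_size": 8000, "kind": "mp",
         "left": "left_insertB.fq.gz", "right": "right_insertB.fq.gz"},
        {"insert_size": 300, "kind": "pe",
         "left": "left_insertA.fq.gz", "right": "right_insertA.fq.gz"},
    ]
    # Prints the argument string to append to the pipeline command.
    print(" ".join(build_args(libraries)))
```

Building the argument list separately from running the pipeline makes it easy to check the ordering before launching a long assembly.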

Alternatively:

1) run gatb-pipeline with a FOF of all libraries, in any order. This will produce contigs but no scaffolds. Note that these will be the same contigs as if you had specified that the libraries were paired or mate-pair, as Minia does not use pairing information when making contigs.
2) then run a scaffolder manually on the contigs produced by gatb-pipeline: either BESST through the gatb-pipeline script with the -c argument, or the stand-alone BESST program, or any other scaffolder.
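For the FOF route, the list file could be generated with something like the sketch below. It assumes a FOF is a plain text file with one read-file path per line; the helper name `write_fof` and the file names are hypothetical, and how the FOF is then handed to gatb-pipeline is deliberately left out, since the thread does not spell it out.

```python
from pathlib import Path


def write_fof(read_files, fof_path):
    """Write a file of files: one read-file path per line (assumed format)."""
    lines = [str(Path(p)) for p in read_files]
    Path(fof_path).write_text("\n".join(lines) + "\n")
    return fof_path


if __name__ == "__main__":
    # Hypothetical file names; order does not matter for contig assembly.
    reads = [
        "left_insertA.fq.gz", "right_insertA.fq.gz",
        "left_insertB.fq.gz", "right_insertB.fq.gz",
    ]
    print(write_fof(reads, "all_libraries.fof"))
```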

harish0201 commented 4 years ago

Thank you for the suggestion and apologies for the delayed response!

I did use a mixture of the two options though. I concatenated the smaller insert fastqs and used the mate-pair separately.

rchikhi commented 4 years ago

sounds good, did it work?

harish0201 commented 4 years ago

Yup, it did! I got decent contiguity as well.

Got the 2.5Gb genome in 9876 (no joke) scaffolds over 1kb in length, with an N50 of 1.83Mb. Having 8Kb and 20Kb insert MP libraries helped a lot, combined with dual scaffolding.

rchikhi commented 4 years ago

very nice!