Closed harish0201 closed 4 years ago
Hi! Apologies for the delayed answer. Indeed the FOF strategy doesn't work as is.
You'll need to:
1) If applicable, for each insert size, concatenate all the left (resp. right) files from that insert size into a single left (resp. right) file.
2) Pass each insert-size library as an argument, e.g. -1 left_insertA.fq.gz -2 right_insertA.fq.gz --mp-1 left_insertB.fq.gz --mp-2 right_insertB.fq.gz
in order from smaller insert size to larger (i.e. in my example, insert size A is smaller than B).
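For step 1, here is a minimal sketch of the concatenation, assuming gzipped FASTQ inputs. The helper name `concat_files` and the file names in the usage note are hypothetical, not part of gatb-pipeline. Gzip files can be joined byte for byte without decompressing, because a concatenation of gzip streams is itself a valid gzip file. The one thing to watch is that the left and right files are joined in the same order, so that read pairs stay in sync.

```python
import shutil
from pathlib import Path


def concat_files(parts, dest):
    """Write the files in `parts` into `dest`, byte for byte, in the given order.

    Works for plain FASTQ and for gzipped FASTQ alike, since concatenated
    gzip members form a valid gzip file. `concat_files` is an illustrative
    helper, not something shipped with gatb-pipeline.
    """
    dest = Path(dest)
    with dest.open("wb") as out:
        for part in parts:
            with Path(part).open("rb") as src:
                # Stream in chunks so large read files never sit fully in memory.
                shutil.copyfileobj(src, out)
    return dest
```

Usage would look something like `concat_files(["lane1_R1.fq.gz", "lane2_R1.fq.gz"], "left_insertA.fq.gz")`, followed by the matching call for the right-hand files with the lanes listed in the same order.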
Alternatively:
1) Run gatb-pipeline with a FOF of all libraries, in any order. This will produce contigs but no scaffolds. Note that these will be the same contigs as if you had specified that the libraries were paired/mate-pairs, as Minia does not care about pairing when making contigs. Then you can run a scaffolder manually on the contigs produced by gatb-pipeline: BESST stand-alone (through the gatb-pipeline script with the -c argument, or with the BESST program directly), or any other scaffolder.
Thank you for the suggestion and apologies for the delayed response!
I did use a mixture of the two options though. I concatenated the smaller insert fastqs and used the mate-pair separately.
sounds good, did it work?
Yup, it did! I got a decent contiguity as well.
Got the 2.5Gb genome in 9876 (no joke) scaffolds over 1kb in length, with an N50 of 1.83Mb. Having 8Kb and 20Kb insert MP libraries helped a lot, combined with dual scaffolding.
very nice!
Hi!
I have a couple of older Illumina datasets (both PE and MP) split across multiple insert sizes and libraries.
Is it possible to pass them as a single argument? I think that'd make life easier.
Would a FIFO sort of approach work? Or should I give a file of files (FOF)? I believe that the FOF approach works only for SE reads rather than PE.
If needed I can pass MP libraries later to scaffold only.
What would you suggest?