choice of amount of Illumina data

Generade-nl / EelSeeds

Scripts used to extract seeds for the European eel genome assembly

Hello,

I am preparing the short-read data with these two scripts, and I wonder whether you have some guidelines to help with the choice. I am assembling a 5 Gb genome and have about 30x coverage of 1x260 bp reads. I took only one end of each pair so that I do not need to merge them; if it is better, I can merge the pairs with FLASH to get ~460 bp reads. I got my repeat threshold from the Jellyfish histo: I chose 58 (k-mer size 25). The higher peak is the heterozygous one, and the lower one at ~60x is the homozygous one.
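Since the repeat threshold is read off the Jellyfish histogram, it can help to locate the peaks programmatically rather than by eye. The sketch below assumes the two-column text output of `jellyfish histo` (multiplicity, number of distinct k-mers) and uses a naive local-maximum rule. The function names and the file name are hypothetical, and real histograms can be noisy enough to need smoothing or a minimum-height filter. It only finds the peaks; where to place the repeat threshold relative to them is left open.

```python
from __future__ import annotations

from typing import Iterable


def parse_histo(lines: Iterable[str]) -> dict[int, int]:
    """Parse `jellyfish histo` text output: one 'multiplicity count' pair per line."""
    hist: dict[int, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            hist[int(parts[0])] = int(parts[1])
    return hist


def find_peaks(hist: dict[int, int], min_mult: int = 2) -> list[int]:
    """Return multiplicities that are local maxima of the histogram.

    Multiplicities below min_mult (the error-dominated low end) are never
    reported as peaks. No smoothing is done, so noisy data may give extra peaks.
    """
    peaks = []
    for m in sorted(hist):
        if m < min_mult:
            continue
        here = hist[m]
        # Strictly higher than the left neighbour, not lower than the right one.
        if here > hist.get(m - 1, 0) and here >= hist.get(m + 1, 0):
            peaks.append(m)
    return peaks


# Usage, with a hypothetical file name:
# with open("histo.txt") as fh:
#     hist = parse_histo(fh)
# print(find_peaks(hist))
```

On a clean diploid histogram this should report two multiplicities, the heterozygous and the homozygous peak, in increasing order.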

I edited binLongEelReads.perl to extract sequences in the 230-245 bp range: should I go as high as I can? How important is this length value? Is it worth using the 460 bp reads? With longer reads, though, more of them may contain high-copy k-mers. I am also wondering whether I should parse all of the 30x to get the sequences to align to my long reads, or whether there is a threshold at which I can stop. Thanks,

Dario
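The worry that longer reads are more likely to contain a high-copy k-mer can be made rough-and-ready quantitative. This is a back-of-envelope sketch under a strong simplifying assumption: each of a read's k-mer positions is high-copy independently with the same probability `p_repeat`. The function name and the value 0.001 are hypothetical, not estimates for the eel genome.

```python
def frac_repeat_free(read_len: int, k: int, p_repeat: float) -> float:
    """Expected fraction of reads containing no high-copy k-mer.

    Toy model (an assumption): each of the read's (read_len - k + 1) k-mer
    positions is high-copy independently with probability p_repeat.
    """
    n_kmers = read_len - k + 1
    if n_kmers <= 0:
        return 1.0  # a read shorter than k contains no k-mers at all
    return (1.0 - p_repeat) ** n_kmers


if __name__ == "__main__":
    # p_repeat = 0.001 is a made-up illustrative value.
    for read_len in (245, 260, 460):
        print(read_len, round(frac_repeat_free(read_len, 25, 0.001), 3))
```

Under this model the repeat-free fraction decays geometrically with the number of k-mers per read, and with k = 25 a 460 bp read has 436 k-mer positions against 236 for a 260 bp read (about 1.85 times as many). Real repeats are clustered rather than scattered independently, so the size of the effect may differ, but the direction is the same.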

    $ perl ./binLongEelReads_250-265.perl merged_reads_200_58.fa
    length 250 10606826
    length 255 10595691
    length 260 10581447
    length 265 7024869
    Warning: unable to close filehandle properly: Bad file descriptor during global destruction.
    $ perl -v
    This is perl 5, version 22, subversion 0 (v5.22.0) built for x86_64-linux-thread-multi
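The internals of the binning script are not shown here. As a hedged illustration of the same kind of step, selecting reads within a length window from a FASTA file and tallying how many fall at each length, a minimal Python sketch might look like the following. The function names, the inclusive `[min_len, max_len]` window, and the streaming output are assumptions, and the tally is not claimed to match the `length N count` lines printed by `binLongEelReads_250-265.perl`.

```python
from __future__ import annotations

import io
from collections import Counter
from typing import Iterable, Iterator, TextIO


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header: str | None = None
    chunks: list[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def bin_by_length(records: Iterable[tuple[str, str]], min_len: int,
                  max_len: int, out: TextIO) -> Counter:
    """Write reads with min_len <= length <= max_len to `out`; tally their lengths."""
    tally: Counter = Counter()
    for header, seq in records:
        n = len(seq)
        if min_len <= n <= max_len:
            out.write(f">{header}\n{seq}\n")
            tally[n] += 1
    return tally


if __name__ == "__main__":
    # Tiny in-memory example; for real data, open the FASTA file instead.
    fasta = io.StringIO(">r1\nACGT\n>r2\nACG\nTAC\n>r3\nA\n")
    out = io.StringIO()
    print(bin_by_length(read_fasta(fasta), 3, 5, out))
    print(out.getvalue(), end="")
```

Because records are streamed and written out as they are read, memory use stays flat even for tens of millions of reads; only the per-length counter is held in memory.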

choice of amount of Illumina data #3