dcopetti opened this issue 5 years ago
Hello, I went ahead with running the steps, and from the second Perl script I got this output:
$ perl ./binLongEelReads_250-265.perl merged_reads_200_58.fa
length 250 10606826
length 255 10595691
length 260 10581447
length 265 7024869
Warning: unable to close filehandle properly: Bad file descriptor during global destruction.
$ perl -v
This is perl 5, version 22, subversion 0 (v5.22.0) built for x86_64-linux-thread-multi
I looked at the tail of the FASTAs, and the formatting looks fine. Should I disregard the warning, or could the files be incomplete? Thanks, Dario
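PS: one sanity check I could run, assuming the script writes one FASTA per length bin (the file names below are placeholders for the actual outputs of binLongEelReads_250-265.perl), is to compare the record count of each bin against the totals printed above:
$ for f in bin_250.fa bin_255.fa bin_260.fa bin_265.fa; do printf '%s\t' "$f"; grep -c '^>' "$f"; done
If the counts match, the warning was most likely just a filehandle that was not closed cleanly before exit rather than a sign of truncated output.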
Hello,
I am preparing the short-read data with these two scripts, and I wonder if you have some guidelines to help with the parameter choices. I am assembling a 5 Gb genome; I have about 30x coverage of 1x260 bp reads (I took only one end so that I do not need to merge them - if it is better, I can merge the pairs with FLASH to get ~460 bp reads), and I got my repeat threshold from the Jellyfish histogram - I chose 58 (k-mer size 25). The higher peak is the heterozygous one; the lower one at ~60x is the homozygous one.
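For context, the histogram came from a Jellyfish run along these lines (just a sketch: canonical counting, the hash size, thread count, and file names are all assumptions on my part), and the awk line is one quick way to list the most populated multiplicities around the two peaks:
# sketch of the counting/histogram step; -s, -t, and file names are placeholders
$ jellyfish count -m 25 -s 5G -t 16 -C -o mer_counts.jf reads_260bp.fa
$ jellyfish histo mer_counts.jf > mer_counts.histo
# list the most populated multiplicities between 10x and 200x (the het and hom peaks should top the list)
$ awk '$1 >= 10 && $1 <= 200' mer_counts.histo | sort -k2,2nr | head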
I edited binLongEelReads.perl to extract sequences in the range 230-245 bp: should I stay as high as I can, and how important is this length value? Is it worth using 460 bp merged reads, given that longer reads are more likely to contain high-copy k-mers? I am also wondering whether I should parse all 30x of the data to get the sequences to align to my long reads, or whether there is a coverage threshold at which I can stop (I sketched the commands I have in mind below). Thanks,
Dario
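PS: these are the commands I have in mind for the two options above; all file names, the FLASH overlap limit, and the sampling fraction are placeholders to be tuned:
# merging the 2x260 bp pairs with FLASH, if merged reads turn out to be worthwhile
$ flash -M 250 -o merged reads_R1.fastq reads_R2.fastq
# or keeping single-end reads and capping coverage at ~15x by subsampling half the data with seqtk
$ seqtk sample -s 42 reads_260bp.fastq 0.5 > reads_15x.fastq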