coreywischmeyer opened this issue 10 years ago.
Hi Corey, I need some more information. Which version of seq_crumbs are you using? We haven't made a new release in a long time and there are a lot of bugfixes in HEAD.
I tried to reproduce the error with a test file and HEAD, but I couldn't. Could you send me the exact command you used and the input file?
Thanks
I got the bug with the source for seq_crumbs 0.1.8 as well as with the binary available from the COMAV site (also 0.1.8). Unfortunately, I can't share the data I'm using. If you aren't getting the bug, it may be a problem with my Python setup; I did have some trouble getting it to work.
-Corey
Hi Corey, I think the bug is solved in the GitHub master repository. Could you test it? p.
I don't think the bug is solved. I am using version seq_crumbs-0.1.8-x64-linux. The command I ran was:
pair_matcher -o pairs.1.454.fastq -p orphan_1.454.fastq 1.454Reads.qual.fastq
The input file 1.454Reads.qual.fastq was 327.9 MiB. The orphan file orphan_1.454.fastq had grown to 97.0 GiB by the time my account ran out of memory (I was processing four files at the same time, and all of them produced orphan files of similar size).
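As a rough check of these sizes (taking 1 GiB as 1024 MiB), the orphan file was already about 300 times larger than the input when the job stopped, and since it was still growing this is only a lower bound. If each input read is written at most once, the orphan file should normally not exceed the size of the input at all:

$$
\frac{97.0\ \text{GiB}}{327.9\ \text{MiB}} = \frac{97.0 \times 1024\ \text{MiB}}{327.9\ \text{MiB}} = \frac{99328}{327.9} \approx 303
$$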
The issue is that the same read appears to be written multiple times to the orphan file:
grep -c '@HD6LUZQ01AJRPA' orphan_1.454.fastq
27237
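The behaviour expected here can be stated as an invariant: every input read should be written exactly once, either to the pairs file or to the orphan file, so a single read name appearing 27237 times should be impossible. The sketch below is a hypothetical illustration of that invariant, not seq_crumbs' actual pair_matcher code; the helper names `mate_key` and `match_pairs` and the `/1` / `/2` mate-suffix convention are assumptions made for the example.

```python
# Hypothetical sketch (not seq_crumbs' code): every input read is written
# exactly once, either as half of a pair or as an orphan.
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality)


def mate_key(name: str) -> Tuple[str, str]:
    """Split a read name into (template name, mate tag).

    Assumes a '/1' / '/2' suffix convention; real 454 read names may differ.
    """
    if name.endswith(("/1", "/2")):
        return name[:-2], name[-1]
    return name, ""


def match_pairs(reads: Iterable[Read]) -> Tuple[List[Tuple[Read, Read]], List[Read]]:
    pending: Dict[str, Read] = {}  # reads still waiting for their mate
    pairs: List[Tuple[Read, Read]] = []
    orphans: List[Read] = []
    for read in reads:
        template, tag = mate_key(read[0])
        if not tag:
            orphans.append(read)  # no mate tag, so it can never be paired
            continue
        mate = pending.pop(template, None)
        if mate is None:
            pending[template] = read  # first mate seen: wait for the other one
        elif mate_key(mate[0])[1] == tag:
            orphans.append(mate)  # same mate seen twice: orphan the older copy
            pending[template] = read
        else:
            # emit the pair in /1, /2 order
            pairs.append((mate, read) if tag == "2" else (read, mate))
    orphans.extend(pending.values())  # mates that never arrived
    return pairs, orphans


EXAMPLE_READS: List[Read] = [
    ("r1/1", "ACGT", "IIII"),
    ("r2/1", "GGCC", "IIII"),
    ("r1/2", "TTAA", "IIII"),
    ("r3", "CCCC", "IIII"),
]

pairs, orphans = match_pairs(EXAMPLE_READS)
# Invariant: each input read appears in the output exactly once.
assert 2 * len(pairs) + len(orphans) == len(EXAMPLE_READS)
print([(a[0], b[0]) for a, b in pairs], [r[0] for r in orphans])
# → [('r1/1', 'r1/2')] ['r3', 'r2/1']
```

Every read leaves the loop through exactly one exit (paired, orphaned on arrival or when displaced by a duplicate, or left pending and orphaned at the end), so `2 * len(pairs) + len(orphans)` always equals the number of input reads. Comparing read counts of the input and output files is a cheap way to check the same property on real data.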
Hi Binzo. Are you using GitHub's master branch? peio
Not totally sure. I may have gotten it from http://bioinf.comav.upv.es/, but I can't remember.
Not sure if you can tell from the file name, but the file I downloaded was "seq_crumbs-0.1.8-x64-linux.tar.gz".
I'll reinstall the version on github and let you know if the problem goes away.
Lindsay
On 10/17/14, Peio wrote:
Hi Binzo. Are you using githubs master branch? peio
I'm trying to use seq_crumbs in a pipeline that takes SFF files and converts them into adaptor-trimmed FASTQ files, and I'm finding that pair_matcher produces files larger than the original. I had to stop one run because the orphan file had reached 30 GB.
I ran it again to illustrate the error:
Also, when I run:
grep ^@test | sort | uniq -c
I find that some reads are written to the orphan file thousands of times. For me, this bug appears in both the binary and the source versions.
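A minimal Python sketch of the `grep ^@test | sort | uniq -c` check, assuming unwrapped FASTQ with exactly four lines per record. It counts only the first line of each record instead of every line that starts with `@`, because a FASTQ quality line can itself begin with `@` and a plain `grep ^@` would miscount it as a header. The names `count_read_names` and `duplicated` are hypothetical helpers, not part of seq_crumbs.

```python
# Hypothetical helper: count read names in an unwrapped FASTQ (4 lines/record).
import io
from collections import Counter
from typing import Dict, TextIO


def count_read_names(handle: TextIO) -> Counter:
    """Count each read name, looking only at the first line of every record."""
    counts: Counter = Counter()
    for line_no, line in enumerate(handle):
        if line_no % 4 != 0:
            continue  # sequence, '+' or quality line (may start with '@')
        if not line.startswith("@"):
            raise ValueError(f"line {line_no + 1} is not a FASTQ header: {line!r}")
        fields = line[1:].split()
        counts[fields[0] if fields else ""] += 1  # name = first token after '@'
    return counts


def duplicated(counts: Counter) -> Dict[str, int]:
    """Return only the names that occur more than once."""
    return {name: n for name, n in counts.items() if n > 1}


# Toy input: r1 is written twice, and two quality lines start with '@'.
EXAMPLE_FASTQ = (
    "@r1\nACGT\n+\n@III\n"
    "@r2\nGGCC\n+\nIIII\n"
    "@r1\nACGT\n+\n@III\n"
)
print(duplicated(count_read_names(io.StringIO(EXAMPLE_FASTQ))))  # → {'r1': 2}
```

On a real file the same functions would be applied to an open handle, for example `with open("orphan_1.454.fastq") as fh: print(duplicated(count_read_names(fh)))`; any name it reports was written more than once.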