audy / stitch

Overlap assembler of paired-end DNA sequences generated by Illumina
MIT License

size error #3

Closed CaraFiore closed 12 years ago

CaraFiore commented 12 years ago

Hello, I just tried to use stitch.py to merge two fastq files and got the error below. I was wondering if you could help me sort it out, as I am new to much of this. Thank you.

python stitch.py -i ~/FLxm1_R1.PF.fastq -j ~/FLxm1_R2.PF.fastq -o FLxm_merged

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/bioware/python-2.7.2/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/bioware/python-2.7.2/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/bioware/python-2.7.2/lib/python2.7/multiprocessing/pool.py", line 308, in _handle_tasks
    for i, task in enumerate(taskseq):
  File "/bioware/python-2.7.2/lib/python2.7/multiprocessing/pool.py", line 234, in <genexpr>
    self._taskqueue.put((((result._job, i, func, (x,), {})
  File "/automounts/class/class/cfiore/audy-stitch-a95a24d/stitch/fasta.py", line 16, in __iter__
    for line in self.handle:
SystemError: Negative size passed to PyString_FromStringAndSize

audy commented 12 years ago

Can you send me part of your fastq files so I can duplicate the error? You can paste them as a gist.

CaraFiore commented 12 years ago

Hi, here you go. These are the first 10 lines from one of the fastq files. I think this is the way you wanted it, but if you want me to paste it directly on the website, or if you need more, just let me know.

Thank you!

audy commented 12 years ago

@pachaboo I'm not seeing any fastq stuff. Also, I need some lines from both fastq files if I am to try to recreate your bug.

CaraFiore commented 12 years ago

OK, I will paste 50 lines from each here:

First 50 lines from LCxm1_R1.fastq:

@HWI-ST330:173:D0M78ACXX:5:1101:1568:1967 1:N:0:GAGTGG
NCCGCGAGCCCTCTGCCTTTGGTTCTACATGCTTAGCCGTTTCTTTAGTTTAACCGCCTGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACGAGTGG
+
#1:BDD@DHHHDFHIIHIGHGIGIIHDG*BGGGI@EGFF?FHDHII.B@FH=FHIGFHFF;?7@CCD;,9?AA@A,9=ABCB4:@CCC4>>C:@@8>8<B
@HWI-ST330:173:D0M78ACXX:5:1101:2373:1968 1:N:0:GAGTGG
NACTGCTCGTAAGTCTACTGGAGGAAAAGCTCCCCGTAAACAACTGGCAACTAAGGCAGCAAGGAAGAGTGCTCCATCAACTGGTGGTGTCAAGAAGCCT
+
#1BDDFFFHHHHHGIJJJJJJJJJIJJJGGIJJJJIGHJJIIJIIJJJJJJJJJJIGJJHHHHFED@BD(;ACDD@CCCDDDDDCCD8??CCDDDDDCBD
@HWI-ST330:173:D0M78ACXX:5:1101:3245:1999 1:N:0:GAGTGG
NTCCGCGGTACGGCTGAGTTCGCTCGGTGCGTGCTGAGGAAAGTCCGGGCTCCCCCATGGCCAGGCCTGCTGGGGAACGCCCAGTGCGGGGGACCGTGAG
+
#1=BDDFD@DHHFIIIIIIIIIIIIII(BAG@FHIGCHIHHEF7@DDCDBBA@BDDD?CCB?A?@<@>BB@A@?&(0(95<B58(4><>B&5)5?9&88?
@HWI-ST330:173:D0M78ACXX:5:1101:3389:1966 1:N:0:GAGTGG
NAAGAACCCCGGGAGGGGAGTGAAACAGAACATGAAACCGTGAGCCTACAAGCAATGGGAGCCCGACTGATCGGGTGACTGTGTGCCTGTTGAAGAATGA
+
#4=DFFFFHHHHHHIJJJGIHIJIIIJIIJIJJIJJJJII@EEGFHFFFFFECEEEDDDDDBBBDDDDDDDDDDDB<<CDCCC>ACDDDCCCDCDDDDCD
@HWI-ST330:173:D0M78ACXX:5:1101:3420:1968 1:N:0:GAGTGG
NATTGATCCAATGGATTCCTTTTAAAGAAGCTCTGGTAGTTTTTAACAGTGATGGTCTTCATCCATTGGTGAAATCAGCCTATTTGGATTTTGTTACATC
+
#1=DDDFFHHHHHJJIJIJJJJJJIJJJJJJJJJJJFGHGIIJIIGIHIFIIJJJFHIIJIJJJIIJJJFHIJJJEGIJHFFFHHFFFFFFEEEEDDDED
@HWI-ST330:173:D0M78ACXX:5:1101:3491:1983 1:N:0:GAGTGG
NGGTTATTGAATATCAACCCAAATCGCTGCAAACTATGGTCGAAAAAAAATTACGTGAGTTACAGGTGAATGGTTTGAAAGTTTATGGTCTCACTGAAAG
+
#4=DDFFFHHHHHIJJJJJJJJJJJJJJJJJJIJJJIJJBGHIJJJJJIHHFHHFEFBBCEEEEDD@CDDDDDCDBBADDC@CC>CCCACDDECCCDDDD
@HWI-ST330:173:D0M78ACXX:5:1101:3974:1981 1:N:0:GAGTGG
NGTAAAATATCTGGATATTGCTATAATGAGGTTTATGGAAAAATTCACTTTTGATTAATGTTTATCGGTGTAAATTTAACTTTCTTCCCTCAACACTTTT
+
#1:BDFFFGHHFHJIJGIIIIIGIJJJIJJJFGIIJJIJJIJJJJJJJJJJJJJJJJJJIJJJJJJJIHGHGIJIHHIJHHHHHHFFFFFFEEECDDDDD
@HWI-ST330:173:D0M78ACXX:5:1101:4534:1968 1:N:0:GAGTGG
NCACCGTCTACGGCGGCCCTTCCCAGGGATCCTTCGGCTAGAAATGGAGTTTGTAACTCCTCGAGGCCAGATGAGGCGACCTCTGAAACGTCCTGTAACC
+
#1=DDFFFHHHHGJGI>HIIIIJJBHHIEFEICGGEHFDDDACE@;@;=5>ACA>A;>>A>A8;;B;??BB@AAAC<>-9@&:::>C@A<2<?044:>@>
@HWI-ST330:173:D0M78ACXX:5:1101:4554:1987 1:N:0:GAGTGG
NGACTAACCGCTGTAGGATGGTGTGTGCTACTACTGTCTTGTCTGACAGGTTGAGGAGGGGGGGTGGCACCTATCTCCATACTCTGTTGGCTGCTTGCTT
+
#4BDFFFFHHHHHHIJJIJJJHGHHGHIJJJJIJJIIIIJ>@BB@?FCDH9E>AHI@FHIHF&558>13:?>>@@>@C>@@CCC>>:@:?:3<<:A:8>A
@HWI-ST330:173:D0M78ACXX:5:1101:4665:1998 1:N:0:GAGTGG
NCTCACCCATGATTGCGGCCATCGGTATCGTCGGGTCGCCCAGGTCCATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACGAGTGGATATCGTATTC
+
#1:ADDDBFFFFDFF<@GFGFFFFF?C6B7BBEFI27';@AA;;..;?B;>A96('(5;@@5>@B###################################
@HWI-ST330:173:D0M78ACXX:5:1101:4804:1971 1:N:0:GAGTGG
NATCCCACCAAAAAAATTAGTATATAAAACCTTAGTAATCTTTAACTTAAAGCGCACATGCGCTGGTGCACACAGATCCTATCAACGTTTTATTCTCAAA
+
#1=DFFFFHHHHHJJJJJJJHIIJJJJJJJJJJIJHIJJJJJJJJJJJJJJJJJJJJIHHHHFFDDEDDDDDDDDDDDDDDDDDDDD<CDDDDDEEDEDC
@HWI-ST330:173:D0M78ACXX:5:1101:5019:1975 1:N:0:GAGTGG
NACGCATTTCACCGCTACACCGGAAATTCCCTCTGCCCCTGCCACACTCAAGCCTAGCAGTTTCCATTGCAGTGATGGAGTTGAGCTCCACGCTTTAACA
+
#1=DDDDDHFDAFBHIIHEDBGG@FFHI@DFEFG3??FDHAHGGEEGGE@@DGFHFFECDEEEE>>BDCC@@CCA>@35?CCC:>:ACAC9?9@BB34>:
@HWI-ST330:173:D0M78ACXX:5:1101:5559:1985 1:N:0:GAGTGG
NAGCAATGGTCACTAGAATAAACACCACTGTGGTATAGCTGTAAAGAAACATCACTTGCATCAATAGCTACCATCACAGGTTGGTTAGCAACAGCATAGG

First 50 lines from LCxm1_R2.fastq:

@HWI-ST330:173:D0M78ACXX:5:1101:1568:1967 2:N:0:GAGTGG
CAGGCGGTTAAACTAAAGAAACGGCTAAGCATGTAGAACCAAAGGCAGAGGGCTCGCGGGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGGGGAGATCTC
+
@@@FFFFFHFHFHIEHJIJIGIGHIJHHEG@DHHGG?FHID8BFHIAE;DCHHFFDDDDD0;&9>?B((52<AB0?########################
@HWI-ST330:173:D0M78ACXX:5:1101:2373:1968 2:N:0:GAGTGG
GCACTTTGGAACCTCAGATCAGTTTTAAAGTCTTGAGCAATTTCACGAACCAACCTTTGAAAAGGGAGTTTACGGATTAGAAGTTCAGTGGACTTTTGAT
+
CCCFFFFFHHHHHJJIJHJJJJHIIJJJIIFHIIIIJJJJIIIIIJJJGHHIIJJIJGIIHIJFIJIH;ACDFFFACDDDDDDACEEDCDDC@CDCDDBD
@HWI-ST330:173:D0M78ACXX:5:1101:3245:1999 2:N:0:GAGTGG
GCCGGGTTCTGTTCCTGGGGAGGTCTCCATGGAATGGCCCCCTCTCCAGGGGCGACCATCTCTCTAGGACCTGTGTTACCACAGGCCTCAAGCGGCATGT
+
@CCFFF@DDHDFHIJEIJGGFGIAHIJIHCHJJJGGGBECGG@;FGHGIIIIBCBDDBCDCCCDDCCCACB9?AC:?CCCDD@BBBBCAC:(:<9555A:
@HWI-ST330:173:D0M78ACXX:5:1101:3389:1966 2:N:0:GAGTGG
GCCCTATTCAGACTCGCTTTCGCTGTGGCTCCGAGATTGCCCTCTTAACCTGCCAGTGCCTATAAGTCGCCGGCTCATTCTTCAACAGGCACACAGTCAC
+
CCCFFFFFHHHHHJJJJJJJJJJJJGIJJJJJG>GGIIDFIIEHEGFHIIGHIIIIHIJHGHHGHFFFFCBBDDDBCCDDDECAACC?BBDDDDDD>CAA
@HWI-ST330:173:D0M78ACXX:5:1101:3420:1968 2:N:0:GAGTGG
GCATGATCAAGACCAGTTACTAAGTTCTTTGTCACATTAGCTGCTGCTAATGGATTAGGAGATAACTTCTCCCATAGAAAACATCGCCACAAATTATCAA
+
CCCFFFFFHHHHHJJJJJIJIJJJJJJJJJJIJDGIIJJJJJJIJJJIIIGHIIIJEIJIJIJJJJJJJJJJJIJJIIIIHHGFFFFFDDBBDDDDDDDD
@HWI-ST330:173:D0M78ACXX:5:1101:3491:1983 2:N:0:GAGTGG
AGCTGGTCCACCATGCTTCAGATTCTTTTTTATAAATTGTTTGAATCTTGCTCTGTCATTGTTGACTATTTTGTAACTTATTTCATATTCTTTAGCAATC
+
CCCFFFFFHHHHHJJJJJJJJJJJHIJJJJJJJIJJJJIHIHGGIIIGIGGIIJJJJJJIIIIGGIJJJJJJGHIIIGHGGHHGFFFFFFFFECCCEEDD
@HWI-ST330:173:D0M78ACXX:5:1101:3974:1981 2:N:0:GAGTGG
CCGCCGAGGTAAACCAGCTAATCCTAAAAAGTGTTGAGGGAAGAAAGTTAAATTTACACCGATAAACATTAATCAAAAGTGAATTTTTCCATAAACCTCA
+
BCCFFFFFHHHFHJIIJJJJJGJJIJJJJJJDGBHEFHHIEHIIIIF@@AGGIJIDHJHHHFFCDDCDCDDDEDDDDDA5>CDDDDDDCCDDDDCCCCC@
@HWI-ST330:173:D0M78ACXX:5:1101:4534:1968 2:N:0:GAGTGG
GGAACTGAAACATCTAAGTACCCCGAGGAAAAGAAATCAACCGAGATTCCCTTAGTAGTGGCGAACGAAAGGGGAAGAGCCTAAACCGCAACGACCTAAC
+
CCCFFFFFHHGGHJJIJJIHIJJJJEHJIIJJIJJGIGGIFJIIJJGJJJHIJJIHCHHHHHFFDDDDDDDDDDB@BBD?BCDDDDDBD>BBD<>B5ACB
@HWI-ST330:173:D0M78ACXX:5:1101:4554:1987 2:N:0:GAGTGG
CAAGATTCTCAAGCTATTCTTGCTCACCAACAAAGTCAAGCAAGCAGCCAACAGAGTATGGAGATAGGTGCCACACCCCCTCCTCAACCTGTCAGACAAG
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJHIJJJJIJJIJJJJIJIJJJJFHIJJJJJJJJJEHIJHHGHFFDDDDDDDDDDDCDDDDDDDDDD
@HWI-ST330:173:D0M78ACXX:5:1101:4665:1998 2:N:0:GAGTGG
ATGGACCTGGGCGACCCGACGATACCGATGGCCGCAATCATGGGTGAGAAGATCGGAAGAGCGTCGTGTAGGGGAAGAGTGTAGATCTCGGTGGGCGCCG
+
@@<DDFDFFBDDDHIFDGIGD@6BBCFGGIGAH;:AD6@>>>@A(,;?A;?A:><?(8;?A9;&2;08(+:@B&00(28+4+(4:::@3<??########
@HWI-ST330:173:D0M78ACXX:5:1101:4804:1971 2:N:0:GAGTGG
CTACGATAATAAAAATAATCACTGAAAGCATCTCAAGTGAGAAGTTAAATATTAATCTATGTTTCTAAAAACCGTTTGAGAATAAAACGTTGATAGGATC
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJJIIIJJJJIIGIIJIIIGGIJJJJJJJJJJJJJJJJJJJIIEHIJIGHJGHHHFFFFFFEEDACDDDEDDDDD
@HWI-ST330:173:D0M78ACXX:5:1101:5019:1975 2:N:0:GAGTGG
ATTCCGTGCCAGCAGCCGCGGTAATACGGGAGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGTCCGCAGGCGGCCAAGCAAGTCTGTTGTTAAA
+
@BCDDD3AFDFFBF;CBGEH8?HH?GG;FEB0((((8CC2;(5=6?=5=D=;;A;;;>A@:3=@?A:0(+)09B&0&05&59<8<A9(+4:>>CC??>@:
@HWI-ST330:173:D0M78ACXX:5:1101:5559:1985 2:N:0:GAGTGG
ATGAGTGGCAATGAGACTGCTCTAGCCTATGCTGTTGCTAACCAACCTGTGATGGTAGCTATTGATGCAAGTGATGTTTCTTTACAGCTATACCACAGTG

audy commented 12 years ago

Works for me. Does the error still happen when you run stitch on just these lines?

Also, if you're going to test on a subset of your data, then the number of lines needs to be a multiple of 4, because each fastq record takes up four lines (header, sequence, "+", quality).

I made a gist of the fastq files I am using: https://gist.github.com/3429035
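
For anyone cutting a test subset by hand, here is a minimal sketch of one way to keep the line count a multiple of 4. It is not part of stitch; the function name take_records and the constant LINES_PER_RECORD are made up for illustration, and it assumes the usual four-line fastq layout with no wrapped sequence lines.

```python
from itertools import islice

# Assumption: each record is exactly 4 lines (header, sequence, '+', quality).
LINES_PER_RECORD = 4


def take_records(lines, n_records):
    """Return the lines of up to n_records complete fastq records.

    `lines` can be any iterable of lines, such as an open file handle.
    """
    subset = list(islice(lines, LINES_PER_RECORD * n_records))
    leftover = len(subset) % LINES_PER_RECORD
    if leftover:
        # The input ended mid-record: drop the partial record at the end
        # so the subset still has a multiple of 4 lines.
        subset = subset[:-leftover]
    return subset
```

Called on an open file handle (with open(path) as handle: take_records(handle, 12)), this returns at most 48 lines. A 50-line paste like the ones above holds 12 complete records plus 2 stray lines, which this sketch would drop.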

CaraFiore commented 12 years ago

Sorry for the delay. I don't know what happened before, but now it seems to work and I didn't get the same error (weird). I do have a question for you, though: this and other merge scripts are based on overlapping reads, correct? So what if the reads don't overlap? For example, the average insert size for my library was 240 nt and the reads are 100 nt each, so on average there would be a gap of 40 nt between the two reads (240 - 2 x 100), right? Thank you for your help.

audy commented 12 years ago

When reads don't overlap, they are saved to the left and right singletons files.
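
To make the gap arithmetic and the idea of an overlap concrete, here is a rough sketch. It is not how stitch itself decides whether to merge a pair: the function names expected_gap, reverse_complement, and exact_overlap are hypothetical, and the overlap check only accepts exact matches, whereas real overlap assemblers typically tolerate mismatches and take quality scores into account.

```python
def expected_gap(insert_size, read_length):
    """Bases between the two mates of a pair.

    Positive: an unsequenced gap of that many bases.
    Negative: the mates overlap by that many bases.
    """
    return insert_size - 2 * read_length


COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T, N."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def exact_overlap(left, right, min_overlap=10):
    """Length of the longest suffix of `left` that exactly equals a
    prefix of the reverse complement of `right`, or 0 if none reaches
    min_overlap. A result of 0 would correspond to a singleton pair.
    """
    right_rc = reverse_complement(right)
    longest = min(len(left), len(right_rc))
    for size in range(longest, min_overlap - 1, -1):
        if left[-size:] == right_rc[:size]:
            return size
    return 0
```

With the numbers from the question, expected_gap(240, 100) gives 40, so an average pair from that library would not be expected to overlap. Pairs from shorter-than-average fragments still can, since insert sizes vary around the mean.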