brentp / methylcode

Alignment and Tabulation of BiSulfite Treated Reads

MemoryError even on two-read FASTQs #3

jovialj closed this issue 11 years ago

jovialj commented 13 years ago

Hi, I am trying to run methylcoder with bowtie, but it keeps failing at the same point. To check whether the problem is memory, I tried FASTQ files containing just two reads, and it fails with the same error message:

methylcoder --bowtie=/usr/local/bowtie-0.12.7/ --outdir=METHYL_TEST/ --extra-args '--threads 4' --reference=METHYL_REF/REF1.fasta R1.fastq R2.fastq

using METHYL_TEST for writing output

^ not running: /usr/local/bowtie-0.12.7/bowtie-build -q -f METHYL_REF/bowtie_index/REF1.fr.c2t.fasta /METHYL_REF/bowtie_index/REF1.fr.c2t > METHYL_REF/bowtie_index/REF1.fr.c2t.bowtie-build.log ^

converting C to T in METHYL_TEST/R1.fastq
converting G to A in METHYL_TEST/R2.fastq
opening index
opening index

tabulating methylation for METHYL_REF/REF1.fasta
ERROR: don't use .bin or text files
Traceback (most recent call last):
  File "/usr/local/bin/methylcoder", line 9, in <module>
    load_entry_point('methylcoder==0.0', 'console_scripts', 'methylcoder')()
  File "/usr/local/lib/python2.6/dist-packages/methylcoder-0.0-py2.6-linux-x86_64.egg/methylcoder/__init__.py", line 753, in main
    counts, unmatched = count_conversions(fasta, sam_iter, read_paths, c2t_reads_list, IndexClass, opts.out_dir, opts.mismatches, is_colorspace=is_colorspace)
  File "/usr/local/lib/python2.6/dist-packages/methylcoder-0.0-py2.6-linux-x86_64.egg/methylcoder/__init__.py", line 420, in count_conversions
    counts = get_counts(fc, ft, fa)
  File "/usr/local/lib/python2.6/dist-packages/methylcoder-0.0-py2.6-linux-x86_64.egg/methylcoder/__init__.py", line 376, in get_counts
    tc = {'t': np.zeros((len(seq),), dtype=np.uint32),
MemoryError

brentp commented 13 years ago

How much memory does the machine you're running this on have? What is the size of your reference?

You may need a machine with more memory, though I have had no problems running with 8GB of RAM.

jovialj commented 13 years ago

4GB. Since I am using the human reference, I have split it in two; this half is 1.6GB (chr1 to chr9). By the way, the run does produce the following files:

chr.lengths.txt
cmd.ran
commands.sam.sh
methylcoded.sam (size 0)

jovialj commented 13 years ago

Also, when I run bowtie standalone on the files methylcoder produced, it complains that it cannot find the reverse index files (.rev.1.ebwt and .rev.2.ebwt), which I would not expect, since I am using the --norc option (as you do in __init__.py):

I also saw that your bowtie-build call only builds the index for the forward strand, since you use the '-f' option, so there shouldn't be any reverse index files anyway.

Here is the error:

/usr/local/bowtie-0.12.7/bowtie --fullref --sam --chunkmbs 512 --norc -p 4 METHYL_CODER_REF/bowtie_index/ref1.fr.c2t -1 M5MT_1.fastq.c2t -2 M5MT_2.fastq.c2t test.sam
Could not open index file METHYL_CODER_REF/bowtie_index/ref1.fr.c2t.rev.1.ebwt
Could not open index file METHYL_CODER_REF/bowtie_index/ref1.fr.c2t.rev.2.ebwt
Segmentation fault

brentp commented 13 years ago

Yes, you'll have better luck with 16GB.

It finished mapping. In the final tabulation step, it creates 3 arrays for each chromosome, each as long as the chromosome itself. Originally I used an mmap array, but I figured everyone would have enough memory to hold everything in RAM, which is faster.
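A rough estimate shows why 4GB runs out. The traceback above shows get_counts allocating np.zeros((len(seq),), dtype=np.uint32), so, assuming all three per-chromosome arrays are uint32, tabulation needs about 12 bytes per reference base. The helper names below are hypothetical, the chromosome and reference sizes are stated assumptions, and disk_backed_counts is only a sketch of the mmap approach mentioned above (using np.memmap), not methylcoder's actual code.

```python
import os
import tempfile

import numpy as np

# 4 bytes per position, matching dtype=np.uint32 in the get_counts traceback
BYTES_PER_COUNT = np.dtype(np.uint32).itemsize
# Three count arrays per chromosome, as described in the comment above
ARRAYS_PER_CHROMOSOME = 3


def tabulation_bytes(n_bases, arrays=ARRAYS_PER_CHROMOSOME):
    """Approximate RAM for `arrays` uint32 count arrays spanning n_bases positions."""
    return arrays * BYTES_PER_COUNT * n_bases


def gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / float(1024 ** 3)


def disk_backed_counts(n_bases, out_dir, names=("a", "b", "c")):
    """Hypothetical mmap-style alternative: zero-filled uint32 arrays stored in files.

    The array names are placeholders, not methylcoder's real keys.
    """
    arrays = {}
    for name in names:
        path = os.path.join(out_dir, "counts.%s.bin" % name)
        # mode="w+" creates (or overwrites) the file at the requested size
        arrays[name] = np.memmap(path, dtype=np.uint32, mode="w+", shape=(n_bases,))
    return arrays


if __name__ == "__main__":
    # Assumption: chr1 is about 249 million bases (GRCh37 length).
    print("chr1 alone: %.1f GiB" % gib(tabulation_bytes(249250621)))
    # Assumption: the 1.6GB chr1-9 FASTA holds ~1.6e9 bases, all tabulated at once.
    print("chr1-9: %.1f GiB" % gib(tabulation_bytes(1600000000)))
    counts = disk_backed_counts(1000, tempfile.mkdtemp())
    counts["a"][10] += 1  # writes go to the page cache and are flushed to the file
    counts["a"].flush()
```

Under these assumptions the estimate is roughly 2.8 GiB for chr1 alone and roughly 17.9 GiB for chr1 to chr9. That would also explain why a two-read input fails: the arrays scale with the reference length, not with the number of reads.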

The simplest solution is to use a machine with more memory.

brentp commented 13 years ago

I don't know offhand what is wrong with your command, but you are sending a .sam file to -2, and that can't work.

jovialj commented 13 years ago

But what about the bowtie error when it is run standalone? Also, the SAM file is empty, so I am not sure it is safe to assume that mapping finished.

jovialj commented 13 years ago

No, I am sending the M5MT_2.fastq.c2t to -2.

Also, I ran it again without the .sam argument, and it fails immediately because it cannot find the index:

/usr/local/bowtie-0.12.7/bowtie --fullref --sam --chunkmbs 512 --norc -p 4 ../METHYL_REF/bowtie_index/REF1.fr.c2t -1 R1.fastq.c2t -2 R2.fastq.c2t
Could not open index file ../METHYL_REF/bowtie_index/REF1.fr.c2t.rev.1.ebwt
Could not open index file ../METHYL_REF/bowtie_index/REF1.fr.c2t.rev.2.ebwt
Segmentation fault
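For reference, here is a sketch of how the paired-end bowtie call in these logs could be assembled from Python. The flags are copied from the commands in this thread; bowtie_paired_cmd and run are hypothetical helpers. In bowtie 1 the optional trailing positional argument is the output ("hit") file, so test.sam in the earlier command is where alignments get written, not an extra argument to -2.

```python
import subprocess


def bowtie_paired_cmd(bowtie, index_prefix, reads1, reads2, out_sam=None, threads=4):
    """Assemble the bowtie 1 paired-end call used in this thread (hypothetical helper)."""
    cmd = [bowtie, "--fullref", "--sam", "--chunkmbs", "512", "--norc",
           "-p", str(threads), index_prefix, "-1", reads1, "-2", reads2]
    if out_sam is not None:
        # bowtie 1 takes the output file as a trailing positional argument
        cmd.append(out_sam)
    return cmd


def run(cmd):
    """Run bowtie and return its exit code.

    On POSIX, a process killed by a signal gives a negative code,
    so a segmentation fault shows up as -11.
    """
    return subprocess.call(cmd)
```

For example, bowtie_paired_cmd("/usr/local/bowtie-0.12.7/bowtie", "METHYL_CODER_REF/bowtie_index/ref1.fr.c2t", "M5MT_1.fastq.c2t", "M5MT_2.fastq.c2t", "test.sam") joined with spaces reproduces the first standalone command above. Passing an argument list instead of a shell string avoids quoting problems with paths.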

brentp commented 13 years ago

You should have the .rev files. The -f flag to bowtie-build indicates that the reference is a fasta file.
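A quick way to confirm whether an index is complete is to check for every file bowtie-build should have written. This sketch assumes a bowtie 1 (0.12.x) index, which consists of six files per prefix; missing_index_files is a hypothetical helper. The .rev.*.ebwt files are bowtie's "mirror" index of the reversed (not reverse-complemented) reference, which its search uses regardless of --norc. Judging by the "not running" line in the first log, methylcoder appears to skip bowtie-build when it finds an existing index, so an incomplete one would not be rebuilt automatically.

```python
import os

# Assumption: a complete bowtie 1 index is these six files.
# The .rev.* files are the mirror index, built by default by bowtie-build.
BOWTIE1_INDEX_SUFFIXES = (
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt",
)


def missing_index_files(prefix):
    """Return the expected index files that do not exist for this prefix."""
    return [prefix + suffix for suffix in BOWTIE1_INDEX_SUFFIXES
            if not os.path.exists(prefix + suffix)]
```

For example, missing_index_files("METHYL_REF/bowtie_index/REF1.fr.c2t") returning the two .rev files would match the errors above. An empty list means all six files are present, although it cannot tell whether one of them was truncated.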

jovialj commented 13 years ago

Thanks! But bowtie-build still doesn't create any .rev files, and methylcoder jumps straight to the next step, converting C to T in the read FASTQs.

brentp commented 13 years ago

You'll have to rebuild the index. To do so, run "rm /METHYL_REF/bowtie_index/REF1.fr.c2t.*"

jovialj commented 13 years ago

I have already done that three times, in three different directories. It looks like a bowtie memory issue, then. I will probably run it on a high-memory machine and let you know whether it works. Thanks!

jovialj commented 13 years ago

Hi Brent,

So I aligned the reads on a higher-memory machine, but it looks like almost nothing aligned:

reads processed: 36043042
reads with at least one reported alignment: 250352 (0.69%)
reads that failed to align: 35792690 (99.31%)

Are there any parameters you would suggest tweaking?

Thanks

brentp commented 13 years ago

Can you share 100K of the reads you are having trouble with? Are you sure these are bisulfite-treated reads? What genome are you aligning to?
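One quick check on whether reads are bisulfite-treated is their base composition. After bisulfite conversion, unmethylated cytosines read as T, so in a directional library read 1 should be strongly depleted of C and read 2 of G (matching methylcoder's C-to-T conversion of R1 and G-to-A conversion of R2), whereas untreated reads show comparable C and G fractions. This is a sketch only: base_fractions is a hypothetical helper, and it assumes plain 4-line FASTQ records with no wrapped sequence lines.

```python
def base_fractions(fastq_path, max_reads=100000):
    """Fraction of A/C/G/T among sequence characters in the first max_reads records.

    Assumes 4-line FASTQ records: header, sequence, '+', qualities.
    Characters other than A/C/G/T (such as N) are ignored.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    with open(fastq_path) as fh:
        for i, line in enumerate(fh):
            if i // 4 >= max_reads:
                break
            if i % 4 == 1:  # the second line of each record is the sequence
                for base in line.strip().upper():
                    if base in counts:
                        counts[base] += 1
    total = sum(counts.values())
    return dict((base, n / float(total) if total else 0.0) for base, n in counts.items())
```

Running base_fractions("R1.fastq") and base_fractions("R2.fastq") on the first 100K reads gives a rough picture; a read 1 whose C fraction is similar to its G fraction would suggest the library was not bisulfite-converted, or that the reads are not in the orientation methylcoder expects.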