dkoslicki / MetaPalette

Metagenomic profiling and phylogenetic distances via common kmers

Memory error #7

Open jennomics opened 8 years ago

jennomics commented 8 years ago

I'm using the provided pre-trained data, with a test sample that has 2 million reads. It's a soil sample, so I would expect it to be diverse. I'm on a c4.8xlarge AWS instance, with 500G of disk space, 60G of memory and 36 processors.

python Classify.py -d Metapalette/Bacteria -o Metapalette_out_Bacteria -i testdata.fastq -Q C -k sensitive -j jellyfish -q /software/MetaPalette/src/QueryPerSeq/query_per_sequence -t 30 -n

Traceback (most recent call last):
  File "Classify.py", line 156, in <module>
    x = ClassifyPackage.Classify(training_file_names, CKM_matrices, Y_norms, cutoff)
  File "software/MetaPalette/src/Python/ClassifyPackage.py", line 34, in Classify
    A_with_hypothetical = np.concatenate(A_with_hypotheticals, axis=0)
MemoryError

Can you suggest a solution? Thank you!

dkoslicki commented 8 years ago

That's odd! I can confirm that MetaPalette doesn't have this issue when using 64G of RAM, so it must be that 50G of RAM is not quite enough for the bacteria pre-trained database.

One thing you can try is using the "comparison" database (linked to in the paper) as this is a smaller database (and contains a mix of different organisms from different kingdoms). It's not as complete as the other databases, but may be sufficient for your purposes.

When I have time, I plan on addressing this (see: this issue).
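As a rough way to check whether a concatenation like the one in the traceback can fit in memory, the size of the result can be computed from the block shapes before anything is allocated. This is only a sketch, not MetaPalette code: `concat_nbytes` and `physical_memory_bytes` are hypothetical helper names, the shapes in the usage note are made up, and the `os.sysconf` lookup assumes Linux. It needs Python 3.8+ for `math.prod`.

```python
import math
import os


def concat_nbytes(shapes, itemsize, axis=0):
    """Bytes needed for the array produced by concatenating blocks along `axis`.

    `shapes` is a list of array shapes (e.g. [a.shape for a in blocks]) and
    `itemsize` is the size of one element in bytes (e.g. a.dtype.itemsize).
    `axis` is assumed to be non-negative.
    """
    first = list(shapes[0])
    for shape in shapes[1:]:
        other = list(shape)
        # Every dimension except `axis` has to agree between the blocks.
        if other[:axis] + other[axis + 1:] != first[:axis] + first[axis + 1:]:
            raise ValueError("shapes differ outside the concatenation axis")
    out_shape = list(first)
    out_shape[axis] = sum(shape[axis] for shape in shapes)
    return math.prod(out_shape) * itemsize


def physical_memory_bytes():
    """Total physical RAM in bytes (assumes Linux; relies on os.sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
```

For example, `concat_nbytes([(1000, 500), (2000, 500)], 8)` gives the size of stacking two blocks of 8-byte values. Note that the input blocks stay in memory while the concatenated copy is built, so the peak requirement is roughly the inputs plus the result, not the result alone.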

rob2go commented 8 years ago

I'm having similar problems. I took a look at the RAM usage with htop, and it never goes above 1 GB, so 50 GB should be enough.
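A low reading in htop does not necessarily rule out a memory problem: htop samples periodically and can miss a short spike, and an allocation that is refused never shows up as used memory at all. One way to get a firmer number is to have the Python process report its own peak resident memory just before it exits or from an exception handler. The following is a minimal sketch using only the standard library; `peak_rss_bytes` is a hypothetical helper, not part of MetaPalette, and it covers only the Python process itself.

```python
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of the current process, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


if __name__ == "__main__":
    print("peak RSS: %.1f MiB" % (peak_rss_bytes() / 2.0 ** 20))
```

Memory used by external tools started as child processes would need `resource.RUSAGE_CHILDREN` instead of `resource.RUSAGE_SELF`. The peak still excludes any request that failed, so it is best read alongside an estimate of how large the failing array would have been.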