Open jennomics opened 8 years ago
That's odd! I can confirm that MetaPalette doesn't have this issue when using 64G of RAM, so it must be that 50G of RAM is not quite enough for the bacteria pre-trained database.
One thing you can try is using the "comparison" database (linked to in the paper) as this is a smaller database (and contains a mix of different organisms from different kingdoms). It's not as complete as the other databases, but may be sufficient for your purposes.
When I have time, I plan on addressing this (see: this issue).
I'm having similar problems. I've taken a look at the memory usage with htop and it never goes above 1 GB, so 50 GB should be enough.
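One caveat on reading htop: a short allocation spike just before a MemoryError is easy to miss on a live display. Below is a minimal sketch of recording the peak resident memory of a command after it exits, using Python's standard `resource` module. The helper name `run_and_report_peak` and the demo child command are hypothetical stand-ins, not part of MetaPalette; substitute the real `Classify.py` command line to try it.

```python
import resource
import subprocess
import sys


def run_and_report_peak(cmd):
    """Run cmd to completion; return (exit code, peak RSS of child processes)."""
    completed = subprocess.run(cmd)
    # ru_maxrss is the largest resident set size among waited-for children.
    # Units are platform dependent: kilobytes on Linux, bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, peak


if __name__ == "__main__":
    # Hypothetical stand-in for the real command: a child that builds a 50 MB bytes object.
    demo = [sys.executable, "-c", "x = b'x' * (50 * 1024 * 1024)"]
    code, peak = run_and_report_peak(demo)
    print("exit code:", code, "peak RSS (platform units):", peak)
```

On Linux, `/usr/bin/time -v` reports the same figure as "Maximum resident set size" without any Python wrapper.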
I'm using the provided pre-trained data, with a test sample that has 2 million reads. It's a soil sample, so I would expect it to be diverse. I'm on a c4.8xlarge AWS instance, with 500G of disk space, 60G of memory and 36 processors.
python Classify.py -d Metapalette/Bacteria -o Metapalette_out_Bacteria -i testdata.fastq -Q C -k sensitive -j jellyfish -q /software/MetaPalette/src/QueryPerSeq/query_per_sequence -t 30 -n
Traceback (most recent call last):
  File "Classify.py", line 156, in <module>
    x = ClassifyPackage.Classify(training_file_names, CKM_matrices, Y_norms, cutoff)
  File "software/MetaPalette/src/Python/ClassifyPackage.py", line 34, in Classify
    A_with_hypothetical = np.concatenate(A_with_hypotheticals, axis=0)
MemoryError
Can you suggest a solution? Thank you!
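For context on the failing line: `np.concatenate` builds a new array holding every input element while the inputs are still in memory, so that step needs roughly twice the combined size of the blocks. Here is a rough sketch of estimating that footprint before attempting the call. The helper name `concatenate_peak_bytes` and the block shapes are made up for illustration; the actual sizes of `A_with_hypotheticals` depend on the database and are not known from this thread.

```python
import numpy as np


def concatenate_peak_bytes(blocks):
    """Estimate bytes held while np.concatenate(blocks, axis=0) runs."""
    # Memory already occupied by the input blocks.
    input_bytes = sum(b.nbytes for b in blocks)
    # The result is a new array with as many elements as all inputs combined,
    # stored in the common dtype of the inputs.
    result_dtype = np.result_type(*blocks)
    output_bytes = sum(b.size for b in blocks) * result_dtype.itemsize
    return input_bytes + output_bytes


# Hypothetical block shapes, chosen only to keep the example small.
blocks = [np.zeros((1000, 200)), np.zeros((500, 200))]
needed = concatenate_peak_bytes(blocks)
print(f"inputs plus result: about {needed / 1e6:.1f} MB")
```

If the estimate for the real matrices comes close to the RAM on the machine, a smaller database such as the "comparison" one mentioned above, or an instance with more memory, is probably the practical way around it.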