apfejes / epigenetics-software

This repository contains code for epigenetic analysis, including a chip-seq/chip-chip tool, a database and a web server.

Database takes up a disproportionately large amount of space for methylation data. #16

Closed apfejes closed 10 years ago

apfejes commented 10 years ago

Shrink down the names of the fields in the methylation table.

e.g. b-value -> b, probeid -> pid, sampleid -> sid
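As a rough sketch of why this helps: BSON stores every field name as a null-terminated string inside each document, so shorter names save a fixed number of bytes per record. The snippet below is only an illustration in plain JavaScript (Node.js), not code from this repository; `FIELD_MAP`, `shortenFields`, `bytesSavedPerDoc` and the sample values are hypothetical names and data chosen for the example.

```javascript
// Hypothetical mapping, taken from the renames proposed in this comment.
const FIELD_MAP = new Map([
  ['b-value', 'b'],
  ['probeid', 'pid'],
  ['sampleid', 'sid'],
]);

// Return a copy of a document with the long field names replaced by the
// short ones. Fields that are not in the mapping are copied unchanged.
function shortenFields(doc) {
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    out[FIELD_MAP.has(key) ? FIELD_MAP.get(key) : key] = value;
  }
  return out;
}

// Bytes saved per document from the field names alone: the difference in
// UTF-8 length between each long name and its short replacement.
function bytesSavedPerDoc() {
  let saved = 0;
  for (const [longName, shortName] of FIELD_MAP) {
    saved += Buffer.byteLength(longName, 'utf8') - Buffer.byteLength(shortName, 'utf8');
  }
  return saved;
}

// Made-up example record (values are assumptions, not real data).
const sampleDoc = {
  'b-value': 0.42,
  probeid: 'cg00000029',
  sampleid: '507f1f77bcf86cd799439011',
};

console.log(shortenFields(sampleDoc));
console.log(bytesSavedPerDoc() + ' bytes saved per document');
```

With these three renames the saving is 15 bytes per document, which only becomes significant because the methylation table holds one document per probe per sample.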

apfejes commented 10 years ago

Note that sampleids should all be converted to ObjectId(sampleid), as this takes up about half the space (an ObjectId is stored as a 12-byte array rather than as a 24-character string) and shrinks the corresponding indices as well.
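The size difference can be checked with a small calculation. This is a sketch in plain JavaScript (Node.js) that does not touch the database; `exampleHexId`, `hexIdToBytes` and `bsonStringValueSize` are hypothetical names, and the id value is a made-up example. It relies on the BSON encoding of a string value (4-byte length prefix, the UTF-8 bytes, one trailing null) versus the fixed 12 bytes of an ObjectId.

```javascript
// A 24-character hex string, i.e. the textual form of an ObjectId.
// The value itself is a made-up example.
const exampleHexId = '507f1f77bcf86cd799439011';

// Decode the hex text into the raw bytes an ObjectId would hold.
function hexIdToBytes(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error('expected a 24-character hex string');
  }
  return Buffer.from(hex, 'hex');
}

// Size of a BSON string value: 4-byte length prefix, the UTF-8 bytes,
// and one trailing null byte.
function bsonStringValueSize(s) {
  return 4 + Buffer.byteLength(s, 'utf8') + 1;
}

console.log('as string:   ' + bsonStringValueSize(exampleHexId) + ' bytes');
console.log('as ObjectId: ' + hexIdToBytes(exampleHexId).length + ' bytes');
```

For a 24-character id this gives 29 bytes as a string value against 12 bytes as an ObjectId, and the same reduction applies to every index entry on that field.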

apfejes commented 10 years ago

The following command must be run on the database:

db.methylation.find().forEach(function(doc) {
    db.methylation.update(
        {_id: doc._id},
        {$set: {sampleid: new ObjectId(doc.sampleid)}}
    );
});

apfejes commented 10 years ago

The database was updated and compacted, but MongoDB does not release the extra disk space, so it is difficult to tell exactly what the net effect was. However, it is probably worth loading another data set to see how the database size grows.