garydoranjr / misvm

Multiple-Instance Support Vector Machines
BSD 3-Clause "New" or "Revised" License

memory error on medium scale #6

Closed anuragkr90 closed 8 years ago

anuragkr90 commented 8 years ago

Hi, I am trying to run missSVM and MICA, but I am getting a memory error. The total number of instances across all training bags is about 120,000, each with 100 dimensions. Is there a way to get it running on a computer with 16 GB of RAM?

garydoranjr commented 8 years ago

Hello,

Kernel methods require computing an n-by-n kernel matrix between n training examples. With 120,000 examples, this is a matrix with 14.4 billion entries, which is ~115 GB if each entry is represented with a double-precision (8-byte) floating point number. If the kernel matrix is sparse, you might get away with using a sparse matrix representation, but that is not implemented here. There are also other optimization procedures (again, not implemented here, unfortunately) that don't require holding the full kernel matrix in memory, but then you trade memory for CPU time, and it might take significantly longer to train the model.
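The arithmetic above can be checked with a short sketch. The helper names `kernel_matrix_bytes` and `max_examples_for_budget` are hypothetical and not part of misvm. The estimate assumes a dense matrix of 8-byte entries and 1 GB = 10^9 bytes, and it ignores any other memory the solver needs.

```python
import math


def kernel_matrix_bytes(n_examples, bytes_per_entry=8):
    """Bytes needed to store a dense n-by-n kernel matrix."""
    return n_examples * n_examples * bytes_per_entry


def max_examples_for_budget(budget_bytes, bytes_per_entry=8):
    """Largest n whose dense n-by-n kernel matrix fits in budget_bytes."""
    return math.isqrt(budget_bytes // bytes_per_entry)


n_instances = 120_000
entries = n_instances ** 2
size_gb = kernel_matrix_bytes(n_instances) / 1e9
print(f"{entries:,} entries, about {size_gb:.1f} GB")
# prints: 14,400,000,000 entries, about 115.2 GB

# Upper bound on n for a 16 GB machine (assumes the matrix gets all the RAM).
n_max = max_examples_for_budget(16 * 10**9)
print(f"at most {n_max:,} examples fit in 16 GB")
```

Under these assumptions, a 16 GB machine tops out at roughly 44,700 examples, and in practice fewer, since the matrix cannot use all available memory.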

Another approach I suggest (if you are ultimately interested in bag-level labels) is to use a bag-level classifier such as the MI-Kernel method. This only requires computing a matrix that is O(n_bags^2) instead of O(n_instances^2) and actually tends to have better performance on the bag-labeling task.
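To illustrate why a bag-level kernel is smaller, here is a minimal sketch of one common form of set kernel: sum an instance-level RBF kernel over all instance pairs from two bags, then normalize by the bags' self-similarities. This is an illustration under stated assumptions, not the library's implementation. The function names, the choice of RBF, the normalization, and the toy data are all made up for the example.

```python
import math


def rbf(x, y, gamma=1.0):
    """Instance-level RBF kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def set_kernel(bag_a, bag_b, gamma=1.0):
    """Unnormalized set kernel: sum of instance kernels over all pairs."""
    return sum(rbf(x, y, gamma) for x in bag_a for y in bag_b)


def bag_kernel_matrix(bags, gamma=1.0):
    """Normalized n_bags-by-n_bags kernel matrix.

    Only one pair of bags is processed at a time, so the full
    n_instances-by-n_instances matrix is never held in memory.
    """
    n = len(bags)
    self_k = [set_kernel(b, b, gamma) for b in bags]
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = set_kernel(bags[i], bags[j], gamma)
            value /= math.sqrt(self_k[i] * self_k[j])
            K[i][j] = K[j][i] = value
    return K


# Toy data (hypothetical): three bags with 2, 3, and 1 instances in 2-D.
bags = [
    [[0.0, 0.0], [0.1, 0.2]],
    [[1.0, 1.0], [0.9, 1.1], [1.2, 0.8]],
    [[0.0, 0.1]],
]
K = bag_kernel_matrix(bags)
print(len(K), "x", len(K[0]))  # 3 x 3, although there are 6 instances
```

The number of instance-kernel evaluations is still O(n_instances^2), so the saving is in memory rather than in kernel computations. The resulting matrix could then be passed to any SVM solver that accepts a precomputed kernel.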