daemon opened this issue 6 years ago
For now this is specific to CharCNN, due to the large size of the character-quantized matrices. But in general I feel it's better to stream the dataset from disk and preprocess it on the fly, rather than caching all of it in memory.
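A minimal sketch of what that streaming approach could look like (all names here are hypothetical, not the repo's actual API): each document is read and character-quantized only when the consumer asks for it, so memory usage stays proportional to one batch rather than the whole corpus.

```python
# Sketch of a streaming character-quantizing loader. Assumes a plain-text
# corpus with one document per line; ALPHABET and MAX_LEN are illustrative.

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}
MAX_LEN = 8  # truncate or zero-pad every document to this length

def quantize(text):
    """One-hot encode a document as a MAX_LEN x len(ALPHABET) matrix."""
    matrix = [[0] * len(ALPHABET) for _ in range(MAX_LEN)]
    for pos, ch in enumerate(text[:MAX_LEN]):
        idx = CHAR_INDEX.get(ch.lower())
        if idx is not None:  # unknown characters stay all-zero
            matrix[pos][idx] = 1
    return matrix

def stream_dataset(path):
    """Yield quantized matrices one document at a time, never caching."""
    with open(path) as f:
        for line in f:
            yield quantize(line.rstrip("\n"))
```

In a PyTorch setup this generator maps naturally onto `torch.utils.data.IterableDataset`, which lets a `DataLoader` pull quantized samples lazily instead of holding the full tensor in RAM.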
@achyudhk, I think we can fix this issue after Dec 10th, given that we have been using the repo for quite a while now. Sounds good? @daemon, can you assign us to the issue? Even HAN was facing a similar problem, though not as alarming as CharCNN's.
@achyudhk reports that CharCNN on some dataset uses 63 GB of RAM (Hydra and Dragon both have 64 GB). I think a solution would be some mechanism for paging data between disk and RAM as needed?
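One way to sketch that disk-to-RAM mechanism, assuming the quantized matrices are written to disk once up front, is a memory-mapped array: the OS pages in only the slices a batch actually touches, so resident memory stays far below the full dataset size. The shapes and function names below are illustrative, not from the repo.

```python
import numpy as np

# Hypothetical dimensions: 1000 documents, each a 16 x 36 one-hot matrix.
N_DOCS, MAX_LEN, ALPHABET_SIZE = 1000, 16, 36

def build_cache(path):
    """One-off pass: write quantized matrices into a disk-backed array."""
    cache = np.memmap(path, dtype=np.uint8, mode="w+",
                      shape=(N_DOCS, MAX_LEN, ALPHABET_SIZE))
    # In a real pipeline, fill cache[i] = quantize(doc_i) here;
    # zeros stand in for actual data in this sketch.
    cache.flush()

def load_batch(path, start, end):
    """Open read-only; only the touched pages are brought into RAM."""
    cache = np.memmap(path, dtype=np.uint8, mode="r",
                      shape=(N_DOCS, MAX_LEN, ALPHABET_SIZE))
    return np.asarray(cache[start:end])  # copy just this batch
```

The trade-off versus pure streaming is that quantization happens once instead of on every epoch, at the cost of the one-time disk footprint.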