Tiiiger / SGC

Official implementation of the paper "Simplifying Graph Convolutional Networks"
MIT License

Memory Error For TWITTER-US Dataset #9

Closed MortonWang closed 5 years ago

MortonWang commented 5 years ago

Hi, Tiiiger

I am very interested in your recent SGC work.

I want to apply your SGC code to semi-supervised user geolocation, one of the downstream tasks in your paper.

The GEOTEXT dataset works fine, but when I turn to TWITTER-US and TWITTER-WORLD, the code crashes. The error is as follows:

File "/home/wtl/桌面/wtlCode/geoSGC/dataProcess.py", line 96, in process_data features = torch.FloatTensor(features.to_dense()) RuntimeError: $ Torch: not enough memory: you tried to allocate 417GB. Buy new RAM! at /pytorch/aten/src/TH/THGeneral.cpp:201

I have tried different versions of Python and torch, such as Python 2.7 + torch 1.0.1.post2 and Python 3.5 + torch 1.0.1.post2, but all failed. I also googled for a solution, but none of the suggested fixes worked.

Have you seen a similar error, and how did you fix it? My machine runs Ubuntu 16.04 with 40GB of memory.

Many thanks for your help.

-Morton

Tiiiger commented 5 years ago

@felixgwu

felixgwu commented 5 years ago

Hi Morton,

The reason is that TWITTER-US and TWITTER-WORLD have high-dimensional sparse features, and converting them to a dense tensor requires far too much memory. In our experiments, we keep the features as a sparse tensor. We use Afshin's code base written in Theano, so we didn't implement this part in PyTorch; however, you may consider converting the input features to a torch.sparse.FloatTensor.

BTW, we don't precompute the propagated features on these datasets. Instead, relying on the associativity of matrix multiplication, we multiply the node features by the weight matrix first and then do the K-step propagation, which greatly reduces memory usage.
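For illustration, here is a rough sketch of that conversion, assuming `features` arrives as a `scipy.sparse` matrix (as in the geolocation preprocessing); the helper name is ours, not part of the SGC code:

```python
import numpy as np
import scipy.sparse as sp
import torch

def to_torch_sparse(features):
    """Convert a scipy.sparse matrix to a torch.sparse.FloatTensor
    without ever materializing the dense array."""
    coo = sp.coo_matrix(features)  # COO exposes explicit (row, col, value) triples
    indices = torch.LongTensor(np.vstack((coo.row, coo.col)))
    values = torch.FloatTensor(coo.data)
    return torch.sparse.FloatTensor(indices, values, torch.Size(coo.shape))
```

And a sketch of the propagation-order trick, assuming `adj` holds the normalized adjacency matrix S as a torch sparse tensor, `features` is the sparse feature matrix X, and `weight` is SGC's single weight matrix W (again, variable names are ours):

```python
def sgc_forward(adj, features, weight, K):
    # Naive order (S^K X) W would precompute S^K X, a dense n x d matrix,
    # which is exactly what blows up memory for high-dimensional features.
    # Memory-friendly order S^K (X W): project to the (small) class
    # dimension first, then apply K sparse propagation steps.
    h = torch.sparse.mm(features, weight)  # X W: sparse x dense -> dense (n x c)
    for _ in range(K):
        h = torch.sparse.mm(adj, h)        # one propagation step: S h
    return h
```

Since X W shrinks the feature dimension down to the number of classes, each propagation step then operates on a much smaller dense matrix.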

-Felix

Tiiiger commented 5 years ago

@MortonWang I am closing this since there don't seem to be any further questions. Feel free to reopen.