CyprienGottstein opened this issue 5 years ago
Hi,
Memory requirement: small for GeoText (you can run it on any laptop); large for the other two (you'll need about 30GB of RAM).
Run time: a few minutes for GeoText, 3 days for the other two.
So overall, the smallest dataset can be run on a regular laptop. For the big ones you definitely need a server (a GPU doesn't matter; these are all meant to be run on CPU).
Best wishes
Hi,
Thank you for your answer!
I managed to run the small dataset, but I'm struggling to understand the output.
I opened the "gc_1...pkl" file with pickle in a small Python script, but I am not exactly sure how you decided to store the data.
I also tried to save the model using the "-save" option and opened it, again with Python's pickle; it was stored in ./data/model...pkl, but I don't see what I am supposed to do with it.
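For reference, this is roughly the kind of script I used to inspect the files (the path below is a placeholder, not the exact file name):

```python
import pickle

# Placeholder path; the real dump file is the truncated "gc_1...pkl" mentioned above.
dump_path = "./data/dump_example.pkl"

with open(dump_path, "rb") as f:
    obj = pickle.load(f)

# Print the top-level structure to see how the data is organised.
print(type(obj))
if isinstance(obj, (list, tuple)):
    for i, item in enumerate(obj):
        print(i, type(item))
elif isinstance(obj, dict):
    for key, value in obj.items():
        print(key, type(value))
```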
The idea is that I would like a single aggregated dataset with your predictions applied to the data. It doesn't matter to me if there are potential errors; it should still be closer to the reality of Twitter than something I generate entirely on my own.
Any tips?
Sorry to be a bother.
Best regards,
Hi,
Sorry for the delay.
The pickle file contains the parameters of the classifier (a GraphConv instance) after it is trained, and can be loaded back into such an instance using its load function. So instead of retraining the model, one can load the trained parameters.
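The pattern is roughly the following. This is only a toy illustration of saving and reloading parameters with pickle, not the actual GraphConv API; the class and file names here are made up:

```python
import pickle

class TinyModel:
    """Stand-in for the real GraphConv class, only to illustrate the pattern."""
    def __init__(self):
        self.params = None  # learned parameters live here after training

    def save(self, path):
        # Dump only the learned parameters, not the whole object.
        with open(path, "wb") as f:
            pickle.dump(self.params, f)

    def load(self, path):
        # Restore parameters into an already-constructed instance,
        # so there is no need to retrain.
        with open(path, "rb") as f:
            self.params = pickle.load(f)

model = TinyModel()
model.params = {"W0": [[0.1, 0.2]], "b0": [0.0]}  # pretend these were trained
model.save("model_example.pkl")

restored = TinyModel()              # construct with the same settings as before
restored.load("model_example.pkl")
print(restored.params)
```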
GCN is a transductive model, which means it is difficult to train the model and then apply it to a new, unseen test set.
It is ideal for situations where you have a single dataset in which some users are labelled and some are not: the labelled users become your training set and the unlabelled ones your test set. GCN works very well in such transductive settings (see the sketch below). If you are interested in an inductive model, you can use "pigeo: a simple python geotagging tool".
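To make the transductive setup concrete, here is a toy sketch (not the repository's actual data format):

```python
import numpy as np

# Toy transductive split: every user (node) is present in the graph during
# training; only the labels of the "training" users are revealed to the model.
num_users = 10
labels = np.array([0, 1, 0, 2, 1, 0, 2, 1, 0, 2])  # e.g. region IDs

# Users 0-5 are labelled (training), users 6-9 are unlabelled (test).
train_mask = np.zeros(num_users, dtype=bool)
train_mask[:6] = True
test_mask = ~train_mask

# The GCN sees the full graph and all node features, but the loss is computed
# only on train_mask; predictions for test_mask users are read off afterwards.
print("train users:", np.where(train_mask)[0], "labels:", labels[train_mask])
print("test users :", np.where(test_mask)[0])
```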
Afshin
Hello,
I am interested in your work on the Twitter graph dataset and I'd like to do the same thing on my side.
I need to know beforehand: how long did it take to run the whole process, and on what kind of hardware? I am not looking for performance; I just want a rough estimate of the compute time so I can organize my upcoming work (and it would also help me detect a problem if it runs for too long).
Also, the Twitter graph dataset is quite big, so I'm being cautious: how much RAM will I need to make this run?
Thank you for putting this on GitHub, I mean it.
Best regards