afshinrahimi / geographconv

Semi-supervised User Geolocation via Graph Convolutional Networks

Question regarding Twitter-World dataset #6

Closed: JunyiChE closed this issue 11 months ago

JunyiChE commented 11 months ago

Hi, first of all, very impressive work that laid the foundation for the user geolocation task using graph-based methods.

However, I have one question regarding the Twitter-World dataset. The mention graph is critical to the methodology, but the Twitter-World dataset provided at your link does not seem to contain @mention or reply information expressed in terms of your encoded user ids.

For example, the user ids in the dataset seem to be encoded as '421527640', '86941247', but a mentioned user in the stacked tweets appears as a plaintext handle, as in '171966527 36.09986 -80.24422 ||| @wakefan2321 what's your PSN name?'. This makes it impossible to construct the mention graph from the encoded user ids.

Could you kindly clarify this? Maybe I am missing some details in the dataset.

Best,

afshinrahimi commented 11 months ago

Hi Junyi. The details of graph construction are unfortunately left out of this paper, but are available in "Twitter User Geolocation Using a Unified Text and Network Prediction Model".

If ids 5676544 and 5434677 both mention @auser, then they're connected. We exclude celebrities in @mentions so they can't connect nodes. Please read the other paper, and if anything is not clear, ping me.

Also have a look at https://github.com/afshinrahimi/geographconv/blob/master/data.py which contains the code to build the 'collapsed' graph from tweets.
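For anyone following along, the co-mention rule above can be sketched roughly as below. This is only a minimal illustration and not the code in data.py: the function name `build_comention_graph`, the `celebrity_threshold` parameter and its value, the regex used to find handles, and the toy tweets are all assumptions made for the example. It treats a handle as a "celebrity" when more than `celebrity_threshold` distinct users mention it, which is one plausible reading of the exclusion described here.

```python
import re
from collections import defaultdict
from itertools import combinations

# Assumed pattern for @handles; the real preprocessing may differ.
MENTION_RE = re.compile(r"@(\w+)")


def build_comention_graph(user_tweets, celebrity_threshold=5):
    """Connect users who mention the same @handle.

    user_tweets: dict mapping an encoded user id (str) to that user's
        concatenated tweet text.
    celebrity_threshold: hypothetical cutoff. A handle mentioned by more
        than this many distinct users is treated as a celebrity and is
        not allowed to connect nodes.
    Returns a dict mapping each user id to the set of its neighbours.
    """
    # handle -> set of user ids that mention it
    mentioned_by = defaultdict(set)
    for user_id, text in user_tweets.items():
        for handle in MENTION_RE.findall(text):
            mentioned_by[handle.lower()].add(user_id)

    # Every user is a node, even if it ends up isolated.
    graph = {user_id: set() for user_id in user_tweets}
    for handle, users in mentioned_by.items():
        if len(users) > celebrity_threshold:
            continue  # skip celebrity handles
        # Users sharing a non-celebrity handle are linked pairwise.
        for u, v in combinations(sorted(users), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


# Toy data (made up): two users share @auser, all three mention @celeb.
tweets = {
    "5676544": "@auser what's your PSN name? ||| hi @celeb",
    "5434677": "thanks @auser ||| @celeb nice",
    "1111111": "love you @celeb",
}
graph = build_comention_graph(tweets, celebrity_threshold=2)
```

With this toy input, `@auser` links 5676544 and 5434677, while `@celeb` is mentioned by three users, exceeds the assumed threshold of 2, and so connects nobody; user 1111111 stays isolated.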

JunyiChE commented 11 months ago

Hi, thanks for your time and the clarification. Now I understand that the graph constructed for Twitter-World is actually a co-mention relationship between users.

Once again, very impressive work!

Best,