HKUST-KnowComp / MNE

Source Code for IJCAI 2018 paper "Scalable Multiplex Network Embedding"

how to use Twitter data set? #5

Closed Sandy4321 closed 6 years ago

Sandy4321 commented 6 years ago

As stated in the paper, only the Twitter dataset may show the real performance of the code. Which dataset from http://deim.urv.cat/~manlio.dedomenico/data.php should be used, and which Python file should be used to build an interface for this repo? Thanks.

There are many datasets:

HIGGS TWITTER | The Higgs dataset has been built after monitoring the spreading processes on Twitter before, during and after the announcement of the discovery of a new particle with the features of the elusive Higgs boson on 4th July 2012. The messages posted in Twitter about this discovery between 1st and 7th July 2012 are considered. Ref: M. De Domenico, A. Lima, P. Mougel and M. Musolesi. The Anatomy of a Scientific Rumor. (Nature Open Access) Scientific Reports 3, 2980 (2013).
-- | --

and

Graph | Nodes | Edges
-- | -- | --
Friends/follower graph | 456631 | 14855875
Graph of who retweets whom | 425008 | 733647
Graph of who replies to whom | 37366 | 30836
Graph of who mentions whom | 302975 | 449827

HIGGS MULTIPLEX | Multiplex of social interactions in Twitter corresponding to the different actions (friendship, replying, mentioning and retweeting) monitored in the Higgs dataset (see above). There are two multiplex networks: 1) two layers, friendship and aggregated interactions, respectively; 2) four layers, friendship and each type of interaction in each layer separately. See more details in the webpage dedicated to Higgs Rumor. Ref: M. De Domenico, A. Lima, P. Mougel and M. Musolesi. The Anatomy of a Scientific Rumor. (Nature Open Access) Scientific Reports 3, 2980 (2013).
-- | --

Multiplex | Nodes
-- | --
2-layers Multiplex | 456631
4-layers Multiplex | 456631

Sandy4321 commented 6 years ago

By the way, these datasets are unavailable for download:

1. 2-layers Multiplex (Nodes: 456631)
2. 4-layers Multiplex (Nodes: 456631)

Sandy4321 commented 6 years ago

Vickers data looks like this:

1 1 6 1
1 1 8 1
1 1 11 1
1 1 12 1
1 1 14 1
1 1 16 1

but higgs-social_network.edgelist looks like this:

1 2
1 3
1 4
1 5
1 6
1 7
1 8
1 9

and higgs-retweet_network.edgelist looks like this:

390311 14454 1
33642 60810 1
341261 49159 1
232947 59542 1
162399 16344 1
57763 172959 1
37319 91500 1
383282 18206 2
447089 349 1
30775 349 2

Thanks for help

panda0881 commented 6 years ago

I suggest you use the small datasets to verify our model. I think they should be enough.

If you want to try the Twitter dataset, you will need a very powerful machine, or even a few servers. After all, the Twitter dataset is almost a million times larger than the other datasets. We ran our experiments on servers, and a run may take days.

Of course, if you still want to try it, you can treat the base network as layer one and the retweet network as layer two, and merge the two files into one. Sorry, these datasets are too large for me to provide the merged data in this repo.
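
The merge described above could be sketched roughly as follows. This is only an illustration, not code from this repo: it assumes the target layout is the four-column `layer src dst weight` form seen in the Vickers sample quoted earlier in the thread, that `higgs-social_network.edgelist` is the base network, and that two-column files get a default weight of 1. The function name `merge_layers` and the output file name `higgs_multiplex.edges` are made up for the example.

```python
from pathlib import Path


def merge_layers(layer_files, out_path):
    """Write one 'layer src dst weight' line per edge; layers are numbered from 1.

    Assumption: each input file is whitespace-separated with either
    'src dst' or 'src dst weight' on every line.
    """
    written = 0
    with open(out_path, "w") as out:
        for layer_id, path in enumerate(layer_files, start=1):
            with open(path) as f:
                for line in f:
                    parts = line.split()
                    if len(parts) < 2:
                        continue  # skip blank or malformed lines
                    src, dst = parts[0], parts[1]
                    # Two-column files carry no weight; default to 1 (assumption).
                    weight = parts[2] if len(parts) > 2 else "1"
                    out.write(f"{layer_id} {src} {dst} {weight}\n")
                    written += 1
    return written


if __name__ == "__main__":
    # File names as mentioned in this thread; the output name is hypothetical.
    files = ["higgs-social_network.edgelist", "higgs-retweet_network.edgelist"]
    if all(Path(p).exists() for p in files):
        n = merge_layers(files, "higgs_multiplex.edges")
        print(f"wrote {n} edges")
```

The files are streamed line by line, so the roughly 15 million base edges never need to sit in memory at once. Whether the repo's loader accepts the result unchanged should be checked against the small datasets first.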

By the way, as shown in the paper, to simulate a real business setting with the Twitter data, you need to select edges from the base network as the negative examples in the prediction task.
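
One possible reading of that negative-sampling advice is sketched below. It is an illustration under assumptions, not the paper's exact procedure: it assumes edges are direction-sensitive `(src, dst)` tuples, that a negative is any base-network edge not present in the layer being predicted, and that negatives are drawn uniformly at random. The function name `sample_negatives` and its parameters are made up for the example.

```python
import random


def sample_negatives(base_edges, positive_edges, k, seed=0):
    """Pick up to k base-network edges that are absent from the positive set.

    Assumption: an edge is a (src, dst) tuple and direction matters.
    """
    positives = set(positive_edges)
    # Sort the candidates so that a fixed seed gives a reproducible sample.
    candidates = sorted(e for e in set(base_edges) if e not in positives)
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))


# Toy usage: base network edges versus edges of the layer being predicted.
base = [(1, 2), (1, 3), (2, 3), (3, 4)]
retweets = [(1, 2)]
negatives = sample_negatives(base, retweets, k=2)
```

On the full Twitter data the candidate list would hold millions of edges, so a streaming or reservoir-style sampler may be a better fit than sorting everything in memory.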

Hongming

Sandy4321 commented 6 years ago

Actually, I am trying to test the code on a network with many layers, as described in the paper:

If we generalize this idea to all kinds of network, by “multiplex network,” we mean a group of networks which contains multiple kinds of relations, and each kind of the relations can create a layer of the network. Take the social network as an example. In a social network such as Facebook, users often have different kinds of interactions with each other like friendship relation, forwarding articles to each other, conversation, money transferring, etc. Each of them will create a layer of the network among all users.

Could you please recommend some datasets with clearly defined layers?