geopanag / IMINFECTOR

MIT License

what is the reason for eq(15) #7

Closed calebauyang closed 4 years ago

calebauyang commented 4 years ago

Two questions:

  1. Eq (6) was designed for predicting cascade length, so why is eq (15) used to compute the number of nodes a node is expected to influence? Moreover, there is no explicit evidence to support eq (15). What do you think?

  2. An initiator may start two or more cascades carrying different information, so the length of the cascades originating from one initiator is uncertain. How does eq (6) handle this case?

Thank you!

geopanag commented 4 years ago

Thanks for your interest.

  1. The y' (the prediction, not the ground truth y) from eq 6 is used indirectly in eq 15. Since parameter C is a constant vector of ones, the prediction for node u is y'_u = S_u*C = |S_u|, i.e. the sum of all elements of u's embedding. This represents the cascade length. In IM, however, we want to estimate the proportion of the network that the node will influence, so we normalize by the summed predicted cascade lengths of all nodes and multiply by the total number of nodes. Think of y' as the absolute "influence power" of a node, and of Lambda as the same measure relative to the rest.

  2. That is true, which is why we feed the neural network many different cascades from the same users and repeat the training over many epochs, so that the model captures the differences in the hidden weights.
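
The normalization described in point 1 can be sketched in a few lines of Python. This is only an illustration based on the description in this thread, not the repository's code: the function name `relative_influence`, the toy embeddings, and the node labels are all hypothetical, and the exact form of eq 15 in the paper may differ (for example in rounding, or in which set of nodes the sum runs over).

```python
def relative_influence(embeddings, num_nodes):
    """Turn per-node embeddings into relative influence estimates (Lambda).

    embeddings: dict mapping a node id to its embedding S_u (list of floats).
    num_nodes:  total number of nodes N in the network.
    """
    # y'_u = S_u * C with C a vector of ones, i.e. the sum of the embedding's elements.
    y_pred = {u: sum(vec) for u, vec in embeddings.items()}
    # Normalize by the summed predictions of all nodes, then scale by N.
    total = sum(y_pred.values())
    return {u: num_nodes * y / total for u, y in y_pred.items()}


# Hypothetical toy embeddings for three candidate nodes.
embeddings = {
    "u": [0.5, 1.5, 2.0],  # y'_u = 4.0
    "v": [1.0, 0.5, 0.5],  # y'_v = 2.0
    "w": [1.0, 0.5, 0.5],  # y'_w = 2.0
}
lam = relative_influence(embeddings, num_nodes=100)
print(lam)  # {'u': 50.0, 'v': 25.0, 'w': 25.0}
```

By construction the Lambda values always sum to N, so they express each node's share of the network rather than an absolute cascade size.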

Hope this is clear.

calebauyang commented 4 years ago

Appreciate your answer.

I don't understand what evidence supports eq 15. The influence of different nodes overlaps in scope, so I don't think the division in eq 15 is applicable. For example, suppose node u can influence 8 of 10 nodes in total, and node v can influence 4 of those 10. According to eq 15, node u would then influence 8/(8+4) of the network, not 8/10. Clearly 8/10 is the reasonable value. What do you think?
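
Worked through with these numbers, and assuming eq 15 has the form described earlier in the thread (each node's predicted cascade length divided by the sum over all nodes, times the total number of nodes N; the symbols below are shorthand and not copied from the paper):

```latex
\Lambda_u = N \cdot \frac{y'_u}{y'_u + y'_v} = 10 \cdot \frac{8}{8 + 4} \approx 6.67,
\qquad
\Lambda_v = N \cdot \frac{y'_v}{y'_u + y'_v} = 10 \cdot \frac{4}{8 + 4} \approx 3.33
```

The normalized values sum to N = 10, whereas the raw estimates (8 and 4) sum to 12, which is more than the network contains and is only possible because the two scopes overlap.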

Thanks

geopanag commented 4 years ago

That is indeed a fairer method, but when we tried it we ran into a practical problem. In reality, there are influencers that create cascades of tens or even hundreds of thousands of nodes. In that case Lambda_u can be huge, so when our algorithm is asked to come up with a seed set of, e.g., 100 nodes, matrix D will be emptied in the first few iterations, because the top, e.g., 10 nodes may already cover the whole network. Hence our method would not be effective when we look for a bigger seed set. This is important because the seed set size is what separates IM from a simple ranking based on, e.g., degree, as we mention in the evaluation methodology.
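
A deliberately simplified sketch of this effect, assuming each selected seed removes as many nodes from the candidate pool as its influence estimate says. This is not the repository's selection code: the helper `seeds_until_exhausted`, the pool model, and the cascade sizes are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def seeds_until_exhausted(estimates, num_nodes):
    """Count how many top-ranked seeds can be picked before the pool of
    not-yet-covered nodes is empty, if each seed removes `estimate` nodes."""
    remaining = num_nodes
    picked = 0
    for est in sorted(estimates, reverse=True):
        if remaining <= 0:
            break  # nothing left to cover, later seeds add no value
        remaining -= min(est, remaining)
        picked += 1
    return picked


num_nodes = 1000
# Hypothetical absolute cascade sizes; a few influencers dominate.
absolute = [600, 500, 400, 50, 30, 20]
# The same estimates normalized so that they sum to num_nodes.
total = sum(absolute)
relative = [num_nodes * a / total for a in absolute]

print(seeds_until_exhausted(absolute, num_nodes))  # 2
print(seeds_until_exhausted(relative, num_nodes))  # 6
```

With the absolute estimates the pool is empty after two seeds, so any seed set larger than that cannot be filled meaningfully; with the normalized estimates every candidate still contributes.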

Thanks

calebauyang commented 4 years ago

Thanks for your reply.