RobRomijnders / bigclam

Implements the bigCLAM algorithm
MIT License

Definition of "production code" #2

Open skilfullycurled opened 6 years ago

skilfullycurled commented 6 years ago

Hi,

Thanks so much for this implementation. Per your promise, it is quite readable and therefore very educational. You write in the readme that:

The code aims at people who want to learn about algorithms for social graphs. By far, this won't do for production code. We aim at readable code for educational purposes. main.py implements the algorithm, util/generate_data.py generates data and ui/index.html helps us with plotting our social graph.

Can you be a bit more specific on what you mean by "production code"? Are you saying you don't think it will work speed-wise...? accuracy..? on graphs larger than n size...?

Thanks for your help!

RobRomijnders commented 6 years ago

Hi,

I remember that I used for-loops in some places where broadcasting would also work. However, it has been some time since I implemented this.

Anyway, most operations on social network graphs are done in faster languages, so use this Python example to learn the method. Then you can write scalable code yourself.

If you don't mind me asking, could you tell me a bit more about your data and problem?
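To illustrate the for-loop versus array-operation point, here is a minimal sketch, not taken from this repository. It assumes the standard BigCLAM log-likelihood (edges contribute log(1 - exp(-F_u · F_v)), non-edges contribute -F_u · F_v), a dense NumPy affiliation matrix `F` of shape (nodes, communities), and a dense symmetric 0/1 adjacency matrix `adj`. The function names are hypothetical.

```python
import numpy as np


def log_likelihood_loops(F, adj):
    """BigCLAM log-likelihood with explicit Python loops (readable but slow)."""
    n = F.shape[0]
    total = 0.0
    for u in range(n):
        for v in range(n):
            if u == v:
                continue  # skip self-pairs
            dot = float(np.dot(F[u], F[v]))
            if adj[u, v]:
                total += np.log(1.0 - np.exp(-dot))  # edge term
            else:
                total -= dot  # non-edge term
    # Ordered pairs are visited, so each undirected pair is counted twice.
    return total


def log_likelihood_vectorized(F, adj):
    """Same quantity computed with whole-array operations."""
    dots = F @ F.T  # all pairwise dot products at once
    edge_mask = adj.astype(bool)
    off_diag = ~np.eye(F.shape[0], dtype=bool)  # excludes self-pairs
    edge_terms = np.log(1.0 - np.exp(-dots[edge_mask & off_diag]))
    non_edge_terms = dots[~edge_mask & off_diag]
    return edge_terms.sum() - non_edge_terms.sum()
```

Both functions return the same value on the same inputs; the second avoids the Python-level double loop but still builds a dense nodes-by-nodes matrix, so it is only suitable for small graphs.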

Rob

skilfullycurled commented 6 years ago

I don't mind at all!

I'm going to be analyzing two Twitter networks: the first has about 2 million friends/followers, and the second is an interaction graph (e.g. mentions, retweets, replies) whose size I'm not sure of, but it'll be in the millions. I'm not sure if BigClam will yield better results than other community detection methods, but I am very attracted to the possibility of finding overlapping communities, since I think this more closely mirrors reality.

As far as the meaning of "production code" is concerned, I'll have access to some pretty hefty computing so if it's a matter of inefficient code then that might not be too much of a problem.

Follow-up question for you (if I may; I'm happy to open a new issue to keep things organized), regarding data generation vs. data input:

p2c = datagen.person2comm
adj = datagen.adj

I trust I can replace datagen.adj with an actual adjacency matrix, but what is the replacement for the person2comm?

RobRomijnders commented 6 years ago

Ok, sounds good.

The person2comm relates each person to their community in the ground truth. I use it for plotting, nothing more. I also used this project to play with JavaScript and D3. You can find that code in the ui folder.

Let me know how your results turn out
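As a rough sketch of swapping generated data for real data: the snippet below builds a symmetric adjacency matrix from an edge list. It assumes that `adj` is expected to be a dense, symmetric 0/1 NumPy array and that nodes are numbered from 0; neither is confirmed in this thread. The helper `adjacency_from_edges` and the example edges are hypothetical.

```python
import numpy as np


def adjacency_from_edges(edges, num_nodes):
    """Build a dense symmetric 0/1 adjacency matrix from (u, v) pairs."""
    adj = np.zeros((num_nodes, num_nodes), dtype=np.float32)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj[u, v] = 1.0
        adj[v, u] = 1.0  # undirected graph: mirror the edge
    return adj


# Hypothetical toy edge list standing in for real follower or interaction data.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
adj = adjacency_from_edges(edges, num_nodes=4)

# Real data has no ground-truth communities, so there is nothing to put in
# person2comm; any plotting code that relies on it would need to be skipped.
p2c = None
```

A dense matrix like this is only practical for small graphs. For a network with about 2 million nodes it would need roughly 4 × 10^12 entries, so a sparse representation would be required.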