albertocottica / communities-network-design

Online community management as social network design: testing for the signature of management activities in online communities

Improve in-degree distributions visualizations #18

Open albertocottica opened 9 years ago

albertocottica commented 9 years ago

This is what we have: https://github.com/albertocottica/communities-network-design/blob/master/Pictures/inDegDIstroCompared.png

It would be nice to take full control of the drawings, so that they all have the same scale etc.

Unfortunately, something broke in my configuration; I can still run powerlaw.py from IPython, but I can no longer produce pictures. What it boils down to is that I need a backend for Matplotlib.
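A missing backend can often be worked around by selecting a non-interactive one explicitly. The sketch below assumes that writing PNG files is enough (no on-screen window), and the helper name `save_loglog_demo` and the file name are hypothetical; it only checks that a figure can be written to disk.

```python
import matplotlib

# Select a non-interactive backend before pyplot is imported.
# "Agg" renders to image files and needs no GUI toolkit.
matplotlib.use("Agg")

import matplotlib.pyplot as plt


def save_loglog_demo(path="backend_check.png"):
    """Draw a trivial log-log line and write it to disk (hypothetical helper)."""
    fig, ax = plt.subplots()
    # Placeholder values, not real degree data.
    ax.loglog([1, 10, 100, 1000], [1.0, 0.1, 0.01, 0.001])
    ax.set_xlabel("in-degree k")
    ax.set_ylabel("p(k)")
    fig.savefig(path)
    plt.close(fig)
    return path
```

If this writes a file, the same `matplotlib.use("Agg")` line at the top of a plotting script should let it save pictures again, at the cost of not opening interactive windows.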

Ben: maybe you can try to do better?

renoust commented 9 years ago

Matplotlib seems a good option! I can automate the generation of pictures similarly to what you did, and even integrate them in a single shot, a bit like in the attached file (it's just an example of overlays). However, what do you mean by "same scale"?

Anyway, this is a distribution we're showing, so I've played around with histograms and so on, but this kind of drawing seems to show it best. I can tune colors, legends, etc. This baseline, outlines of the PDF on log scales, seems the best way to represent the distributions.

Benjamin


albertocottica commented 9 years ago

That is matplotlib.

Same scale means that the two boxes will have the same size, and that they will comprise the same intervals (on the x axis, from k^0 to k^3).
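One way to get two boxes of the same size covering the same interval is to create both axes in one figure and share their limits. This is only a sketch: it assumes that "k^0 to k^3" is meant as 10^0 to 10^3 on a logarithmic x axis, and `make_comparison_axes` and its parameters are hypothetical names.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: figures are saved to file, not shown

import matplotlib.pyplot as plt


def make_comparison_axes(xlim=(1e0, 1e3), panel_size=4.0):
    """Return a figure with two equally sized log-log panels (hypothetical helper)."""
    fig, axes = plt.subplots(
        1, 2,
        figsize=(2 * panel_size, panel_size),
        sharex=True,  # both panels cover the same x interval
        sharey=True,  # and the same y interval
    )
    for ax in axes:
        ax.set_xscale("log")
        ax.set_yscale("log")
        ax.set_xlim(*xlim)  # assumed reading of "k^0 to k^3"
        ax.set_xlabel("in-degree k")
    axes[0].set_ylabel("p(k)")
    return fig, axes
```

Each distribution can then be drawn into `axes[0]` or `axes[1]`; since powerlaw's plotting functions accept an `ax` argument, they should be able to draw directly into these panels.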

renoust commented 9 years ago

Of course I know :) Powerlaw actually produces Matplotlib axes. I see better what you mean now: it's to emphasize the comparison, right? However, I can't get my hands on the Edgeryders and InnovatoriPA data. Can you put the degree distributions in the datasets folder? Thanks!

Benjamin


albertocottica commented 9 years ago

Done! https://github.com/albertocottica/communities-network-design/tree/master/Datasets/RealWorldDegrees

renoust commented 9 years ago

Ok, here is a proposal: https://github.com/albertocottica/communities-network-design/blob/master/Pictures/PDF_Edgeryders_figure_1.png https://github.com/albertocottica/communities-network-design/blob/master/Pictures/PDF_InnovatoriPA_figure_1.png

I'm accumulating data from 10 generations of each simulation, but I'm currently running up to 1000 of each for better accuracy.
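As a rough illustration of the accumulation step, the sketch below pools the in-degree sequences of several runs and estimates a log-binned PDF. It is not the code used for the pictures: the function name, the bin layout, and the choice to drop zero degrees (which cannot be shown on a log axis) are all assumptions.

```python
import math


def pooled_log_binned_pdf(runs, bins_per_decade=5):
    """Pool degree sequences from several runs and return (centers, densities).

    Hypothetical helper: bin edges are 10**(i / bins_per_decade), and each
    density is count / (N * bin_width), treating degrees as continuous.
    """
    degrees = [k for run in runs for k in run if k > 0]
    if not degrees:
        return [], []
    n_bins = int(math.log10(max(degrees)) * bins_per_decade) + 1
    counts = [0] * n_bins
    for k in degrees:
        counts[int(math.log10(k) * bins_per_decade)] += 1
    centers, densities = [], []
    for i, count in enumerate(counts):
        if count == 0:
            continue  # skip empty bins; they have no finite log value
        lo = 10 ** (i / bins_per_decade)
        hi = 10 ** ((i + 1) / bins_per_decade)
        centers.append(math.sqrt(lo * hi))  # geometric bin center
        densities.append(count / (len(degrees) * (hi - lo)))
    return centers, densities
```

Pooling more runs mainly smooths the sparse tail of the distribution, which is where a single run has very few nodes per bin.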

I order the curves by (nu_1, nu_2), then plot them with a gradient of green, then plot the real data in red, and the corresponding simulation (no onboarding for InnovatoriPA, (1,1) for Edgeryders) in green.

I'm planning to make the gradient ascending for Edgeryders, and descending for InnovatoriPA, but this gives you the idea.
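The ordering and colouring described above could look roughly like the following. This is a sketch under assumptions, not the script that produced the pictures: the input format (a dict from (nu_1, nu_2) to precomputed curve points), the "Greens" colormap, and the `descending` flag are all hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: figures are saved to file, not shown

import matplotlib.pyplot as plt


def plot_gradient_overlay(simulated, real, descending=False, ax=None):
    """Overlay simulated PDFs in shades of green and real data in red.

    simulated: dict mapping (nu_1, nu_2) -> (xs, ys); real: (xs, ys).
    """
    if ax is None:
        _, ax = plt.subplots()
    cmap = plt.get_cmap("Greens")
    keys = sorted(simulated, reverse=descending)  # order curves by (nu_1, nu_2)
    for i, key in enumerate(keys):
        # Spread shades over [0.3, 1.0] so the lightest curve stays visible.
        shade = 0.3 + 0.7 * i / max(1, len(keys) - 1)
        xs, ys = simulated[key]
        ax.loglog(xs, ys, color=cmap(shade), linewidth=1)
    # Real data is drawn last so that it sits on top of the simulations.
    ax.loglog(real[0], real[1], color="red", linewidth=2, label="real data")
    ax.set_xlabel("in-degree k")
    ax.set_ylabel("p(k)")
    ax.legend()
    return ax
```

Passing `descending=True` reverses the order in which shades are assigned, which would correspond to the descending gradient planned for InnovatoriPA.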

Benjamin


renoust commented 9 years ago

I stopped the generation at 600 for each model because it was time-consuming (it took a few days), but if you feel it's statistically worth going up to 1000 for each, no problem, I'll generate some more. I've also pushed some more pictures, including a comparison of the degree distributions of all generated models, in the hope that it helps :)

Benjamin


albertocottica commented 9 years ago

No, Ben. I think this does not tell the story we want.

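First of all, there is an issue of consistency of the viz with the data:

  • Innovatori => no onboarding. So it should be compared with the control group.
  • Edgeryders => with onboarding (but we do not know how effective the onboarding is or how responsive the community is).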

In the case of Innovatori, having the small curves is just misleading.

In the case of Edgeryders, the small curves make sense, but emphasising one in particular does not.

But the more important problem is this: if you draw a bunch of curves, they will look like a thick straight line. We know from the data that, when onboarding is present, this is not the case: the goodness-of-fit test is strongly rejected. If we want to make the point, I think we are down to comparing ONE curve (real-world data) with ONE curve (simulated data). Moreover, I am not convinced they should be in the same diagram: the exponents could be different. All we want to illustrate is whether they are straight or not. The way that works best for me is still:

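Innovatori PA is a straight line: https://github.com/albertocottica/communities-network-design/blob/fe0bb05d5d069dcd5fd499bb3c485207f5d9c25e/Pictures/InnovatoriPA%20degree%20distribution.png

Edgeryders is arched downwards: https://github.com/albertocottica/communities-network-design/blob/fe0bb05d5d069dcd5fd499bb3c485207f5d9c25e/Pictures/Edgeryders%20degree%20distribution.png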

We would need to do the same for generated data, with and without onboarding, and then we are done.

renoust commented 9 years ago

After offline progress/discussions with @albertocottica:

Because we are submitting to a journal, potentially with "unlimited" space:

Benjamin


renoust commented 9 years ago

Following the last point, I've uploaded a series of pictures named "comparison_"... They display the PDFs of the 600 generations with different parameters, with the following color coding:

Most of the pictures only compare "no onboarding" with all generated curves (the fixed parameter is in the title), even though we've shown nu2 to be ineffective. So the most interesting ones essentially compare no onboarding with different values of nu1, one by one.

Two other views are available:

Benjamin
