albertocottica opened this issue 9 years ago.
Matplotlib seems a good option! I can automate the generation of pictures similarly to what you did, and even combine them in a single figure, a bit like in the attached file (it's just an example of overlays). However, what do you mean by "same scale"?
Anyway, what we're showing is a distribution, so I've experimented with histograms and the like, but this kind of drawing seems to show it best. I can tune colors, legends, etc. This baseline, i.e. outlines of the PDF on log scales, seems the best way to represent the distributions.
Benjamin
On 2 July 2015 at 04:23, Alberto Cottica wrote:
Assigned #18 https://github.com/albertocottica/communities-network-design/issues/18 to @renoust https://github.com/renoust.
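"Outlines of the PDF on log scales" can be sketched without any plotting library. The idea is to bin the degrees into logarithmically spaced bins and divide each count by the total and by the bin width, so the densities integrate to 1. This is a minimal, dependency-free sketch. The function name `log_binned_pdf` and the default of 5 bins per decade are our own assumptions, not what powerlaw.py does internally.

```python
import math


def log_binned_pdf(degrees, bins_per_decade=5):
    """Empirical PDF of positive values using logarithmically spaced bins.

    Returns (edges, densities): len(edges) == len(densities) + 1, and each
    density is count / (total * bin_width), so the densities integrate to 1.
    """
    xs = [d for d in degrees if d > 0]  # log bins cannot include zero
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct positive values")
    lo, hi = math.log10(min(xs)), math.log10(max(xs))
    n_bins = max(1, math.ceil((hi - lo) * bins_per_decade))
    edges = [10 ** (lo + (hi - lo) * i / n_bins) for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for x in xs:
        # position in log space scaled to a bin index; the maximum lands in the last bin
        i = int((math.log10(x) - lo) / (hi - lo) * n_bins)
        counts[min(i, n_bins - 1)] += 1
    total = len(xs)
    densities = [c / (total * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]
    return edges, densities
```

To draw the outline on log-log axes, plot each density at its bin's geometric centre, `sqrt(edges[i] * edges[i + 1])`. Skip empty bins, because a zero density cannot be shown on a log axis.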
That is Matplotlib.
Same scale means that the two boxes have the same size and span the same intervals (on the x axis, from k^0 to k^3).
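In Matplotlib terms, "same scale" can be obtained with panels that share their axes. Once the x range is fixed on one panel, it applies to every panel, and giving each panel the same width keeps the boxes the same size. This is a minimal sketch under stated assumptions:

- We read "from k^0 to k^3" as the decades 10^0 to 10^3 of the degree k.
- The function name `same_scale_panels` and its `(title, xs, ys)` input format are hypothetical.

The sketch uses the object-oriented `Figure` API, which needs neither pyplot nor an interactive backend.

```python
from matplotlib.figure import Figure


def same_scale_panels(series, xlim=(1, 1e3), ylim=None):
    """One log-log panel per (title, xs, ys) entry, all with identical limits."""
    # equal width per panel keeps every box the same size
    fig = Figure(figsize=(4 * len(series), 4))
    axes = fig.subplots(1, len(series), sharex=True, sharey=True, squeeze=False)[0]
    for ax, (title, xs, ys) in zip(axes, series):
        ax.loglog(xs, ys, marker="o", linestyle="-")
        ax.set_title(title)
        ax.set_xlabel("degree k")
    axes[0].set_ylabel("p(k)")
    axes[0].set_xlim(*xlim)  # shared x axis: the limits propagate to every panel
    if ylim is not None:
        axes[0].set_ylim(*ylim)  # shared y axis as well
    return fig
```

The figure can then be written to disk with `fig.savefig("panels.png")`, where the file name is only an example.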
Of course I know :) powerlaw actually produces Matplotlib axes. Now I see what you mean: it's to emphasize the comparison, right? However, I can't find the Edgeryders and InnovatoriPA data; could you put the degree distributions in the Datasets folder? Thanks!
Benjamin
(In reply to Alberto Cottica, 9 July 2015 at 22:07.)
Ok, here is a proposal: https://github.com/albertocottica/communities-network-design/blob/master/Pictures/PDF_Edgeryders_figure_1.png https://github.com/albertocottica/communities-network-design/blob/master/Pictures/PDF_InnovatoriPA_figure_1.png
I'm aggregating the data from 10 generated networks for each simulation setting, but I'm currently running up to 1000 of each for better accuracy.
I order the curves by (nu_1, nu_2) and plot them with a green gradient, then plot the real data in red and the corresponding simulation (no onboarding for InnovatoriPA, (1, 1) for Edgeryders) in green.
I'm planning to make the gradient ascending for Edgeryders and descending for InnovatoriPA, but this gives you an idea.
Benjamin
On 10 July 2015 at 16:42, Alberto Cottica wrote:
Done!
https://github.com/albertocottica/communities-network-design/tree/master/Datasets/RealWorldDegrees
I stopped the generation at 600 networks per model because it was time-consuming (it took a few days), but if you feel it's statistically worthwhile to go up to 1000 each, no problem, I'll generate more. I've also pushed some more pictures, including a comparison of the degree distributions of all generated models; I hope it helps :)
Benjamin
(Follow-up to my message of 17 July 2015 at 15:50.)
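The ordering and colouring scheme from the 17 July proposal works as follows. Curves are sorted by (nu_1, nu_2) and shaded along a green gradient, the matching simulation is emphasised, and the real data is drawn in red. The sketch below expresses this as Matplotlib style keywords, under stated assumptions:

- The hex colours and line widths are our own choices.
- We read "ascending" as light-to-dark green with increasing (nu_1, nu_2).
- The function `curve_styles` and the constant `REAL_DATA_STYLE` are hypothetical names.

```python
def curve_styles(params, reference=None, ascending=True):
    """Map (nu_1, nu_2) pairs to Matplotlib line-style keyword arguments.

    Pairs are sorted lexicographically and shaded from light green to dark
    green (reversed when ascending=False). The reference pair, if given, is
    drawn as a thick dark-green line so it stands out from the gradient.
    """
    light, dark = (204, 255, 204), (0, 100, 0)
    ordered = sorted(params, reverse=not ascending)
    n = len(ordered)
    styles = {}
    for i, pair in enumerate(ordered):
        t = i / (n - 1) if n > 1 else 0.0  # 0 = first in order, 1 = last
        # linear interpolation of each RGB channel between light and dark green
        r, g, b = (round(start + (end - start) * t) for start, end in zip(light, dark))
        styles[pair] = {"color": f"#{r:02x}{g:02x}{b:02x}", "linewidth": 1.0}
    if reference in styles:
        styles[reference] = {"color": "#006400", "linewidth": 3.0}
    return styles


# Real-world degree distributions are drawn on top in red.
REAL_DATA_STYLE = {"color": "red", "linewidth": 2.0}
```

Each returned dict can be unpacked into a plotting call such as `ax.loglog(k, p, **style)`. For InnovatoriPA, where the matching simulation is "no onboarding" rather than a (nu_1, nu_2) pair, `reference` would stay `None`, and that curve would be plotted separately.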
No, Ben. I think this does not tell the story we want.
First of all, there is an issue of consistency between the viz and the data:
- Innovatori => no onboarding. So it should be compared with the control group.
- Edgeryders => with onboarding (but we do not know how effective the onboarding was, or how responsive the community).
In the case of Innovatori, having the small curves is just misleading.
In the case of Edgeryders, the small curves make sense, but emphasising one in particular does not.
But the more important problem is this: if you draw a bunch of curves, they will look like a thick straight line. We know from the data that, when onboarding is present, this is not the case: the goodness-of-fit test strongly rejects the power law. If we want to make the point, I think we are down to comparing ONE curve (real-world data) with ONE curve (simulated data). Moreover, I am not convinced they should be in the same diagram: the exponents could be different. All we want to illustrate is whether they are straight or not. The way that works best for me is still:
[image: Innovatori PA is a straight line] https://github.com/albertocottica/communities-network-design/blob/fe0bb05d5d069dcd5fd499bb3c485207f5d9c25e/Pictures/InnovatoriPA%20degree%20distribution.png [image: Edgeryders is arched downwards] https://github.com/albertocottica/communities-network-design/blob/fe0bb05d5d069dcd5fd499bb3c485207f5d9c25e/Pictures/Edgeryders%20degree%20distribution.png
We would need to do the same for generated data, with and without onboarding, and then we are done.
After offline progress/discussions with @albertocottica:
- Everyone can find here https://github.com/albertocottica/communities-network-design/blob/master/Pictures/generated_subset.zip the subset of best candidates for our generated-data illustration (we had 600 networks with no onboarding and 600 with nu1 = 1, nu2 = 1, plus Edgeryders and InnovatoriPA; now we have 122 and 126 "best candidates"). Please take some time to narrow the list down, or choose one candidate for each.
Because we are submitting to a journal, potentially with "unlimited" space:
- We have opened another discussion about the actual (Tulip) drawings of the networks https://github.com/albertocottica/communities-network-design/tree/master/Pictures/tulip-images; the story we could tell with them is three drawings side by side: "here is a network with no onboarding https://github.com/albertocottica/communities-network-design/blob/master/Pictures/tulip-images/no_onboarding.png, here is the same with onboarding but no preferential attachment https://github.com/albertocottica/communities-network-design/blob/master/Pictures/tulip-images/onboarding_2000_nu_0.png, and finally here is the effect of preferential attachment with onboarding https://github.com/albertocottica/communities-network-design/blob/master/Pictures/tulip-images/onboarding_2000_nu_1.png".
- The same story can also be told with the data from the 600 generations of each network https://github.com/albertocottica/communities-network-design/tree/master/Pictures/600%20generations (if we choose this solution, I will redraw the PDFs with the right labels, colors, etc.). The effect of no onboarding can be illustrated by these two plots (with fixed nu1 https://github.com/albertocottica/communities-network-design/blob/master/Pictures/600%20generations/nu2_0.0_plus_no_onboarding.png and fixed nu2 https://github.com/albertocottica/communities-network-design/blob/master/Pictures/600%20generations/nu1_0_plus_no_onboarding.png). So I can remake a picture similar to these ones (here https://github.com/albertocottica/communities-network-design/blob/master/Pictures/600%20generations/innovatoriPA_600.png or there https://github.com/albertocottica/communities-network-design/blob/master/Pictures/600%20generations/edgeryders_600.png), but removing the real data (InnovatoriPA or Edgeryders), drawing "no onboarding" in red and preferential attachment in bold green, and doing the same for the nu1 = 0 values. This way we highlight the limited variation brought by nu2 and the strong variation brought by nu1, and contrast both with the no-onboarding case. I'll try to prototype this quickly.
Benjamin
(In reply to Alberto Cottica, 20 August 2015 at 01:19.)
Following up on the last point, I've uploaded a series of pictures named "comparison_"... They display the PDFs of the 600 generations with different parameters, with the following color coding:
Most of the plots compare "no onboarding" with all generated curves (the fixed parameter is given in the title), even though we've shown nu2 to be ineffective, so the most interesting ones essentially compare no onboarding with different values of nu1, one by one.
Two other views are available:
Benjamin
(Follow-up to my message of 25 August 2015 at 10:32.)
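Alberto's criterion, whether a degree distribution is a straight line on log-log axes or arched downwards, can also be checked numerically. One simple heuristic is to fit a quadratic to log10(p) against log10(k) and look at the sign of the quadratic coefficient:

- Close to zero suggests a straight, power-law-like line.
- Clearly negative suggests downward curvature.

This heuristic is our own substitution. It is not the goodness-of-fit test mentioned in the thread. The function name `loglog_curvature` is hypothetical, and we assume it is fed points such as bin centres and densities of a log-binned PDF.

```python
import math


def loglog_curvature(xs, ys):
    """Least-squares fit of log10(y) = a + b*u + c*u**2, where u = log10(x).

    Returns (a, b, c). On log-log axes a power law is a straight line, so
    c should be close to 0; a clearly negative c means the curve arches
    downwards.
    """
    pts = [(math.log10(x), math.log10(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    if len(pts) < 3:
        raise ValueError("need at least three points with x > 0 and y > 0")
    # Normal equations of the quadratic fit, as an augmented 3x4 matrix.
    s = [sum(u ** k for u, _ in pts) for k in range(5)]
    t = [sum(v * u ** k for u, v in pts) for k in range(3)]
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("need at least three distinct x values")
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [p - factor * q for p, q in zip(m[row], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c
```

The coefficient `c` comes with no significance threshold. It can support a picture of "straight versus arched", but it does not replace the formal test.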
This is what we have: https://github.com/albertocottica/communities-network-design/blob/master/Pictures/inDegDIstroCompared.png
It would be nice to take full control of the drawings, so that they all have the same scale, etc.
Unfortunately, something broke in my configuration: I can still run powerlaw.py from IPython, but I can no longer produce pictures. What it boils down to is that I need a backend for Matplotlib.
Ben: maybe you can try to do better?
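For the missing Matplotlib backend, one common fix is to select the non-interactive Agg backend. Agg renders straight to image files and needs no GUI toolkit. This is a minimal sketch, assuming the goal is to save PNG files rather than open windows. The helper `save_loglog`, its default x range (10^0 to 10^3) and the file name are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # file-only backend: no display or GUI toolkit needed
import matplotlib.pyplot as plt  # noqa: E402  (imported after choosing the backend)


def save_loglog(xs, ys, path, xlim=(1, 1e3)):
    """Plot ys against xs on log-log axes and write the figure to path."""
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.loglog(xs, ys, marker="o")
    ax.set_xlim(*xlim)  # fixed range so every saved picture uses the same scale
    ax.set_xlabel("degree k")
    ax.set_ylabel("p(k)")
    fig.savefig(path, dpi=150)
    plt.close(fig)  # release memory when many figures are generated in a loop
    return path
```

Alternatively, setting the `MPLBACKEND` environment variable to `Agg` before starting IPython should have the same effect without changing the code.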