There are two possible ways to proceed at this point:
Can we do both?
of course :)
Looking at the results, I would say that tensorflow, tpot and django are quite strange: their xmin values have changed a lot.
I have not included them in the correlation matrix plot; even after applying some scalers from sklearn, their values are so large that they blur the rest.
Maybe they have big squash merges, which count as a single commit.
On Mon, 25 Feb 2019 at 17:43, Bartolomé Ortiz Viso (<notifications@github.com>) wrote:
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/geneura-papers/2019-SASO-Repos-Powerlaws/issues/9#issuecomment-467084194, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAB9MjZ3WhUb-rqc6SSzbAGsMs7_jeIks5vRBK3gaJpZM4bPZbv .
-- JJ
Looking at the charts and your description above, I feel I'm missing the big picture here. Give me, in three or four sentences, what they mean, what are the implications, why we want to do this and what can we conclude from the two summary charts.
For instance, what does this mean? Why do we want to know?
What about this one? What is ND? Is it comparing how many follow one or the other, and ND simply means that the rest of the repos do not follow either?
In the case above, it would be better if the columns for 2017 and 2019 were side by side. Off the top of my head, I would say that the probability of these distributions following a truncated power law increases (if we compare it with LogNormal), while the probability of following a pure power law decreases (if we compare it with an exponential). Can't we do a Friedman or something like that to say which distribution wins overall?
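A Friedman test could look roughly like the sketch below. It assumes each repository contributes one comparable score per candidate distribution (for example, the log-likelihood of each fitted model on the same tail data, higher is better); the function name and the input layout are hypothetical. scipy's `friedmanchisquare` needs at least three candidate distributions, and the mean ranks say which one tends to win across repos.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata


def friedman_over_repos(scores):
    """Friedman test with repositories as blocks and distributions as treatments.

    scores: dict mapping a distribution name to a list of per-repo scores
    (higher = better), with the repos in the same order in every list.
    Returns (statistic, p_value, mean_rank), where rank 1 is the best.
    """
    names = list(scores)
    if len(names) < 3:
        raise ValueError("the Friedman test needs at least three distributions")
    # rows: repositories, columns: candidate distributions
    matrix = np.column_stack([np.asarray(scores[n], dtype=float) for n in names])
    stat, p = friedmanchisquare(*matrix.T)
    # rank within each repo (row); negate so the highest score gets rank 1
    ranks = rankdata(-matrix, axis=1)
    mean_rank = dict(zip(names, ranks.mean(axis=0)))
    return float(stat), float(p), mean_rank
```

A low p-value only says the candidates do not perform equally across repos; the mean ranks then suggest which one wins overall.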
Looking at the charts and your description above, I feel I'm missing the big picture here. Give me, in three or four sentences, what they mean, what are the implications, why we want to do this and what can we conclude from the two summary charts.
There are 3 types of plots:
So the paper boils down to:
Does this make sense? Can we tell this in the intro?
Off the top of my head, I would say that the probability of these distributions following a truncated power law increases (if we compare it with LogNormal), while the probability of following a pure power law decreases (if we compare it with an exponential). Can't we do a Friedman or something like that to say which distribution wins overall?
I am now working on that type of comparison. Rather than saying which one wins (which can be hard to define), I am thinking of something like the measure used in "Scale-free networks are rare" by Clauset et al. There they rank the evidence in levels: strong evidence of following a power law, medium evidence of following a PL, etc. I think it can easily be used in our case and would summarize the bar plot in a more understandable way.
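A minimal sketch of such an evidence ranking, assuming each repo has a goodness-of-fit p-value for the power law and an (R, p) pair from a log-likelihood ratio test against each alternative (R > 0 favouring the power law). The labels ("strong", "medium", "weak", "none"), the rules and the 0.1 cut-off are all hypothetical choices for illustration; they are not the exact categories of the Clauset et al. paper.

```python
from collections import Counter
from typing import Mapping, Tuple

# (R, p) from a log-likelihood ratio test; R > 0 favours the power law
Comparison = Tuple[float, float]


def classify_evidence(gof_p: float, comparisons: Mapping[str, Comparison],
                      level: float = 0.1) -> str:
    """Hypothetical evidence label for one repository."""
    if gof_p < level:
        return "none"  # the power law is rejected by the goodness-of-fit test
    if any(p < level and R < 0 for R, p in comparisons.values()):
        return "weak"  # some alternative fits significantly better
    if comparisons and all(p < level and R > 0 for R, p in comparisons.values()):
        return "strong"  # the power law significantly beats every alternative
    return "medium"  # plausible, but some comparisons are inconclusive


def summarize(repos: Mapping[str, Tuple[float, Mapping[str, Comparison]]]) -> Counter:
    """Count repositories per evidence label: one bar per label instead of per test."""
    return Counter(classify_evidence(gof_p, comps) for gof_p, comps in repos.values())
```

Counting labels per snapshot (2017, 2019) would give one compact bar plot instead of one bar per test.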
In the case above, it would be better if the columns for 2017 and 2019 were side by side.
Agreed, good point. By the way, since every plot has its associated dataset stored in \data, once we decide which ones to include they can be changed or regenerated however we want with R.
So the paper boils down to:
* Do code repositories actually follow a power law?
* If they do, at what x_min does it start?
* Table for x_min in 2017, 2019.
* What alpha would they have? What is the range? What is it related to?
* Chart plotting PL fit, 2017, 2019.
* Correlation matrix between xmin and alpha? (← does this make sense?)
Hmmm. I believe that an extreme change in xmin could mean a change in the underlying distribution. Therefore, I will not use it here.
On the other hand, I think it would be nice to measure the correlation between alpha and the number of commits. I think this would show that our system is in some kind of equilibrium: even as it evolves over time (two years), all modifications are regulated and the system does not change (dramatically). If there were some kind of correlation, I would argue that the system is still changing and no conclusion could be drawn, right? Let me know what you think about this.
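A sketch of that check, assuming one fitted alpha and one commit count per repository (and snapshot); the function name is hypothetical. Spearman's rank correlation is chosen here as an assumption because commit counts are heavily skewed, and a rank-based measure is not dominated by a few huge repos. With only a handful of repositories the test has little power, so a non-significant result says little.

```python
from scipy.stats import spearmanr


def alpha_commit_correlation(alphas, commits):
    """Spearman rank correlation between fitted alpha and number of commits.

    alphas, commits: one value per repository (and snapshot), in the same order.
    Returns (rho, p_value).
    """
    if len(alphas) != len(commits):
        raise ValueError("alphas and commits must have one entry per repository")
    rho, p = spearmanr(alphas, commits)
    return float(rho), float(p)
```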
* If they don't, do they phase-change and start following it all of a sudden?
* Table with most likely fit 2017, 2019.
* Chart with comparison (chart above) 2017, 2019.
Does this make sense? Can we tell this in the intro?
I think so, yes!
On Mon, 25 Feb 2019 at 20:16, Bartolomé Ortiz Viso (<notifications@github.com>) wrote:
So the paper boils down to:
Do code repositories actually follow a power law?
If they do, at what x_min does it start?
Table for x_min in 2017, 2019.
What alpha would they have? What is the range? What is it related to?
Chart plotting PL fit, 2017, 2019.
Correlation matrix between xmin and alpha? (← does this make sense?)
Hmmm. I believe that an extreme change in xmin could mean a change in the underlying distribution. Therefore, I will not use it here.
OK. Commenting why it does not make sense might help too.
On the other hand, I think it would be nice to measure the correlation between alpha and the number of commits.
We need to work with more than 16 repos, then. I don't know if that fits within this paper. We have 32 samples here; if we find that correlation, we could leave it for future work.
I think this would show that our system is in some kind of equilibrium: even as it evolves over time (two years), all modifications are regulated and the system does not change (dramatically). If there were some kind of correlation, I would argue that the system is still changing and no conclusion could be drawn, right?
I don't think there will be any kind of equilibrium other than punctuated equilibrium. If they are effectively in a critical state, they are going to evolve all the time. So I would say that's the case.
Let me know what you think about this.
If they don't, do they phase-change and start following it all of a sudden?
Table with most likely fit 2017, 2019.
Chart with comparison (chart above) 2017, 2019.
Does this make sense? Can we tell this in the intro?
I think so, yes!
So please write it down (or something equivalent) in the introduction.
JJ
Do code repositories actually follow a power law? Extract the p-value from the Kolmogorov-Smirnov test to see if there is some evidence of a power law.
If they do, at what x_min does it start? xmin is obtained by default in the previous test (it is the value that minimizes the KS distance).
Table for x_min in 2017, 2019.
What alpha would they have? What is the range? What is it related to? Calculated with Newman's formula once xmin is known: alpha = 1 + n [Σ ln(x_i / xmin)]^-1.
Chart plotting PL fit, 2017, 2019.
Correlation matrix between xmin and alpha? (← does this make sense?)
If they don't, do they phase-change and start following it all of a sudden? Log-likelihood ratio tests against alternative models. Here we can compare every alternative model against the power law, plus an extra truncated power law vs. lognormal comparison, and extract a score from that.
Table with most likely fit 2017, 2019.
Chart with comparison (chart above) 2017, 2019.
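The first three steps of the outline above (KS distance, xmin, alpha) can be sketched as follows. This is a minimal sketch of the standard continuous procedure, not necessarily what our scripts do: it assumes continuous data (for discrete counts the formulas are an approximation), and `fit_power_law`, `ks_distance` and `min_tail` are hypothetical names. The goodness-of-fit p-value needs a semi-parametric bootstrap around this same fit, which is not shown; the `powerlaw` Python package and R's `poweRlaw` package implement the full method.

```python
import numpy as np


def ks_distance(tail, xmin, alpha):
    """KS distance between the empirical CDF of a *sorted* tail and the fitted CDF."""
    n = tail.size
    model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
    # the empirical CDF jumps at each point: compare just before and just after it
    after = np.arange(1, n + 1) / n
    before = np.arange(0, n) / n
    return float(np.max(np.maximum(np.abs(after - model_cdf), np.abs(before - model_cdf))))


def fit_power_law(x, min_tail=50):
    """Continuous power-law fit: choose xmin by minimum KS distance, alpha by MLE.

    Returns (xmin, alpha, ks_distance, n_tail) for the best candidate xmin.
    """
    x = np.sort(np.asarray(x, dtype=float))
    x = x[x > 0]
    best = None
    for xmin in np.unique(x):
        tail = x[np.searchsorted(x, xmin):]  # sorted values with x >= xmin
        n = tail.size
        if n < min_tail:
            break  # candidates are increasing, so later tails are even shorter
        log_sum = np.sum(np.log(tail / xmin))
        if log_sum <= 0:
            continue  # all tail values equal xmin: alpha is undefined
        alpha = 1.0 + n / log_sum  # Newman's MLE for the exponent given xmin
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[2]:
            best = (float(xmin), float(alpha), d, n)
    return best
```

`min_tail` guards against picking a tiny tail whose small KS distance is just noise; its value is a judgment call.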
I'm now working on the first KS test; the rest is mostly finished. (Just to keep a record of what we are doing.)
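For the log-likelihood ratio step in the outline, one pairwise comparison could look like this sketch of the normalized (Vuong-style) ratio used in the Clauset, Shalizi and Newman method. It only covers power law vs. exponential on the continuous tail x ≥ xmin; the truncated power law and lognormal need their own normalized densities on [xmin, ∞), which are more involved and not shown. The function name is hypothetical; `tail`, `xmin` and `alpha` are assumed to come from the fitting step.

```python
import math

import numpy as np


def powerlaw_vs_exponential(tail, xmin, alpha):
    """Normalized log-likelihood ratio test on the tail x >= xmin.

    Returns (R, p): R > 0 favours the power law, R < 0 the exponential;
    p is the two-sided significance of the sign of R.
    """
    tail = np.asarray(tail, dtype=float)
    n = tail.size
    # log-density of the continuous power law on [xmin, inf)
    log_pl = math.log(alpha - 1.0) - math.log(xmin) - alpha * np.log(tail / xmin)
    # exponential shifted to start at xmin, rate fitted by maximum likelihood
    lam = 1.0 / (tail.mean() - xmin)
    log_exp = math.log(lam) - lam * (tail - xmin)
    diff = log_pl - log_exp  # pointwise log-likelihood ratios
    R = float(diff.sum())
    sigma = float(diff.std())
    p = math.erfc(abs(R) / (math.sqrt(2.0 * n) * sigma))
    return R, p
```

If p is large, the sign of R is not trustworthy and neither model is favoured; that is the "inconclusive" case when building a score.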
Please remember to write stuff in the intro and abstract. We can always modify it if the hypotheses do not hold, but it needs to be written, and rewritten, and reviewed, and so on.
From the point of view of whether or not they show self-organization, highlight the repos with the most remarkable changes.