geneura-papers / 2019-SASO-Repos-Powerlaws

2019-SASO-Repos-PowerLaw
GNU General Public License v3.0

Comment and discuss results. #9

Closed JJ closed 5 years ago

JJ commented 5 years ago

From the point of view of whether they show, or not, self-organization... Highlight the repos with the most remarkable changes.

thebooort commented 5 years ago

There are two possible ways to proceed at this point:

JJ commented 5 years ago

Can we do both?

thebooort commented 5 years ago

of course :)

thebooort commented 5 years ago

Looking at the results, I would say that tensorflow, tpot and django are quite strange. Their xmin value has changed a lot.

I have not included them in the corr matrix plot: even after applying some scalers from sklearn, their values are so big that they blur the rest.

JJ commented 5 years ago

Maybe they have big squash merges, which count as a single commit.

On Mon, 25 Feb 2019 at 17:43, Bartolomé Ortiz Viso (<notifications@github.com>) wrote:

Looking at the results, I would say that tensorflow, tpot and django are quite strange. Their xmin value has changed a lot.

I have not included them in the corr matrix plot: even after applying some scalers from sklearn, their values are so big that they blur the rest.


JJ commented 5 years ago

Looking at the charts and your description above, I feel I'm missing the big picture here. Give me, in three or four sentences, what they mean, what are the implications, why we want to do this and what can we conclude from the two summary charts.

JJ commented 5 years ago

For instance, what does this mean? Why do we want to know?

[Image: corr_matrix]

JJ commented 5 years ago

What about this one? What is ND? Is it comparing how many follow one or the other, and does ND simply mean that the rest of the repos do not follow either?

[Image: summary]

JJ commented 5 years ago

In the case above, it would be better if the columns for 2017 and 2019 were side by side. Off the top of my head, I would say that the probability of these distributions following a truncated power law increases (if we compare it with LogNormal), while the probability of following a pure power law decreases (if we compare it with an exponential). Can't we do a Friedman or something like that to say which distribution wins overall?
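On the Friedman idea, here is a minimal sketch of the Friedman statistic in plain Python. It assumes one row per repository and one goodness-of-fit score per candidate distribution (higher is better). The function name is hypothetical, the scores are made-up placeholders rather than results from this repo, and no tie correction is applied.

```python
def friedman_statistic(blocks):
    """Friedman chi-square statistic (without tie correction).

    blocks: list of rows, one row per repository; each row holds one
    score per candidate distribution. Ties within a row get average ranks.
    Returns (statistic, rank sum per column).
    """
    n = len(blocks)       # number of repositories (blocks)
    k = len(blocks[0])    # number of candidate distributions (treatments)
    rank_sums = [0.0] * k
    for row in blocks:
        order = sorted(range(k), key=lambda j: row[j])
        i = 0
        while i < k:
            # Find the run of tied values and assign their average rank.
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1
            for t in range(i, j + 1):
                rank_sums[order[t]] += avg_rank
            i = j + 1
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return stat, rank_sums


# Made-up placeholder scores (NOT results from this repo).
# Columns: power law, truncated power law, lognormal.
toy_scores = [
    [0.2, 0.9, 0.5],
    [0.1, 0.8, 0.6],
    [0.3, 0.7, 0.4],
    [0.2, 0.9, 0.3],
]
stat, rank_sums = friedman_statistic(toy_scores)
print("Friedman statistic:", stat)
print("Rank sums per distribution:", rank_sums)
```

The statistic is then compared against a chi-square distribution with k - 1 degrees of freedom; with only 16 repositories, an exact table or a post-hoc test would probably be safer than the asymptotic approximation.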

thebooort commented 5 years ago

Looking at the charts and your description above, I feel I'm missing the big picture here. Give me, in three or four sentences, what they mean, what are the implications, why we want to do this and what can we conclude from the two summary charts.

There are 3 types of plots:

JJ commented 5 years ago

So the paper boils down to:

Does this make sense? Can we tell this in the intro?

thebooort commented 5 years ago

Off the top of my head, I would say that the probability of these distributions following a truncated power law increases (if we compare it with LogNormal), while the probability of following a pure power law decreases (if we compare it with an exponential). Can't we do a Friedman or something like that to say which distribution wins overall?

I am now working on that type of comparison. Rather than saying which one wins (since that can be hard to define), I am thinking of something like the measure used in "Scale-free networks are rare" by Clauset et al. There they rank the evidence in levels: strong evidence of following a power law, medium evidence of following a PL, etc. I think it can easily be applied to our case and would summarize the bar plot in a more understandable way.
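For reference, the pairwise comparisons that feed such a ranking are usually log-likelihood ratios. Below is a minimal sketch for power law vs. exponential, assuming continuous data, a known xmin, and closed-form maximum-likelihood estimates. The function name is made up for illustration and is not taken from this repo's scripts, the data are synthetic, and no significance test on the ratio is included.

```python
import math


def loglik_ratio_pl_vs_exp(data, xmin):
    """Log-likelihood ratio: power law vs. shifted exponential, for x >= xmin.

    Positive values favour the power law, negative values the exponential.
    Assumes at least one observation strictly above xmin.
    """
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    alpha = 1.0 + n / log_sum                  # MLE of the power-law exponent
    lam = 1.0 / (sum(tail) / n - xmin)         # MLE of the exponential rate
    ll_pl = n * math.log((alpha - 1.0) / xmin) - alpha * log_sum
    ll_exp = n * math.log(lam) - lam * sum(x - xmin for x in tail)
    return ll_pl - ll_exp


# Synthetic, deterministic samples built from quantiles (not repo data).
n = 200
quantiles = [(i + 0.5) / n for i in range(n)]
power_law_sample = [(1 - u) ** (-1 / 1.5) for u in quantiles]   # alpha = 2.5, xmin = 1
exponential_sample = [1 - math.log(1 - u) for u in quantiles]   # rate 1, shifted to xmin = 1

print(loglik_ratio_pl_vs_exp(power_law_sample, 1.0))
print(loglik_ratio_pl_vs_exp(exponential_sample, 1.0))
```

The sign of the ratio per repository and year is what a "strong / medium / weak evidence" label could be built on.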

In the case above, it would be better if the columns for 2017 and 2019 were side by side.

Agreed, good point. By the way, since every plot has its associated dataset stored in \data, once we decide which ones to include, they can be changed or regenerated any way we want with R.

thebooort commented 5 years ago

So the paper boils down to:

* Do code repositories actually follow a power law?

  * If they do, at what x_min does it start?

    * Table for x_min in 2017, 2019.
  * What alpha would they have? What is the range? What is it related to?

    * Chart plotting PL fit, 2017, 2019.
    * Correlation matrix between xmin and alpha? ( ← does this make sense? )

Hmmm. I believe that an extreme change in xmin could mean a change in the underlying distribution. Therefore, I will not use it here.

On the other hand, I think it would be nice to measure the correlation between alpha and the number of commits. If there is none, I think that would show that our system is in some kind of equilibrium: even as it evolves over time (two years), all of the modifications are regulated and the system does not change dramatically. If there were some kind of correlation, I would argue that the system is still changing and no conclusion should be drawn, right? Let me know what you think about this.

  * If they don't, do they phase-change and start following it all of a sudden?

    * Table with most likely adjustment 2017, 2019.
    * Chart with comparison (chart above) 2017, 2019.

Does this make sense? Can we tell this in the intro?

I think so, yes!
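For the alpha question in the outline, the standard maximum-likelihood estimate for a continuous power law with a fixed x_min is written below, as a reference only. Here n is the number of observations with x_i >= x_min, and treating the data as continuous is an assumption; discrete data would need the discrete estimator or an approximation.

```latex
\hat{\alpha} = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}} \right]^{-1},
\qquad
\sigma_{\hat{\alpha}} \approx \frac{\hat{\alpha} - 1}{\sqrt{n}}
```

The standard error gives a rough idea of whether the 2017 and 2019 values of alpha for a given repository differ by more than sampling noise.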

JJ commented 5 years ago

On Mon, 25 Feb 2019 at 20:16, Bartolomé Ortiz Viso (<notifications@github.com>) wrote:

So the paper boils down to:

  • Do code repositories actually follow a power law?

    • If they do, at what x_min does it start?

    • Table for x_min in 2017, 2019.

    • What alpha would they have? What is the range? What is it related to?

    • Chart plotting PL fit, 2017, 2019.

    • Correlation matrix between xmin and alpha? ( ← does this make sense? )

Hmmm. I believe that an extreme change in xmin could mean a change in the underlying distribution. Therefore, I will not use it here.

OK. Commenting why it does not make sense might help too.

On the other hand, I think that it would be nice to measure correlation between alpha and number of commits. This, I think, proves that our system is in some kind of equilibrium, so even when it evolves in

We need to work with more than 16 repos, then. I don't know if that fits within this paper. We have 32 samples here; if we find that correlation, we could try and leave the rest for future work.

time (two years) all of the modifications are regulated and the system does not change dramatically. If there were some kind of correlation, I would argue that the system is still changing and no conclusion should be drawn, right?

I don't think there will be any kind of equilibrium other than punctuated equilibrium. If they are effectively in a critical state, they are going to evolve all the time. So I would say that's the case.

Let me know what you think on this.

  • If they don't, do they phase-change and start following it all of a sudden?

    • Table with most likely adjustment 2017, 2019.

    • Chart with comparison (chart above) 2017, 2019.

Does this make sense? Can we tell this in the intro?

I think so, yes!

So please write it down (or some equivalent) in the introduction

JJ

thebooort commented 5 years ago

I'm now working on the first KS test; the rest are mostly finished. (Just to keep a record of what we are doing.)
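For the record, a minimal sketch of the KS distance between the empirical tail and a fitted power law, in plain Python. It assumes continuous data and a given xmin; the function name is hypothetical, the samples are synthetic, and this is not necessarily how the scripts in this repo compute it.

```python
import math


def ks_distance_power_law(data, xmin):
    """Fit alpha by maximum likelihood and return (alpha, KS distance).

    Only observations >= xmin are used; at least one must be above xmin.
    """
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    distance = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        distance = max(distance, abs(model_cdf - i / n), abs((i + 1) / n - model_cdf))
    return alpha, distance


# Synthetic, deterministic samples built from quantiles (not repo data).
n = 500
quantiles = [(i + 0.5) / n for i in range(n)]
power_law_sample = [(1 - u) ** (-1 / 1.5) for u in quantiles]   # alpha = 2.5, xmin = 1
exponential_sample = [1 - math.log(1 - u) for u in quantiles]   # rate 1, shifted to xmin = 1

print(ks_distance_power_law(power_law_sample, 1.0))
print(ks_distance_power_law(exponential_sample, 1.0))
```

A smaller distance means a closer fit; turning it into a p-value would still require the usual bootstrap over synthetic power-law samples.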

JJ commented 5 years ago

Please remember to write stuff in the intro and abstract. We can always modify that if the hypotheses do not hold, but the stuff needs to be written, and rewritten, and reviewed, and so on.