carbon-language / carbon-lang

Carbon Language's main repository: documents, design, implementation, and related tools. (NOTE: Carbon Language is experimental; see README)
https://github.com/carbon-language/carbon-lang/blob/trunk/README.md

Hack in a unique IDs counter to source stats. #4096

Closed: chandlerc closed this 3 days ago

chandlerc commented 4 days ago

This is awkward to track... It would probably be best done by tracking the ratio of unique IDs to lines as a floating-point value, plotting those ratios, and seeing what a best-fit distribution curve looks like. But none of the histogram-printing or stats-tracking code already in use here makes any of that easy...
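
A minimal sketch of that floating-point approach, assuming each sample is one file's ratio of unique IDs to lines, and assuming a normal curve is the candidate fit (the shape described below looks vaguely normal). `NormalFit` and `FitNormal` are hypothetical names, not part of the Carbon toolchain. The fit shown is the maximum-likelihood normal, which is simply the sample mean and the population standard deviation:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical result of fitting a normal distribution to the ratios.
struct NormalFit {
  double mean;
  double stddev;
};

// Maximum-likelihood normal fit over per-sample ratios (unique IDs / lines),
// kept as doubles instead of being bucketed into a discrete histogram.
inline auto FitNormal(const std::vector<double>& ratios) -> NormalFit {
  assert(!ratios.empty() && "need at least one sample to fit");
  double sum = 0.0;
  for (double r : ratios) sum += r;
  double mean = sum / static_cast<double>(ratios.size());
  // Population variance (divide by N), which is the MLE for a normal.
  double squared_error = 0.0;
  for (double r : ratios) squared_error += (r - mean) * (r - mean);
  return {mean, std::sqrt(squared_error / static_cast<double>(ratios.size()))};
}
```

Comparing the fitted curve against the raw ratios would still need a plotting tool outside the stats printer, which is the gap described above.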

So this does what I hope is a reasonable rough approximation: it computes the ceiling of unique identifiers per 10 lines of code and plots that as a discrete histogram. The shape of the histogram is exactly what I would expect: a single centered distribution, vaguely normal looking. And the center for a bunch of different codebases, including our toolchain, is right around 5 (the toolchain's median comes out at 6), which would mean roughly 0.5 unique IDs per line. The distribution is also pretty reliably bounded above by 10, or 1 unique ID per line. Which almost seems too clean to be true? I'm slightly worried about confirmation bias making me think this code is working because the results look so pretty.
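
A minimal sketch of that bucketing, assuming one sample per file and computing the ceiling with integer arithmetic to avoid floating-point rounding. `UniqueIdsPer10Lines`, `AddSample`, and `MedianBucket` are hypothetical helpers, not the actual source-stats code. The median is taken as the lower median of the bucketed samples:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// ceil(10 * unique_ids / lines): unique identifiers per 10 lines of code,
// rounded up to a whole-number bucket.
inline auto UniqueIdsPer10Lines(int64_t unique_ids, int64_t lines) -> int64_t {
  assert(lines > 0 && "need at least one line to compute a ratio");
  return (10 * unique_ids + lines - 1) / lines;
}

// Discrete histogram: bucket value -> number of samples in that bucket.
using Histogram = std::map<int64_t, int64_t>;

inline auto AddSample(Histogram& hist, int64_t unique_ids, int64_t lines)
    -> void {
  ++hist[UniqueIdsPer10Lines(unique_ids, lines)];
}

// First bucket (in ascending order) where the cumulative count reaches half
// of the total; returns 0 for an empty histogram.
inline auto MedianBucket(const Histogram& hist) -> int64_t {
  int64_t total = 0;
  for (const auto& [bucket, count] : hist) total += count;
  int64_t seen = 0;
  for (const auto& [bucket, count] : hist) {
    seen += count;
    if (2 * seen >= total) return bucket;
  }
  return 0;
}
```

Fed the toolchain bucket counts below, `MedianBucket` returns 6, and fed the LLVM counts it returns 5, matching both reported medians.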

Here is the output for the toolchain:

  ## Unique IDs per 10 lines ## (median: 6)
  2 ids   [ 2]  █▎
  3 ids   [19]  ████████████▎
  4 ids   [32]  ████████████████████▋
  5 ids   [55]  ███████████████████████████████████▌
  6 ids   [62]  ████████████████████████████████████████
  7 ids   [44]  ████████████████████████████▍
  8 ids   [22]  ██████████████▎
  9 ids   [11]  ███████▏
  10 ids  [ 7]  ████▌
  11 ids  [ 2]  █▎

And here is the output for llvm-project/*/{lib,include} (to exclude tests):

  ## Unique IDs per 10 lines ## (median: 5)
  1 ids   [  29]  ▍
  2 ids   [ 282]  ███▊
  3 ids   [1492]  ███████████████████▉
  4 ids   [2674]  ███████████████████████████████████▌
  5 ids   [3011]  ████████████████████████████████████████
  6 ids   [2267]  ██████████████████████████████▏
  7 ids   [1549]  ████████████████████▋
  8 ids   [ 817]  ██████████▉
  9 ids   [ 301]  ████
  10 ids  [  98]  █▎
  11 ids  [  61]  ▊
  12 ids  [  50]  ▋
  13 ids  [  25]  ▍
  14 ids  [  33]  ▌
  15 ids  [  14]  ▏
  16 ids  [  15]  ▎
  17 ids  [   9]  ▏
  18 ids  [   8]  ▏
  19 ids  [  12]  ▏
  20 ids  [  15]  ▎
  21 ids  [   3]
  22 ids  [   8]  ▏
  23 ids  [   3]
  24 ids  [   3]
  25 ids  [   6]  ▏
  26 ids  [   0]
  27 ids  [   2]
  28 ids  [   0]
  29 ids  [   0]
  30 ids  [   3]
  31 ids  [   1]
  32 ids  [   1]
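
The bars in both outputs are consistent with scaling each count so that the largest bucket fills 40 full-block characters, then rounding to the nearest eighth of a block and drawing the remainder with a partial-block character. A hypothetical `RenderBar` sketch of that scheme follows; it is an inference from the output, not the actual printer:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Scales `count` so that `max_count` fills `width` full blocks, rounding to
// the nearest eighth of a block.
inline auto RenderBar(int64_t count, int64_t max_count, int64_t width = 40)
    -> std::string {
  assert(max_count > 0 && count >= 0 && count <= max_count);
  // Left partial blocks indexed by eighths: 1/8 (▏) through 7/8 (▉).
  static const char* const kEighths[] = {"",  "▏", "▎", "▍",
                                         "▌", "▋", "▊", "▉"};
  // round(count * width * 8 / max_count) using integer arithmetic.
  int64_t eighths = (2 * count * width * 8 + max_count) / (2 * max_count);
  std::string bar;
  for (int64_t i = 0; i < eighths / 8; ++i) bar += "█";
  bar += kEighths[eighths % 8];
  return bar;
}
```

For example, `RenderBar(282, 3011)` yields `███▊` and `RenderBar(3, 3011)` yields an empty bar, as in the LLVM output; buckets that round to zero eighths are why several low counts show no bar at all.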