blackducksoftware / ohcount

The Ohloh source code line counter
https://github.com/blackducksoftware/ohcount
GNU General Public License v2.0

Comparing output to wc -l for sanity checking #75

Closed: kaihendry closed this issue 3 years ago

kaihendry commented 4 years ago

Why is there such a big difference between the total reported by find -type f | xargs wc -l (270997) and ohcount's total (4129992)?

I'm looking at https://webkitgtk.org/releases/webkitgtk-2.28.2.tar.xz

       5 ./Documentation/jsc-glib-4.0/html/right.png
     117 ./Documentation/jsc-glib-4.0/html/api-index-2-24.html
     355 ./Documentation/jsc-glib-4.0/html/index-all.html
      73 ./Documentation/jsc-glib-4.0/html/annotation-glossary.html
       5 ./Documentation/jsc-glib-4.0/html/left.png
     185 ./Documentation/jsc-glib-4.0/html/jsc-glib-4.0-JSCVersion.html
       3 ./Documentation/jsc-glib-4.0/html/right-insensitive.png
     189 ./Documentation/jsc-glib-4.0/html/jsc-glib-4.0.devhelp2
    2429 ./NEWS
  270997 total
[hendry@t480s webkitgtk-2.28.2]$ ohcount
Examining 19472 file(s)

                          Ohloh Line Count Summary

Language          Files       Code    Comment  Comment %      Blank      Total
----------------  -----  ---------  ---------  ---------  ---------  ---------
cpp               14023    2345093     520098      18.2%     499145    3364336
javascript          843     181268      30452      14.4%      54820     266540
html                472     147073       2728       1.8%        783     150584
c                  1001      76467      47129      38.1%      14921     138517
xml                  29      68888        565       0.8%       1247      70700
python              155      22985       9162      28.5%       6824      38971
css                 266      19816       6578      24.9%       4452      30846
cmake               134      15500       1907      11.0%       2090      19497
perl                 29      14214       1541       9.8%       3354      19109
ruby                 46      13957       1902      12.0%       1849      17708
assembler             5       8167         73       0.9%       1592       9832
shell                25       1409        310      18.0%        256       1975
autoconf              3        394         59      13.0%         52        505
glsl                  7        327         76      18.9%         66        469
bat                   4        272         19       6.5%         45        336
postscript            1         58          0       0.0%          9         67
----------------  -----  ---------  ---------  ---------  ---------  ---------
Total             17043    2915888     622599      17.6%     591505    4129992
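
A likely reason the first command reports such a low figure is that xargs splits a long file list across several wc invocations, and each invocation prints its own "total" line. The last "total" in the output (the 270997 above) may therefore cover only the final batch. The sketch below reproduces the effect on a throwaway directory. The function name show_batched_totals and the forced batch size passed to xargs -n are illustrative assumptions. Real xargs batches by command-line length limits, not by a fixed file count.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the "total" lines that wc emits when
# xargs runs it in batches.
#   $1: directory to scan
#   $2: maximum number of files per wc run (forced small here so that
#       batching shows up on a tiny example)
# Note: wc prints no "total" line for a batch that contains a single file.
show_batched_totals() {
  find "$1" -type f -print0 | xargs -0 -n "$2" wc -l | grep ' total$'
}
```

With four two-line files and batches of two, this prints two separate "4 total" lines instead of one "8 total". Adding up only the last total would undercount in the same way.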

By the way, the http://www.ohloh.net/ link in the help output is broken.

notalex commented 3 years ago

Hi, apologies for the delayed response. I believe the right way to get a recursive line count is:

$ wc -l `find -type f`
4710393

If you add up the per-file line counts, they should match this total. The wc number is higher than ohcount's because wc counts every file, while ohcount skips files it does not recognize as program source (it examined 19472 files, but its summary covers only 17043).
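
The backtick form gives a single total only while the whole file list fits on one command line. With more files it can fail with "Argument list too long", and because the command substitution is unquoted, filenames containing spaces are split into separate arguments. Under those assumptions, one alternative is to stream every file through one wc process. The count_all_lines name below is hypothetical. Because wc -l counts newline characters, concatenating the files does not change the count: the result equals the sum of the per-file counts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count lines in every regular file under a directory
# (default: current directory) using a single wc process.
# find -exec ... + batches arguments safely (no "Argument list too long"),
# and every cat batch writes into the same pipe, so wc sees one stream and
# reports exactly one number. tr strips the padding some wc versions add.
count_all_lines() {
  find "${1:-.}" -type f -exec cat {} + | wc -l | tr -d '[:space:]'
}
```

For example, count_all_lines webkitgtk-2.28.2 should print the same figure as the "total" line of wc -l `find -type f` run inside that directory, as long as the file list fits on one command line and no filename contains whitespace.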

Thanks for reporting the broken link. It is fixed now.