XAMPPRocky / tokei

Count your code, quickly.

Update Comparison Page #324

Open · boyter opened this issue 5 years ago

boyter commented 5 years ago

The current comparison page https://github.com/XAMPPRocky/tokei/blob/master/COMPARISON.md is a little out of date. Would be good to update it.

- loc is now at version 0.5.0
- scc is now at version 2.3.0
- cloc is now at version 1.80

Perhaps polyglot can be added by downloading a binary rather than running from source?

XAMPPRocky commented 5 years ago

Thank you for this issue! I do want to update the comparison document with the latest version of all the tools. However, I also want to change how they are compared, as a simple "how long did this program take to finish?" is not an accurate measurement of these tools: most of the time difference between the programs comes down to one counting more files than another.

Right now I'm thinking of replacing that metric with two different measurements (a rough sketch follows the list).

  1. What is the average time to process a file, averaged over tens of thousands of files?
    • This should test how well the program handles large-scale I/O for big projects with a large number of files.
  2. How long does it take to process a single large file?
    • This should test how long the program takes to actually process a file once its I/O is complete.
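A minimal sketch of what those two measurements might look like, assuming `tokei`, `scc`, and `loc` are on `PATH`, a `corpus/` tree with a known file count, and a `big.json` standing in for the single large file (all of these are placeholders, not part of tokei's actual benchmark suite):

```rust
// Sketch of the two proposed measurements. The per-file number is just
// wall time divided by file count, so it includes traversal overhead.
use std::process::Command;
use std::time::Instant;

fn time_tool(tool: &str, target: &str) -> f64 {
    let start = Instant::now();
    // Capture output so printing to the terminal does not skew the timing.
    let out = Command::new(tool)
        .arg(target)
        .output()
        .expect("failed to run tool");
    assert!(out.status.success(), "{tool} exited with an error");
    start.elapsed().as_secs_f64()
}

fn main() {
    let file_count = 10_000.0; // however many files corpus/ actually holds
    for tool in ["tokei", "scc", "loc"] {
        // Measurement 1: average time per file over a large tree.
        let tree = time_tool(tool, "corpus");
        // Measurement 2: time to process a single large file.
        let single = time_tool(tool, "big.json");
        println!(
            "{tool}: {:.3} ms/file over the tree, {:.3} s for the large file",
            tree / file_count * 1000.0,
            single
        );
    }
}
```

Real runs would want warm-up passes and repeated trials to control for filesystem caching, e.g. via a benchmarking wrapper such as hyperfine.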
boyter commented 5 years ago

I wrote an artificial test a while ago for scc that was designed to test how each tool behaves over different directory layouts, with the same file in each one, a file that every tool counts in exactly the same way. You can find the details at https://boyter.org/posts/sloc-cloc-code-performance/ under the heading "A Fair Benchmark".
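For illustration, a rough sketch of that kind of corpus: one seed file (here `seed.rs`, a placeholder) copied into a wide layout and a deep layout, so every tool counts identical content and only traversal speed differs. The shapes and counts are invented for this example; the blog post above has the actual setup.

```rust
// Sketch of a "fair benchmark" corpus generator: the same seed file is
// replicated into differently shaped directory trees.
use std::fs;
use std::path::PathBuf;

fn main() -> std::io::Result<()> {
    let seed = fs::read("seed.rs")?; // a file all tools count identically

    // Wide layout: many files in a single directory.
    let wide = PathBuf::from("corpus/wide");
    fs::create_dir_all(&wide)?;
    for i in 0..10_000 {
        fs::write(wide.join(format!("file_{i}.rs")), &seed)?;
    }

    // Deep layout: one file per directory, nested many levels down
    // (kept to 100 levels so paths stay under OS length limits).
    let mut deep = PathBuf::from("corpus/deep");
    for i in 0..100 {
        deep.push(format!("d{i}"));
        fs::create_dir_all(&deep)?;
        fs::write(deep.join("file.rs"), &seed)?;
    }
    Ok(())
}
```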

I'd love to work with you to get something that seems fair across all tools, so the results are read the same way by everyone and are not as open to interpretation. If both tokei and scc did this, I believe all other tools could follow and we would have a real baseline to work from. Just based on the number of tests I have done, I can, with little effort, craft benchmarks that show any of the tools to be the fastest.

Tens of thousands is indeed what I went with. I never considered a single large file, because the average file size across a large project like the Linux kernel is ~16,000 bytes anyway, so it seemed redundant to me, although I could see it being useful for large JSON and XML files perhaps.
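As a quick way to check a figure like that ~16,000-byte average, a small sketch that walks a tree and reports the mean file size; the `linux` path is a placeholder, and recursion depth and symlink handling are kept naive for brevity.

```rust
// Sketch: compute the average file size under a directory, e.g. to check
// a claim like "the average Linux kernel file is ~16,000 bytes".
use std::fs;
use std::io;
use std::path::Path;

fn walk(dir: &Path, total: &mut u64, count: &mut u64) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if meta.is_dir() {
            walk(&entry.path(), total, count)?;
        } else if meta.is_file() {
            *total += meta.len();
            *count += 1;
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let (mut total, mut count) = (0u64, 0u64);
    walk(Path::new("linux"), &mut total, &mut count)?; // placeholder path
    println!("{count} files, average {} bytes", total / count.max(1));
    Ok(())
}
```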