boyter / scc

Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations and COCOMO estimates written in pure Go
MIT License

feature: org-mode and literate programming (src-snippets in org-files) #264

Open jhelberg opened 3 years ago

jhelberg commented 3 years ago

Currently scc double-counts projects whose source files are all tangled from one or more org-mode files: once as Org source and once as Go/sh/whatever files.

Of course it is possible to run scc on the .org files only, but then all language-specific parsing and qualification is dropped.

Would it be possible to parse org files so that #+begin_src go and #+begin_src c are recognized as the start of a Go or C code block respectively, or of any other language? scc could even recognize the :tangle directive and remove the generated source files it names from its inspection targets.
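The header parsing this would need could look roughly like the Go sketch below. It is a minimal sketch that assumes babel header arguments are whitespace-separated `:key value` pairs with no spaces inside values. `srcHeader` and `parseSrcHeader` are hypothetical names and are not part of scc. Collecting the tangle targets this way would give a list of generated files to skip.

```go
package main

import (
	"fmt"
	"strings"
)

// srcHeader holds the parts of an org-babel "#+begin_src" line that matter
// for counting: the block language and the optional :tangle target.
type srcHeader struct {
	Lang   string // block language, e.g. "go", "sql", "sh"
	Tangle string // tangle target; "" when absent or ":tangle no"
}

// parseSrcHeader parses a line such as
//
//	#+begin_src go :tangle hello.go :main no
//
// It returns ok=false when the line does not open a source block or names
// no language. Header arguments are assumed to be ":key value" pairs whose
// values contain no spaces (a simplification of org-babel's syntax).
func parseSrcHeader(line string) (h srcHeader, ok bool) {
	fields := strings.Fields(line)
	if len(fields) < 2 || !strings.EqualFold(fields[0], "#+begin_src") {
		return srcHeader{}, false
	}
	h.Lang = fields[1]
	for i := 2; i+1 < len(fields); i++ {
		if fields[i] == ":tangle" && fields[i+1] != "no" {
			h.Tangle = fields[i+1]
		}
	}
	return h, true
}

func main() {
	lines := []string{
		"#+begin_src go :tangle hello.go :main no :noweb yes",
		"#+begin_src go :tangle no",
		"#+begin_src sql :tangle hello.sql :main no :noweb yes",
		"plain documentation text",
	}
	// Files collected here are already counted inside the org file, so a
	// scanner could skip them instead of counting them a second time.
	tangled := map[string]bool{}
	for _, l := range lines {
		if h, ok := parseSrcHeader(l); ok && h.Tangle != "" {
			tangled[h.Tangle] = true
		}
	}
	fmt.Println(len(tangled), tangled["hello.go"], tangled["hello.sql"])
}
```

`:tangle yes` means "tangle to a file named after the org file". The sketch keeps that value verbatim and does not resolve it to a real path.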

I'm happy to help of course.

boyter commented 3 years ago

So this is possible now if I am reading this right.

scc --remap-all
scc --remap-unknown

Which should inspect the start of the file and do this for you. I don't turn this on by default because of the performance implications.

However, I suspect in this case they have the same extension?

If you can provide a sample of where this happens, and what you are trying to achieve that would really help.

jhelberg commented 3 years ago

I'll post an example org file below that explains the issue in depth. --remap-all only works for one language per file. Literate programming in org-mode uses a few (fewer than five) org files to generate tons of source code in different languages. In org-mode, code snippets start with #+begin_src and end with #+end_src.

jhelberg commented 3 years ago
#+LATEX_HEADER: \DeclareUnicodeCharacter{2500}{–}
* =scc= counting lines in org-files with org-babel
  :PROPERTIES:
  :ID:       0465d03e-3ab0-4734-afff-257b583274d6
  :END:
  Maybe I'm not representative of the mainstream developer, but I
  stick to just a few org files per project. These generate anywhere
  from a few to hundreds of source files. That process is called
  tangling (see Knuth: Literate Programming; org-babel implements it
  for org files). The org file is documentation and code in one, and
  producing the documentation is called weaving. Most projects use
  several languages, so the org file contains shell scripts, Go code,
  dot graphs, SQL and much more. Generated code is usually
  comment-free, because the explanation lives in the documentation
  surrounding the code.

  The org file contains both code and documentation, so it would be
  enough to classify every part of the file and count each part's
  contents under its language. scc has no documentation column, so
  these lines (everything not classified as a language) could go
  into either the comment or the code column, whichever fits better.

  A feature that lets scc process the org file and understand the
  =#+= =begin_src= directive as setting the language of the block that
  follows would be a great improvement.
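  A minimal Go sketch of such a feature follows. It assumes the
  simplest possible rules. Lines between =#+begin_src LANG= and
  =#+end_src= count as code or blanks for LANG. Every other line
  (headlines, prose, properties and the delimiters themselves) counts
  towards Org, and its non-blank lines go in the comment column.
  =countOrg= is a hypothetical helper, not scc code. It does not detect
  comments inside blocks. It also does not map babel names such as
  =sh= or =org= to scc's language names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// counts mirrors the Lines, Blanks, Comments and Code columns of scc.
type counts struct{ Lines, Blanks, Comments, Code int }

// countOrg attributes every line of an org document to a language.
// Lines between "#+begin_src LANG" and "#+end_src" count for LANG (blank or
// code; comment detection inside blocks is left out). All other lines,
// including the delimiters, count for "Org", with non-blank lines treated
// as comments because scc has no documentation column.
func countOrg(doc string) map[string]*counts {
	result := map[string]*counts{}
	lang := "" // language of the open block; "" while in prose
	for _, line := range strings.Split(strings.TrimSuffix(doc, "\n"), "\n") {
		trimmed := strings.TrimSpace(line)
		lower := strings.ToLower(trimmed)
		target := "Org"
		switch {
		case lang == "" && strings.HasPrefix(lower, "#+begin_src"):
			// Opening delimiter: remember the language, if one is given.
			if f := strings.Fields(trimmed); len(f) > 1 {
				lang = f[1]
			}
		case lang != "" && strings.HasPrefix(lower, "#+end_src"):
			lang = "" // closing delimiter: back to prose
		case lang != "":
			target = lang
		}
		c := result[target]
		if c == nil {
			c = &counts{}
			result[target] = c
		}
		c.Lines++
		switch {
		case trimmed == "":
			c.Blanks++
		case target == "Org":
			c.Comments++ // documentation and block delimiters
		default:
			c.Code++
		}
	}
	return result
}

func main() {
	doc := "* heading\nSome prose.\n\n" +
		"#+begin_src go :tangle hello.go\npackage main\n\nfunc main() {}\n#+end_src\n" +
		"#+begin_src sql\nselect 1\n#+end_src\n"
	res := countOrg(doc)
	langs := make([]string, 0, len(res))
	for l := range res {
		langs = append(langs, l)
	}
	sort.Strings(langs)
	for _, l := range langs {
		c := res[l]
		fmt.Printf("%-4s lines=%d blanks=%d comments=%d code=%d\n",
			l, c.Lines, c.Blanks, c.Comments, c.Code)
	}
}
```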

  A few steps in scc-ing this org-file are explained below.

* the default case
  :PROPERTIES:
  :ID:       3701d407-7076-4597-9cf7-1084a1ead184
  :END:
  To process this file, one uses:
  #+begin_src sh :results org :tangle runscc.sh :shebang "#!/bin/bash" :noweb yes :exports both
  scc -i org | \
    <<organise-output>>
  #+end_src

  #+RESULTS:
  #+begin_src org
  |---------------------------------------------------------------------------------|
  | Language                 Files     Lines   Blanks  Comments     Code Complexity |
  |---------------------------------------------------------------------------------|
  | Org                          1       187       20         0      167          8 |
  |---------------------------------------------------------------------------------|
  | Total                        1       187       20         0      167          8 |
  |---------------------------------------------------------------------------------|
  | Estimated Cost to Develop (organic) $4,125                                      |
  | Estimated Schedule Effort (organic) 1.707199 months                             |
  | Estimated People Required (organic) 0.214674                                    |
  |---------------------------------------------------------------------------------|
  | Processed 8848 bytes, 0.009 megabytes (SI)                                      |
  |---------------------------------------------------------------------------------|
  #+end_src

  The above is OK, but it doesn't attribute the code snippets in this
  document to their languages. As a result, the report loses detail,
  accuracy and completeness.

  Note that the Org entry in =languages.json= uses section markers
  (=*=, ...) up to depth 4 for complexity.
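  The Go sketch below shows one way to read that rule. It assumes each
  headline of depth 1 to 4 adds one point of complexity. A headline here
  means one to four =*= characters at column 0, followed by a space.
  =headlineDepth= and =complexity= are hypothetical helpers. scc may
  match the complexity checks from =languages.json= differently.

```go
package main

import (
	"fmt"
	"strings"
)

// headlineDepth returns the number of leading '*' characters when line is
// an org headline ("* Title", "** Title", ...) and 0 otherwise. Headlines
// must start at column 0, and the stars must be followed by a space.
func headlineDepth(line string) int {
	n := 0
	for n < len(line) && line[n] == '*' {
		n++
	}
	if n == 0 || n == len(line) || line[n] != ' ' {
		return 0
	}
	return n
}

// complexity counts headlines of depth 1 to 4, assuming each adds one
// point; this is a rough model, not scc's own complexity algorithm.
func complexity(doc string) int {
	total := 0
	for _, line := range strings.Split(doc, "\n") {
		if d := headlineDepth(line); d >= 1 && d <= 4 {
			total++
		}
	}
	return total
}

func main() {
	doc := "* a\n** b\n***** too deep\n*bold* text\n  * list item\n"
	fmt.Println(complexity(doc))
}
```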

** count in .
   :PROPERTIES:
   :ID:       012bebe4-7e4b-4799-a185-4f82cbd56a92
   :END:
  Running plain =scc= also picks up the generated source files:
  #+begin_src sh :results org :tangle runscc.sh :shebang "#!/bin/bash" :noweb yes :exports both
  scc | \
    <<organise-output>>
  #+end_src

  #+RESULTS:
  #+begin_src org
  |---------------------------------------------------------------------------------|
  | Language                 Files     Lines   Blanks  Comments     Code Complexity |
  |---------------------------------------------------------------------------------|
  | Go                           1         6        2         0        4          0 |
  | LaTeX                        1       105       11         2       92          0 |
  | Org                          1       190       21         0      169          8 |
  | SQL                          1         1        0         0        1          0 |
  | Shell                        1        15        2         1       12          0 |
  |---------------------------------------------------------------------------------|
  | Total                        5       317       36         3      278          8 |
  |---------------------------------------------------------------------------------|
  | Estimated Cost to Develop (organic) $7,044                                      |
  | Estimated Schedule Effort (organic) 2.092156 months                             |
  | Estimated People Required (organic) 0.299133                                    |
  |---------------------------------------------------------------------------------|
  | Processed 11561 bytes, 0.012 megabytes (SI)                                     |
  |---------------------------------------------------------------------------------|
  #+end_src

  But that is wrong. It counts both the text lines and the code lines
  in the org file as code, on top of the code in the generated files.
  All code is counted twice, documentation once. Also, using noweb is
  close to calling functions: the shell script contains 15 lines, but
  those are really just 6 lines that expand to 15 through the
  organise-output macro.

  Also, as a minor issue, the temporary LaTeX file is counted too.

** count using =--remap-all=
   :PROPERTIES:
   :ID:       72f45efa-2297-4104-9b34-51055352374c
   :END:
  Mapping magic strings to languages with =--remap-all= is better,
  but the mapping is evaluated only once per file and only against the
  first 1000 bytes.

  #+begin_src sh :results org :tangle runscc.sh :shebang "#!/bin/bash" :noweb yes :exports both
  scc -i org --remap-all "src sql:sql,src go:go,src sh:shell" | \
    <<organise-output>>
  #+end_src

  #+RESULTS:
  #+begin_src org
  |---------------------------------------------------------------------------------|
  | Language                 Files     Lines   Blanks  Comments     Code Complexity |
  |---------------------------------------------------------------------------------|
  | Org                          1       190       21         0      169          8 |
  |---------------------------------------------------------------------------------|
  | Total                        1       190       21         0      169          8 |
  |---------------------------------------------------------------------------------|
  | Estimated Cost to Develop (organic) $4,177                                      |
  | Estimated Schedule Effort (organic) 1.715328 months                             |
  | Estimated People Required (organic) 0.216344                                    |
  |---------------------------------------------------------------------------------|
  | Processed 8949 bytes, 0.009 megabytes (SI)                                      |
  |---------------------------------------------------------------------------------|
  #+end_src

  The above table shows that the last map wins.

** expected output =scc -i org --org-babel=
  I want to see something similar to:

  #+begin_src org
  |---------------------------------------------------------------------------------|
  | Language                 Files     Lines   Blanks  Comments     Code Complexity |
  |---------------------------------------------------------------------------------|
  | Go                           1         6        2         0        4          0 |
  | SQL                          1         1        0         0        1          0 |
  | Shell                        1         6        2         1        6          0 |
  | Org                          1       170       15       155        0          8 |
  |---------------------------------------------------------------------------------|
  | Total                        3       183       19       156       11          8 |
  |---------------------------------------------------------------------------------|
  | Estimated Cost to Develop (organic) $1000                                       |
  | Estimated Schedule Effort (organic) 0.576681 months                             |
  | Estimated People Required (organic) 0.036537                                    |
  |---------------------------------------------------------------------------------|
  | Processed 381 bytes, 0.000 megabytes (SI)                                       |
  |---------------------------------------------------------------------------------|
  #+end_src
** shell macros
  #+name: organise-output
  #+begin_src sh :results org :tangle no :noweb yes
  sed 's/─/-/g' | \
    sed 's/^/|/' | \
    sed '$d'
  #+end_src
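  The =organise-output= macro above performs three steps, sketched here
  in Go. It replaces each box-drawing character =─= (U+2500) with =-=,
  prefixes every line with =|= so org reads the output as a table, and
  drops the last line (=sed '$d'=). =organise= is a hypothetical name.
  Unlike =sed=, it returns the result without a trailing newline.

```go
package main

import (
	"fmt"
	"strings"
)

// organise mirrors the organise-output macro:
//
//	sed 's/─/-/g' | sed 's/^/|/' | sed '$d'
//
// It replaces each U+2500 box-drawing character with '-', prefixes every
// line with '|', and drops the last line. The result has no trailing
// newline.
func organise(out string) string {
	lines := strings.Split(strings.TrimSuffix(out, "\n"), "\n")
	lines = lines[:len(lines)-1] // sed '$d'; Split always returns >= 1 element
	for i, l := range lines {
		lines[i] = "|" + strings.ReplaceAll(l, "─", "-")
	}
	return strings.Join(lines, "\n")
}

func main() {
	raw := "───────\nGo   1\n───────\ntrailing line\n"
	fmt.Println(organise(raw))
}
```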

** some snippets
   :PROPERTIES:
   :ID:       9b3d81d4-73e2-46b2-bbb2-687e667df766
   :END:
*** first a golang one
  #+begin_src go :tangle hello.go :main no :noweb yes
  <<prelude>>
  func main() {
    fmt.Printf("Hello world\n")
  }
  #+end_src

  #+RESULTS:
  : Hello world

  #+name: prelude
  #+begin_src go :tangle no
  package main
  import (
      "fmt"
  )
  #+end_src

*** an SQL snippet, probably used in some go-code somewhere
    :PROPERTIES:
    :ID:       89a84292-f72e-4cd0-936a-a5d69feae630
    :END:
  When =<<= sql-hello =>>= is used in code, it is replaced by the SQL
  code that selects "hello world".
  #+name: sql-hello
  #+header: :engine postgresql :cmdline -h localhost -d postgres
  #+begin_src sql :tangle hello.sql :main no :noweb yes
  select 'hello world' as message
  #+end_src

  #+RESULTS:
  | message     |
  |-------------|
  | hello world |