jhelberg opened this issue 3 years ago.
So this is possible now, if I am reading this right.
scc --remap-all
scc --remap-unknown
These should inspect the start of each file and do the remapping for you. I don't turn this on by default because of the performance implications.
However, I suspect in this case the files have the same extension?
If you can provide a sample of where this happens, and what you are trying to achieve, that would really help.
I'll include an example org-file below that explains the issue in depth. --remap-all works for only one language per file. Literate programming in org-mode uses a few (fewer than 5) org-files to generate tons of source code in different languages.
In org-mode, code snippets start with #+begin_src and end with #+end_src.
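A minimal Go sketch of recognizing such a header line and reading the language name from it. `srcLang` is a hypothetical helper; it assumes the language is the first word after the keyword, as in every block of the example file, and it ignores the header arguments that follow.

```go
package main

import (
	"fmt"
	"strings"
)

// srcLang is a hypothetical helper. It reports the language named on an
// org-babel block header such as "#+begin_src go :tangle hello.go".
// Org keywords are case-insensitive, so "#+BEGIN_SRC" matches as well.
// ok is false for any other line, and for a header that names no language.
func srcLang(line string) (lang string, ok bool) {
	fields := strings.Fields(line)
	if len(fields) < 2 || !strings.EqualFold(fields[0], "#+begin_src") {
		return "", false
	}
	return fields[1], true
}

func main() {
	for _, line := range []string{
		"#+begin_src sh :results org :tangle runscc.sh",
		"#+BEGIN_SRC go :tangle hello.go",
		"#+begin_src",
		"In org-mode, code snippets start with #+begin_src",
	} {
		lang, ok := srcLang(line)
		fmt.Printf("%q -> %q, %v\n", line, lang, ok)
	}
}
```

Only the first word of a line is tested, so a mention of the keyword inside prose is not mistaken for a block start.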
#+LATEX_HEADER: \DeclareUnicodeCharacter{2500}{–}
* =scc= counting lines in org-files with org-babel
:PROPERTIES:
:ID: 0465d03e-3ab0-4734-afff-257b583274d6
:END:
Maybe I'm not representative of the mainstream developer, but I stick
to just a few org-files per project, which generate anywhere from a
few to hundreds of source files (a process called tangling; see Knuth,
Literate Programming; org-babel implements this for org-files). The
org-file is documentation and code in one (producing the documentation
is called weaving). As most projects use various languages, the
org-file contains shell scripts, Go code, dot graphs, SQL and much
more. Generated code is usually comment-free; the explanation lives in
the documentation surrounding the code.
Because the org-file contains both code and documentation, it would
be enough to qualify every part of the file and to count each part's
contents according to its language. As scc has no documentation
column, these lines (everything not qualified as a language) could
end up in either the comment column or the code column, whichever
fits better.
A feature where scc processes the org-file and understands the
=#+begin_src= directive as setting the language of the block that
follows would be a great improvement.
A few steps in scc-ing this org-file are explained below.
* the default case
:PROPERTIES:
:ID: 3701d407-7076-4597-9cf7-1084a1ead184
:END:
To process this file, one uses:
#+begin_src sh :results org :tangle runscc.sh :shebang "#!/bin/bash" :noweb yes :exports both
scc -i org | \
<<organise-output>>
#+end_src
#+RESULTS:
#+begin_src org
|---------------------------------------------------------------------------------|
| Language                 Files     Lines   Blanks  Comments     Code Complexity |
|---------------------------------------------------------------------------------|
| Org                          1       187       20         0      167          8 |
|---------------------------------------------------------------------------------|
| Total                        1       187       20         0      167          8 |
|---------------------------------------------------------------------------------|
| Estimated Cost to Develop (organic) $4,125                                      |
| Estimated Schedule Effort (organic) 1.707199 months                             |
| Estimated People Required (organic) 0.214674                                    |
|---------------------------------------------------------------------------------|
| Processed 8848 bytes, 0.009 megabytes (SI)                                      |
|---------------------------------------------------------------------------------|
#+end_src
The above is OK, but it doesn't attribute the code snippets in this
document to their languages, so the report loses detail, accuracy
and completeness.
Note that the Org entry in =languages.json= uses section markers
(=*=, ...) up to depth 4 to compute complexity.
** count everything in =.=
:PROPERTIES:
:ID: 012bebe4-7e4b-4799-a185-4f82cbd56a92
:END:
One can also pick up the generated source files:
#+begin_src sh :results org :tangle runscc.sh :shebang "#!/bin/bash" :noweb yes :exports both
scc | \
<<organise-output>>
#+end_src
#+RESULTS:
#+begin_src org
|---------------------------------------------------------------------------------|
| Language                 Files     Lines   Blanks  Comments     Code Complexity |
|---------------------------------------------------------------------------------|
| Go                           1         6        2         0        4          0 |
| LaTeX                        1       105       11         2       92          0 |
| Org                          1       190       21         0      169          8 |
| SQL                          1         1        0         0        1          0 |
| Shell                        1        15        2         1       12          0 |
|---------------------------------------------------------------------------------|
| Total                        5       317       36         3      278          8 |
|---------------------------------------------------------------------------------|
| Estimated Cost to Develop (organic) $7,044                                      |
| Estimated Schedule Effort (organic) 2.092156 months                             |
| Estimated People Required (organic) 0.299133                                    |
|---------------------------------------------------------------------------------|
| Processed 11561 bytes, 0.012 megabytes (SI)                                     |
|---------------------------------------------------------------------------------|
#+end_src
But that is wrong: it counts both the text and the code lines of the
org-file as code, on top of the code in the generated files. All code
is counted twice, documentation once. Also, using noweb is close to
calling functions; the shell script contains 15 lines, but those are
really just 6 lines that expand to 15 through the organise-output
macro.
Also, though this is a minor issue, the temporary LaTeX file is
counted too.
** count using =--remap-all=
:PROPERTIES:
:ID: 72f45efa-2297-4104-9b34-51055352374c
:END:
Mapping magic strings to languages is better, but it is evaluated
only once per file and only over the first 1000 bytes.
#+begin_src sh :results org :tangle runscc.sh :shebang "#!/bin/bash" :noweb yes :exports both
scc -i org --remap-all "src sql:sql,src go:go,src sh:shell" | \
<<organise-output>>
#+end_src
#+RESULTS:
#+begin_src org
|---------------------------------------------------------------------------------|
| Language                 Files     Lines   Blanks  Comments     Code Complexity |
|---------------------------------------------------------------------------------|
| Org                          1       190       21         0      169          8 |
|---------------------------------------------------------------------------------|
| Total                        1       190       21         0      169          8 |
|---------------------------------------------------------------------------------|
| Estimated Cost to Develop (organic) $4,177                                      |
| Estimated Schedule Effort (organic) 1.715328 months                             |
| Estimated People Required (organic) 0.216344                                    |
|---------------------------------------------------------------------------------|
| Processed 8949 bytes, 0.009 megabytes (SI)                                      |
|---------------------------------------------------------------------------------|
#+end_src
The above table shows that the last map wins.
** expected output =scc -i org --org-babel=
I want to see something similar to:
#+begin_src org
|---------------------------------------------------------------------------------|
| Language                 Files     Lines   Blanks  Comments     Code Complexity |
|---------------------------------------------------------------------------------|
| Go                           1         6        2         0        4          0 |
| SQL                          1         1        0         0        1          0 |
| Shell                        1         6        2         1        6          0 |
| Org                          1       170       15       155        0          8 |
|---------------------------------------------------------------------------------|
| Total                        3       183       19       156       11          8 |
|---------------------------------------------------------------------------------|
| Estimated Cost to Develop (organic) $1000                                       |
| Estimated Schedule Effort (organic) 0.576681 months                             |
| Estimated People Required (organic) 0.036537                                    |
|---------------------------------------------------------------------------------|
| Processed 381 bytes, 0.000 megabytes (SI)                                       |
|---------------------------------------------------------------------------------|
#+end_src
** shell macros
#+name: organise-output
#+begin_src sh :results org :tangle no :noweb yes
sed 's/─/-/g' | \
sed 's/^/|/' | \
sed '$d'
#+end_src
** some snippets
:PROPERTIES:
:ID: 9b3d81d4-73e2-46b2-bbb2-687e667df766
:END:
*** first a golang one
#+begin_src go :tangle hello.go :main no :noweb yes
<<prelude>>
func main() {
	fmt.Printf("Hello world\n")
}
#+end_src
#+RESULTS:
: Hello world
#+name: prelude
#+begin_src go :tangle no
package main
import (
	"fmt"
)
#+end_src
*** an SQL snippet, probably used in some go-code somewhere
:PROPERTIES:
:ID: 89a84292-f72e-4cd0-936a-a5d69feae630
:END:
When =<<= sql-hello =>>= is used in code, it gets replaced by the
SQL code that echoes "hello world".
#+name: sql-hello
#+header: :engine postgresql :cmdline -h localhost -d postgres
#+begin_src sql :tangle hello.sql :main no :noweb yes
select 'hello world' as message
#+end_src
#+RESULTS:
| message |
|-------------|
| hello world |
Currently scc double-counts projects where all source files come from one or more tangled org-mode files: once as Org source and once as Go/sh/whatever files.
Of course, it is possible to run scc on the .org files only, but then all language-specific parsing and qualification is lost.
Would it be possible to parse org-files so that #+begin_src go and #+begin_src c are recognized as the start of a Go or C code block respectively, and likewise for any other language? scc could even recognize the :tangle directive and remove the generated source files it names from its inspection targets.
I'm happy to help, of course.
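The :tangle part of the request could start from something like the following Go sketch, which only reads :tangle values written directly on #+begin_src lines. `tangleTargets` is a hypothetical helper; header arguments inherited from #+header lines, #+PROPERTY or property drawers would need real org parsing, which this sketch does not attempt.

```go
package main

import (
	"fmt"
	"strings"
)

// tangleTargets is a hypothetical sketch: it collects the file names given
// to ":tangle" on "#+begin_src" lines so that a counter could skip those
// generated files. ":tangle no" is skipped, and so is ":tangle yes", whose
// target name is derived from the org-file's own name. Header arguments set
// elsewhere and quoted paths containing spaces are not handled.
func tangleTargets(org string) []string {
	var targets []string
	for _, line := range strings.Split(org, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 || !strings.EqualFold(fields[0], "#+begin_src") {
			continue
		}
		for i := 1; i+1 < len(fields); i++ {
			if fields[i] == ":tangle" && fields[i+1] != "no" && fields[i+1] != "yes" {
				targets = append(targets, fields[i+1])
			}
		}
	}
	return targets
}

func main() {
	org := "#+begin_src go :tangle hello.go :main no :noweb yes\n#+end_src\n" +
		"#+begin_src sh :results org :tangle no :noweb yes\n#+end_src\n" +
		"#+begin_src sql :tangle hello.sql :main no\n#+end_src\n"
	fmt.Println(tangleTargets(org)) // [hello.go hello.sql]
}
```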