Apache License 2.0

C Thing Software locc4j

A Java library for counting lines of source code. Supports over 250 languages using a high-performance counting algorithm.

Features

Supported Languages

The complete list of supported languages can be found in the languages.json file. The file includes a description of each language, a link to detailed information about the language, the file extensions associated with the language, and additional information to describe the language syntax for counting purposes.

To programmatically obtain the list of supported languages and the file extensions associated with them, call Language.getExtensions.

To request support for a language, create an issue and provide the following information:

Usage

The library is available from Maven Central using the following Maven dependency:

<dependency>
  <groupId>org.cthing</groupId>
  <artifactId>locc4j</artifactId>
  <version>2.0.0</version>
</dependency>

or the following Gradle dependency:

implementation("org.cthing:locc4j:2.0.0")

Counter Results

All counting APIs return results using a map of language to line counts (i.e. Map<Language, Counts>). A map is used to accommodate languages that can embed other languages. The library detects these embedded languages and counts their lines using the counter corresponding to that language. For example, while an HTML file contains markup, it may also contain lines of JavaScript and CSS. This means that the counter results for an HTML file might contain counts for three languages (HTML, JavaScript and CSS).

The Counts class provides the actual line counts: the number of code lines, comment lines, and blank lines.

Note that this library does not provide any support for serializing the line count results. The gradle-locc Gradle plugin uses the locc4j library and provides a good example of how the counts map can be serialized to various file formats (e.g. JSON, XML). The plugin uses the xmlwriter and jsonwriter libraries to serialize the results to those formats.
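
Because serialization is left to the caller, the following is a minimal, hand-rolled sketch of writing a counts map as JSON. To stay self-contained it uses a hypothetical LineCounts record and plain string keys as stand-ins for the library's Counts and Language types. The class name, the field names and the JSON layout are assumptions for illustration, not the format produced by gradle-locc.

```java
import java.util.Map;
import java.util.StringJoiner;

final class CountsJsonSketch {

    /** Hypothetical stand-in for the library's Counts class. */
    record LineCounts(int codeLines, int commentLines, int blankLines) { }

    /** Serializes a map of language name to counts as a compact JSON object. */
    static String toJson(final Map<String, LineCounts> counts) {
        final StringJoiner json = new StringJoiner(",", "{", "}");
        for (final Map.Entry<String, LineCounts> entry : counts.entrySet()) {
            final LineCounts c = entry.getValue();
            json.add("\"" + escape(entry.getKey()) + "\":{"
                     + "\"codeLines\":" + c.codeLines() + ","
                     + "\"commentLines\":" + c.commentLines() + ","
                     + "\"blankLines\":" + c.blankLines() + "}");
        }
        return json.toString();
    }

    /** Escapes backslashes and double quotes; control characters are not handled in this sketch. */
    private static String escape(final String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

For anything beyond a quick experiment, a real JSON library is the safer choice, since this sketch does not escape control characters.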

Counting a String or Character Array

The following code counts lines within a string or character array.

final Counter counter = new Counter(Language.Markdown);
final Map<Language, Counts> counts = counter.count("# Title\n\nHello World");

The returned counts map contains an entry for each detected language. In the above example, one language is detected (i.e. the language specified and no embedded languages), so the resulting map contains:

Language.Markdown:
    codeLines == 2
    commentLines == 0
    blankLines == 1

If the Markdown content contained embedded Mermaid diagram markup, the returned map would contain two keys, Language.Markdown and Language.Mermaid, each with their respective line counts.
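
For the three-line Markdown string counted above, the code and blank line counts can be reproduced by simply separating blank lines from non-blank ones, because that text contains no comments or embedded languages. The sketch below does only that. It is a deliberately naive illustration with a hypothetical class name, not the library's counting algorithm.

```java
import java.util.List;

final class NaiveLineTally {

    /** Returns a two-element array: {non-blank lines, blank lines}. */
    static int[] tally(final String text) {
        // String.lines() splits on \n, \r and \r\n.
        final List<String> lines = text.lines().toList();
        int nonBlank = 0;
        int blank = 0;
        for (final String line : lines) {
            if (line.isBlank()) {
                blank++;
            } else {
                nonBlank++;
            }
        }
        return new int[] {nonBlank, blank};
    }
}
```

Calling NaiveLineTally.tally("# Title\n\nHello World") yields 2 non-blank lines and 1 blank line, matching the codeLines and blankLines values shown above.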

Counting an Input Stream

The following code counts lines read from an input stream.

final InputStream ins = getClass().getResourceAsStream("/data/program.py");
final Counter counter = new Counter(Language.Python);
final Map<Language, Counts> counts = counter.count(ins);

Counting One or More Files

The following code counts lines from a single file. The file's primary language is determined by first examining its name, then extension, and finally any shebang (i.e. #!) that may be present at the start of the file.

final FileCounter counter = new FileCounter();
final Map<Path, Map<Language, Counts>> counts = counter.count("/tmp/program.cpp");

The following code counts lines from multiple files. Each file's primary language is determined by first examining its name, then extension, and finally any shebang (i.e. #!) that may be present at the start of the file.

final FileCounter counter = new FileCounter();
final Map<Path, Map<Language, Counts>> counts = counter.count("/tmp/program1.cpp", "/tmp/program2.java");

Counting Files in a Directory

The following code counts all files in the specified directory. By default, hidden files are excluded.

final CountingTreeWalker walker = new CountingTreeWalker(Path.of("/home/myusername/foo")).maxDepth(1);
final Map<Path, Map<Language, Counts>> counts = walker.count();

The following code counts only C++ source files in the specified directory using a glob pattern match.

final CountingTreeWalker walker = new CountingTreeWalker(Path.of("/home/myusername/foo"), "*.cpp").maxDepth(1);
final Map<Path, Map<Language, Counts>> counts = walker.count();
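
To illustrate which files a depth limit of 1 combined with a "*.cpp" glob selects, the sketch below uses only the JDK's Files.walk and PathMatcher. It is an approximation for illustration: the class and method names are hypothetical, and the glob syntax and hidden-file handling of CountingTreeWalker may differ from the JDK's (see its Javadoc).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Stream;

final class GlobDepthSketch {

    /** Lists the regular files directly inside dir whose file name matches the glob. */
    static List<Path> matchInDirectory(final Path dir, final String glob) throws IOException {
        final PathMatcher matcher = dir.getFileSystem().getPathMatcher("glob:" + glob);
        // A max depth of 1 visits the starting directory and its immediate children only.
        try (Stream<Path> paths = Files.walk(dir, 1)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> matcher.matches(p.getFileName()))
                        .sorted()
                        .toList();
        }
    }
}
```

With this sketch, a file such as foo/a.cpp is selected, while foo/sub/c.cpp is not, because it lies below the depth limit.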

Counting a File System Tree

The following code counts all files under the specified directory tree. By default, hidden files and directories are excluded.

final CountingTreeWalker walker = new CountingTreeWalker(Path.of("/home/myusername/foo"));
final Map<Path, Map<Language, Counts>> counts = walker.count();

The following code excludes files based on any encountered Git ignore files.

final CountingTreeWalker walker = new CountingTreeWalker(Path.of("/home/myusername/foo")).respectGitignore(true);
final Map<Path, Map<Language, Counts>> counts = walker.count();

The following code counts only C++ source files using a glob pattern match.

final CountingTreeWalker walker = new CountingTreeWalker(Path.of("/home/myusername/foo"), "*.cpp");
final Map<Path, Map<Language, Counts>> counts = walker.count();

The following code counts only Java source files using a language match instead of a glob pattern.

final CountingTreeWalker walker = new CountingTreeWalker(Path.of("/home/myusername/foo"), Language.Java);
final Map<Path, Map<Language, Counts>> counts = walker.count();

See the Javadoc for the CountingTreeWalker for details on the glob syntax and other options.

The CountUtils class provides methods to calculate various metrics based on the results of a file tree walk. For example, the following code calculates the line counts for all languages encountered on a tree walk.

final CountingTreeWalker walker = new CountingTreeWalker(Path.of("/home/myusername/foo"));
final Map<Path, Map<Language, Counts>> fileCounts = walker.count();
final Map<Language, Counts> languageCounts = CountUtils.byLanguage(fileCounts);
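
Conceptually, rolling per-file results up by language amounts to merging the inner maps and summing their counts. The sketch below shows that idea with hypothetical stand-ins (string keys and a LineCounts record) so that it is self-contained. It is not the CountUtils implementation.

```java
import java.util.Map;
import java.util.TreeMap;

final class ByLanguageSketch {

    /** Hypothetical stand-in for the library's Counts class. */
    record LineCounts(int codeLines, int commentLines, int blankLines) {

        /** Returns a new instance holding the field-wise sum of this and other. */
        LineCounts plus(final LineCounts other) {
            return new LineCounts(codeLines + other.codeLines(),
                                  commentLines + other.commentLines(),
                                  blankLines + other.blankLines());
        }
    }

    /** Sums the per-file counts into one total per language. */
    static Map<String, LineCounts> byLanguage(final Map<String, Map<String, LineCounts>> fileCounts) {
        final Map<String, LineCounts> totals = new TreeMap<>();
        for (final Map<String, LineCounts> perFile : fileCounts.values()) {
            // merge() stores the first value seen and combines later ones with plus().
            perFile.forEach((language, counts) -> totals.merge(language, counts, LineCounts::plus));
        }
        return totals;
    }
}
```

An HTML file with embedded JavaScript and a standalone JavaScript file would therefore both contribute to a single JavaScript total.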

Finding a Language

The library's file-based APIs automatically determine the primary language of a file. The Language enum provides methods to manually determine a language.

Method             Description
fromFile           Determines a file's language by first looking up the file name, then the file extension, and finally any shebang.
fromMime           Provides the language associated with a MIME type.
fromFileExtension  Provides the language associated with the specified file extension.
fromId             Provides the language associated with the specified Language enum value.
fromName           Provides the language associated with the specified language name.
fromShebang        Provides the language associated with the specified file's interpreter or environment shebang.
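
The lookup order described for fromFile (file name, then extension, then shebang) can be pictured as a chain of fallbacks. The sketch below is illustrative only: the tables hold a few made-up entries, the class and method names are hypothetical, and languages are plain strings rather than Language values.

```java
import java.util.Map;
import java.util.Optional;

final class DetectionOrderSketch {

    // Tiny hypothetical tables for illustration; not the library's data.
    private static final Map<String, String> BY_NAME = Map.of("Makefile", "Makefile");
    private static final Map<String, String> BY_EXTENSION = Map.of("cpp", "C++", "py", "Python");
    private static final Map<String, String> BY_INTERPRETER = Map.of("python3", "Python", "bash", "Bash");

    /** Tries the file name, then the extension, then the shebang on the first line. */
    static Optional<String> detect(final String fileName, final String firstLine) {
        final String byName = BY_NAME.get(fileName);
        if (byName != null) {
            return Optional.of(byName);
        }

        final int dot = fileName.lastIndexOf('.');
        if (dot >= 0) {
            final String byExtension = BY_EXTENSION.get(fileName.substring(dot + 1));
            if (byExtension != null) {
                return Optional.of(byExtension);
            }
        }

        if (firstLine.startsWith("#!")) {
            // Covers "#!/bin/bash" and "#!/usr/bin/env python3" by taking the base name
            // of the last token; interpreter arguments are not handled in this sketch.
            final String[] tokens = firstLine.substring(2).trim().split("\\s+");
            final String last = tokens[tokens.length - 1];
            final String interpreter = last.substring(last.lastIndexOf('/') + 1);
            return Optional.ofNullable(BY_INTERPRETER.get(interpreter));
        }
        return Optional.empty();
    }
}
```

The shebang is consulted last, so an extensionless script named run that starts with "#!/usr/bin/env python3" resolves to Python only after the name and extension lookups find nothing.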

Custom File Extension Associations

The library has a built-in association of common file extensions to languages. These associations can be augmented, changed or removed. Call Language.addExtension to add a new association or change an existing one. Call Language.removeExtension to remove an association. To restore the default associations, call Language.resetExtensions.
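
The add, remove and reset operations behave like edits to a mutable copy of a default table. A minimal sketch of that idea follows. ExtensionRegistrySketch and its entries are hypothetical; the library's own methods are Language.addExtension, Language.removeExtension and Language.resetExtensions, whose exact signatures are documented in the Javadoc.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class ExtensionRegistrySketch {

    // Hypothetical default associations for illustration.
    private static final Map<String, String> DEFAULTS = Map.of("cpp", "C++", "java", "Java");

    // Working copy that callers may modify without touching the defaults.
    private final Map<String, String> active = new HashMap<>(DEFAULTS);

    /** Adds a new association or replaces an existing one. */
    void add(final String extension, final String language) {
        active.put(extension, language);
    }

    /** Removes an association, if present. */
    void remove(final String extension) {
        active.remove(extension);
    }

    /** Discards all customizations and restores the defaults. */
    void reset() {
        active.clear();
        active.putAll(DEFAULTS);
    }

    /** Looks up the language currently associated with an extension. */
    Optional<String> lookup(final String extension) {
        return Optional.ofNullable(active.get(extension));
    }
}
```

Keeping the defaults immutable and separate from the working copy is what makes a reset cheap and reliable.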

Accuracy

The library does not perform complete parsing of each language. As described in the Counting Performance document, this would severely impact performance and would be impractical to implement. The counting algorithm used by the library makes a balanced tradeoff between performance and accuracy. While the algorithm can accommodate nested comments and embedded languages, inevitably there are language constructs that will be incorrectly counted. Please report these inaccuracies by creating an issue and providing the following information:

Acknowledgements

The counting algorithm and the initial set of language data used by this library are based on the tokei project, which is released under the MIT License, reproduced below.

MIT License (MIT)

Copyright (c) 2016 Erin Power

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Building

The library is compiled for Java 17. If a Java 17 toolchain is not available, one will be downloaded.

Gradle is used to build the library:

./gradlew build

The Javadoc for the library can be generated by running:

./gradlew javadoc

A Gradle plugin in the languagePlugin directory is used to generate the Language enum class from the languages.json data file and the Language.ftl FreeMarker template.

Releasing

This project is released on the Maven Central repository. Perform the following steps to create a release.