AdaCore / Ada-SPARK-Crate-Of-The-Year


[2021][septum] Search Tool for Large Codebases #7

Closed pyjarrett closed 2 years ago

pyjarrett commented 2 years ago

septum


Finding what you need in large codebases is hard. Terms can have different meanings in different parts of a project, and working out what you are looking for often has to be done incrementally.

Septum provides an interactive environment to push and pop search filters to narrow or expand a search.
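The push/pop interaction can be pictured as a stack of predicates. The sketch below is illustrative Python, not Septum's Ada implementation; `FilterStack`, `push`, `pop`, and `search` are hypothetical names chosen for this example, and plain strings stand in for the blocks of text being searched.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type for illustration: a filter accepts or rejects a text block.
Filter = Callable[[str], bool]


@dataclass
class FilterStack:
    filters: List[Filter] = field(default_factory=list)

    def push(self, f: Filter) -> None:
        """Narrow the search by adding a filter."""
        self.filters.append(f)

    def pop(self) -> None:
        """Expand the search by dropping the most recently added filter."""
        if self.filters:
            self.filters.pop()

    def search(self, blocks: List[str]) -> List[str]:
        """Return the blocks that pass every filter currently on the stack."""
        return [b for b in blocks if all(f(b) for f in self.filters)]


stack = FilterStack()
blocks = ["open file", "close file", "open socket"]

stack.push(lambda b: "open" in b)
stack.push(lambda b: "file" in b)
narrowed = stack.search(blocks)  # both filters apply

stack.pop()
expanded = stack.search(blocks)  # only the "open" filter applies
```

Popping a filter never requires re-reading the input; the remaining filters are simply applied again to the same blocks.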

Purpose

Septum is like grep, but it searches for and returns matching contexts of contiguous lines, rather than matching single lines or relying on a multi-line search mode.

Context Matching

Limiting the search to blocks around search terms allows searching for elements that appear in arbitrary order and may span several lines, which can be difficult to express in other tools. Sometimes a term appears many times in a project under names that change with context. Septum allows such contexts to be excluded as well.
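One way to picture context matching is as a list of line ranges that later filters keep or discard. The following is a minimal Python sketch under stated assumptions, not Septum's actual algorithm: the fixed `width` parameter, the merging of adjacent windows, and the function names `contexts_for`, `find`, and `exclude` are all hypothetical.

```python
from typing import List, Tuple

Context = Tuple[int, int]  # inclusive (first_line, last_line), 0-based


def contexts_for(lines: List[str], term: str, width: int) -> List[Context]:
    """Build a window of `width` lines around each line containing `term`."""
    contexts: List[Context] = []
    for i, line in enumerate(lines):
        if term not in line:
            continue
        lo, hi = max(0, i - width), min(len(lines) - 1, i + width)
        if contexts and lo <= contexts[-1][1] + 1:
            # Overlapping or adjacent windows form one contiguous context.
            contexts[-1] = (contexts[-1][0], hi)
        else:
            contexts.append((lo, hi))
    return contexts


def contains(lines: List[str], ctx: Context, term: str) -> bool:
    lo, hi = ctx
    return any(term in line for line in lines[lo:hi + 1])


def find(lines: List[str], contexts: List[Context], term: str) -> List[Context]:
    """Keep only contexts that also mention `term` on any of their lines."""
    return [c for c in contexts if contains(lines, c, term)]


def exclude(lines: List[str], contexts: List[Context], term: str) -> List[Context]:
    """Drop contexts that mention `term` on any of their lines."""
    return [c for c in contexts if not contains(lines, c, term)]


lines = [
    "def load(path):",
    "    handle = open(path)",
    "    return handle.read()",
    "",
    "def test_load():",
    "    handle = open('x')",
    "    assert handle",
]

matches = contexts_for(lines, "open", width=1)
with_read = find(lines, matches, "read")          # "read" is on a different line than "open"
without_tests = exclude(lines, matches, "test_")  # drops the test function's context
```

Because the filters test a whole context rather than a single line, the order in which terms appear inside the block does not matter.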


Design

Goals

Septum is designed to be a standalone application for the lone developer on their own hardware, searching closed-source software. This means the program should use as few dependencies as possible, to simplify security auditing, and should perform no network operations.

Search

Disk access is a major bottleneck for search tools. Loading the source code into memory once and performing many searches in the same interactive session improves speed, at the risk of stale results. While an OS (or the disk itself) may cache files, keeping the text in memory forces this data to stay available. Anecdotally, this requires approximately 1 GiB of memory for every ten million lines of code.
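The load-once approach and the memory figure can be made concrete with a short Python sketch. This is an illustration only, not Septum's Ada code: `load_tree` and `estimated_bytes` are hypothetical helpers, and the estimate simply scales the anecdotal 1 GiB per ten million lines quoted above, which works out to roughly 107 bytes per line.

```python
import os
import tempfile
from typing import Dict, List

GIB = 1024 ** 3
LINES_PER_GIB = 10_000_000  # anecdotal figure from the text, not a measurement


def load_tree(root: str) -> Dict[str, List[str]]:
    """Read every file under `root` once and keep its lines in memory."""
    cache: Dict[str, List[str]] = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                cache[path] = f.read().splitlines()
    return cache


def estimated_bytes(line_count: int) -> float:
    """Scale the anecdotal ratio to a given number of lines."""
    return line_count / LINES_PER_GIB * GIB


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "demo.adb"), "w", encoding="utf-8") as f:
        f.write("procedure Demo is\nbegin\n   null;\nend Demo;\n")
    cache = load_tree(root)
    total_lines = sum(len(lines) for lines in cache.values())

bytes_per_line = estimated_bytes(1)
```

Every later search reads from `cache` rather than from disk, which is also why results go stale: edits made to files after loading are not seen until the tree is loaded again.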

Though search results eventually get merged, all files can be processed independently. With one search task pinned per available core, each task picks the next available file off a queue and finds candidate contexts based on the first filter. Each subsequent filter then refines those results, and the file's matching contexts are merged into a common list of results.
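The queue-and-merge structure described above can be sketched with Python threads. This shows the shape of the approach only, under several assumptions: Python threads are not pinned to cores and do not give the parallel speedup that Ada tasks can, single line indices stand in for contexts, and `search_file` and `parallel_search` are hypothetical names.

```python
import queue
import threading
from typing import Callable, Dict, List, Tuple

Result = Tuple[str, int]  # (file name, line index); stands in for a context
Filter = Callable[[str], bool]


def search_file(name: str, lines: List[str], filters: List[Filter]) -> List[Result]:
    """Find candidates with the first filter, then refine with the rest."""
    first, *rest = filters
    hits = [(name, i) for i, line in enumerate(lines) if first(line)]
    for f in rest:
        hits = [(n, i) for n, i in hits if f(lines[i])]
    return hits


def parallel_search(files: Dict[str, List[str]],
                    filters: List[Filter],
                    workers: int) -> List[Result]:
    if not filters:
        return []

    pending: "queue.Queue[str]" = queue.Queue()
    for name in files:
        pending.put(name)

    merged: List[Result] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                name = pending.get_nowait()  # next available file
            except queue.Empty:
                return
            local = search_file(name, files[name], filters)
            with lock:  # merge into the common result list
                merged.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(merged)


files = {
    "a.adb": ["with Ada.Text_IO;", "procedure A is"],
    "b.adb": ["procedure B is", "-- procedure old"],
}
filters = [lambda l: "procedure" in l, lambda l: not l.startswith("--")]
results = parallel_search(files, filters, workers=2)
```

Workers share only the queue and the merged list, so the lock is held just long enough to append one file's results.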

Interactivity

VT100 and character-driven input provide formatting feedback and tab completion through the trendy_terminal library, and progress indication through the progress_indicators library. Commands and file names can be tab-completed, and hints show when a command can be auto-completed. Additionally, find-regex and exclude-regex use green/red highlighting to indicate whether the regular expression being typed is valid.
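The green/red feedback for regular expressions amounts to attempting to compile the pattern and choosing a colour from the outcome. A minimal Python sketch follows; it uses Python's `re` module and standard ANSI/VT100 colour codes as stand-ins, so the accepted regex syntax will differ from what Septum's Ada implementation accepts, and `is_valid_regex` and `highlight` are hypothetical names.

```python
import re

# Standard ANSI/VT100 SGR sequences for foreground colours.
GREEN = "\x1b[32m"
RED = "\x1b[31m"
RESET = "\x1b[0m"


def is_valid_regex(pattern: str) -> bool:
    """Return True if Python's regex engine can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def highlight(pattern: str) -> str:
    """Wrap the pattern in green if it compiles, red otherwise."""
    color = GREEN if is_valid_regex(pattern) else RED
    return f"{color}{pattern}{RESET}"


valid = highlight("foo.*bar")
invalid = highlight("foo(")  # unbalanced parenthesis does not compile
```

Running the check on every keystroke gives the user feedback before the command is submitted.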

Dependencies (and their authors)

Alire was used to split off potentially reusable behavior into separate libraries (dir_iterators, progress_indicators, trendy_terminal, and trendy_test).

Usage

Septum has been in routine use for several months to search Ada, C, and C++ codebases with tens of millions of lines of code.

The arrows show the commands and their effect on a running Septum instance:

[image: septum]


Supported Platforms

Authors

Paul Jarrett

License

Apache License 2.0