AstuteSource / chasten

💫 Chasten Uses XML and XPATH to Check a Python Program's AST for Specified Patterns!
https://pypi.org/project/chasten/
GNU General Public License v2.0

<img src="https://github.com/AstuteSource/chasten/blob/master/.github/images/chasten-logo.svg" alt="Chasten Logo" title="Chasten Logo" />

💫 chasten

build Coverage Language:
Python Code Style: black Maintenance License LGPL v3

🎉 Introduction

😂 Definitions

🔋 Features

⚡️ Requirements

🔽 Installation

Follow these steps to install the chasten program:

🪂 Configuration

You can configure chasten with two YAML files, normally called config.yml and checks.yml. Although chasten can generate a starting configuration, you can also consult the 📦 AstuteSource/chasten-configuration repository for examples of configuration files that set up the tool. Although the config.yml file can reference multiple check configuration files, this example shows how to specify a single checks.yml file:

# chasten configuration
chasten:
  # point to a single checks file
  checks-file:
    - checks.yml

The checks.yml file must contain one or more checks. What follows is an example of a check configuration file with two checks that respectively find the first executable line of non-test and test-case functions in a Python project. Note that the pattern attribute specifies the XPath version 2.0 expression that chasten will use to detect the specified type of Python function. To confirm that your configuration meets the tool's specification, type chasten configure validate --config <path to chasten-configuration/ directory | config url>, replacing the placeholder with the fully-qualified name of your configuration directory or with its URL. You can also use the chasten configure create command to automatically generate a starting configuration! Typing chasten configure --help will explain how to configure the tool.

checks:
  - name: "all-non-test-function-definition"
    code: "FUNC"
    id: "FUNC001"
    description: "First executable line of a non-test function, skipping over docstrings and/or comments"
    pattern: '//FunctionDef[not(contains(@name, "test_"))]/body/Expr[value/Constant]/following-sibling::*[1] | //FunctionDef[not(contains(@name, "test_"))]/body[not(Expr/value/Constant)]/*[1]'
  - name: "all-test-function-definition"
    code: "FUNC"
    id: "FUNC002"
    description: "First executable line of a test function, skipping over docstrings and/or comments"
    pattern: '//FunctionDef[starts-with(@name, "test_")]/body/Expr[value/Constant]/following-sibling::*[1] | //AsyncFunctionDef[starts-with(@name, "test_")]/body/Expr[value/Constant]/following-sibling::*[1] | //FunctionDef[starts-with(@name, "test_")]/body[not(Expr/value/Constant)]/*[1] | //AsyncFunctionDef[starts-with(@name, "test_")]/body[not(Expr/value/Constant)]/*[1]'
    count:
      min: 1
      max: 10

✨ Analysis

Since chasten needs a project with Python source code as the input to its analyze sub-command, you can clone the 📦 AstuteSource/lazytracker and 📦 AstuteSource/multicounter repositories, which are forks of existing Python projects created for convenient analysis. To incrementally analyze these two projects with chasten, you can type the following commands to produce a results JSON file for each project:

chasten analyze lazytracker \
        --config <path to the chasten-configuration/ directory | config url> \
        --search-path <path to the lazytracker/ directory> \
        --save-directory <path to the subject-data/lazytracker/ directory> \
        --save
chasten analyze multicounter \
        --config <path to the chasten-configuration/ directory | config url> \
        --search-path <path to the multicounter/ directory> \
        --save-directory <path to the subject-data/multicounter/ directory> \
        --save
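If you prefer to script these two runs, the following sketch assembles the same command lines in Python. It only uses the sub-command and options shown above. The helper name build_analyze_command and the directory names (chasten-configuration, subject-data, and one sub-directory per project) are assumptions made for illustration, so adjust them to match your own layout.

```python
import shlex
from pathlib import Path
from typing import List


def build_analyze_command(
    project: str, config: Path, search_path: Path, save_directory: Path
) -> List[str]:
    """Assemble a `chasten analyze` command line as a list of arguments."""
    return [
        "chasten", "analyze", project,
        "--config", str(config),
        "--search-path", str(search_path),
        "--save-directory", str(save_directory),
        "--save",
    ]


# Hypothetical locations; replace them with your own directories.
CONFIG = Path("chasten-configuration")
SUBJECT_DATA = Path("subject-data")

commands = [
    build_analyze_command(name, CONFIG, Path(name), SUBJECT_DATA / name)
    for name in ("lazytracker", "multicounter")
]

for command in commands:
    # Print a shell-ready form of each command instead of running it.
    print(shlex.join(command))
```

To actually run a command you could pass one of these lists to subprocess.run(command, check=True), which requires chasten to be installed and on your PATH.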

🚧 Integration

After running chasten on the lazytracker and multicounter programs you can integrate their individual JSON files into a single JSON file, related CSV files, and a SQLite database. Once you have made an integrated-data/ directory, you can type this command to perform the integration:

chasten integrate all-programs \
        <path to subject-data>/**/*.json \
        --save-directory <path to the integrated-data/ directory>

This command will produce a directory like chasten-flattened-csvs-sqlite-db-all-programs-20230823171016-2061b524276b4299b04359ba30452923/ that contains a SQLite database called chasten.db and a csv/ directory with CSV files that correspond to each of the tables inside of the database.

You can learn more about the integrate sub-command by typing chasten integrate --help.
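Once the integration has produced chasten.db, you may want to see which tables it contains before exploring further. The sketch below uses only Python's standard sqlite3 module and queries SQLite's own sqlite_master catalog, so it does not assume any particular table names. The function name list_tables is hypothetical and is not part of chasten.

```python
import sqlite3
from pathlib import Path
from typing import List


def list_tables(database: Path) -> List[str]:
    """Return the names of the tables stored in a SQLite database file."""
    connection = sqlite3.connect(database)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        # Always release the file handle, even if the query fails.
        connection.close()
    return [name for (name,) in rows]
```

For example, calling list_tables with the path to the chasten.db file inside your integrated-data/ directory returns the table names in alphabetical order, and each of them should have a matching file in the csv/ directory.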

💠 Verbose Output

When you run a chasten command, appending the --verbose flag produces more detailed output that can help with troubleshooting and with understanding what the tool is doing. Here is an example with chasten analyze lazytracker:

chasten analyze lazytracker \
        --config <path to the chasten-configuration/ directory> \
        --search-path <path to the lazytracker/ directory> \
        --save-directory <path to the subject-data/lazytracker/ directory> \
        --save \
        --verbose

After running this command, the output will contain informative messages such as ✨ Matching source code:, which indicates that the tool is comparing the source code against the specified patterns. You will also see detailed match results for each check.

🌄 Results

If you want to create an interactive analysis dashboard that uses 📦 simonw/datasette, you can run chasten datasette-serve <path containing integrated results>/chasten.db --port 8001. Now you can use the dashboard in your web browser to analyze the results while you study the source code for these projects with your editor! Examining the results will reveal that chasten, through its use of 📦 spookylukey/pyastgrep, correctly uses the XPath expression for all-test-function-definition to find the first line of executable source code inside of each test, skipping over a function's docstring and leading comments.

For the lazytracker program you will notice that chasten reports that there are 6 test cases even though pytest only finds and runs 5 tests. This is because the tests/test_tracked.py test suite in lazytracker contains a function starting with test_ inside of another function starting with test_. This example illustrates the limitations of static analysis with chasten! Even though the tool correctly detected all of the "test functions", the nesting of the functions in the test suite means that pytest will run only the outer test_ function and use the inner test_ function for testing purposes.

With that said, chasten correctly finds each of the tests for the multicounter project. You can follow each of the previous steps in this document to apply chasten to your own Python program!
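The gap between 6 reported matches and 5 collected tests in lazytracker can be reproduced in miniature. The sketch below uses Python's standard ast module on a hypothetical sample, not the real tests/test_tracked.py file. It compares every function whose name starts with test_ against only the module-level ones, which is a rough stand-in for what pytest collects (it ignores test classes and any custom collection rules).

```python
import ast

# Hypothetical test module with a test_ function nested inside another one.
SOURCE = '''
def test_outer():
    def test_inner():
        return 1
    assert test_inner() == 1


def test_plain():
    assert True
'''


def is_test_function(node: ast.AST) -> bool:
    """Report whether a node is a (possibly async) function named test_*."""
    return isinstance(
        node, (ast.FunctionDef, ast.AsyncFunctionDef)
    ) and node.name.startswith("test_")


tree = ast.parse(SOURCE)

# A purely syntactic search sees every matching definition, nested or not.
every_match = sorted(node.name for node in ast.walk(tree) if is_test_function(node))

# Approximation of collection: only definitions at the top level of the module.
module_level = sorted(node.name for node in tree.body if is_test_function(node))

print(every_match)
print(module_level)
```

The syntactic search finds three functions while the module-level view finds two, which mirrors the off-by-one difference described above.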

🌎 Deployment

If you want to make your chasten.db publicly available for everyone to study, you can use the chasten datasette-publish sub-command. As long as you have followed the installation instructions for 📦 simonw/datasette-publish-fly and 📦 simonw/datasette-publish-vercel, you can use the plugins to deploy a public datasette server that hosts your chasten.db. For instance, running the command chasten datasette-publish <path containing integrated results>/chasten.db --platform vercel will publish the results from running chasten on lazytracker and multicounter to the Vercel platform.

Importantly, the use of the chasten datasette-publish command with the --platform vercel option requires you to have previously followed the instructions for the datasette-publish-vercel plugin to install the vercel command-line tool. This is necessary because, although datasette-publish-vercel is one of chasten's dependencies, neither chasten nor datasette-publish-vercel provides the vercel tool even though they use it. You must take similar steps before publishing your database to Fly!

🤯 Interaction

Even though chasten is a command-line application, you can interactively create the tool's command-line arguments and options through a terminal user interface (TUI). To use this TUI-based way to create a complete command line for chasten, type the command chasten interact.

📊 Log

Chasten has a built-in system log that shows the events and messages produced by the tool, which helps you find bugs and the events that led to them. To display the system log, open a separate terminal and run the command chasten log. Then, for each chasten command whose messages you want to see in the log, add the options --debug-level <choice of level> and --debug-dest SYSLOG.

For example, chasten datasette-serve --debug-level DEBUG --debug-dest SYSLOG <database path to file> will produce the following output in the system log.

💫 chasten: Analyze the AST of Python Source Code
🔗 GitHub: https://github.com/gkapfham/chasten
✨ Syslog server for receiving debugging information

Display verbose output? False
Debug level? DEBUG
Debug destination? SYSLOG

Every chasten command accepts a --debug-level option. The debug level has five options: debug, info, warning, error, and critical. Each level shows different messages in the system log, where debug is the lowest level of severity and critical is the highest. For more information, see the DebugLevel class in the debug.py file:

from enum import Enum


class DebugLevel(str, Enum):
    """The predefined levels for debugging."""

    DEBUG = "DEBUG"
    INFO = "INFO"
    WARNING = "WARNING"
    ERROR = "ERROR"
    CRITICAL = "CRITICAL"
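Because DebugLevel mixes in str, its members compare equal to plain strings and can be built from the text given on the command line. The sketch below repeats the class so that it runs on its own and then shows one possible way to translate a level into the matching constant from Python's standard logging module. The helper to_logging_level is hypothetical; the source does not show how chasten itself performs this mapping.

```python
import logging
from enum import Enum


class DebugLevel(str, Enum):
    """The predefined levels for debugging."""

    DEBUG = "DEBUG"
    INFO = "INFO"
    WARNING = "WARNING"
    ERROR = "ERROR"
    CRITICAL = "CRITICAL"


def to_logging_level(level: DebugLevel) -> int:
    """Look up the logging constant that shares its name with the level."""
    return getattr(logging, level.value)


# Build a member from its string value, as a command-line parser might.
level = DebugLevel("WARNING")

# Iterating over the enumeration yields the members in definition order.
names = [member.value for member in DebugLevel]

print(level == "WARNING")
print(names)
print(to_logging_level(DebugLevel.DEBUG) == logging.DEBUG)
```

Passing a string that is not one of the five values, such as DebugLevel("TRACE"), raises a ValueError, which is a convenient way to reject an invalid level early.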

✨ chasten --help

 Usage: chasten [OPTIONS] COMMAND [ARGS]...

╭─ Options ────────────────────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.                      │
│ --show-completion             Show completion for the current shell, to copy it or           │
│                               customize the installation.                                    │
│ --help                        Show this message and exit.                                    │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────╮
│ analyze                       💫 Analyze the AST of Python source code.                      │
│ configure                     🪂 Manage chasten's configuration.                             │
│ datasette-publish             🌎 Publish a datasette to Fly or Vercel.                       │
│ datasette-serve               🏃 Start a local datasette server.                             │
│ integrate                     🚧 Integrate files and make a database.                        │
│ interact                      🚀 Interactively configure and run.                            │
│ log                           🦚 Start the logging server.                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯

🧑‍💻 Development Environment

🏠 Local

Follow these steps to install the chasten tool for future development:

Once Python and Poetry are installed, go to the Chasten repository on GitHub and clone it with the git clone command in your terminal. Then navigate to the chasten directory and run the command poetry install to install all of the dependencies.

🐋 Docker

You can also use Docker to run chasten.

Follow these steps to utilize Docker:

📋 Development Tasks

🤗 Learning

🤓 Chasten vs. Symbex

Chasten and Symbex, which was created by Simon Willison, are both tools designed for analyzing Python source code, particularly focusing on searching for functions and classes within files. While they share a common goal, there are notable differences between the two, especially in terms of their command-line interfaces and functionality.

In terms of Command-Line Interface, Symbex employs a concise CLI, utilizing abbreviations for various options. For instance, the command to search for function signatures in a file named test_debug.py is as follows:

command: symbex -s -f symbex/test_debug.py
    def test_debug_level_values():
    def test_debug_level_isinstance():
    def test_debug_level_iteration():
    def test_debug_destination_values():
    def test_debug_destination_isinstance():
    def test_debug_destination_iteration():
    def test_level_destination_invalid():
    def test_debug_destination_invalid():

Chasten, on the other hand, leverages Python packages such as Typer and Rich to provide a user-friendly and feature-rich command-line interface. The available commands for Chasten include:

In terms of functionality, Symbex is designed to search Python code for functions and classes by name or wildcard. It provides the ability to filter results based on various criteria, including function type (async or non-async), documentation presence, visibility, and type annotations.

On the other hand, Chasten's analyze command performs AST analysis on Python source code. It allows users to specify a project name, XPATH version, search path, and various filtering criteria. Chasten supports checks for inclusion and exclusion based on attributes, values, and match confidence levels. The tool also provides extensive configuration options and the ability to save results in different formats, including markdown.

In summary, while both Chasten and Symbex serve the common purpose of analyzing Python source code, Chasten offers a more versatile and user-friendly CLI with additional features of configuration and result management. Symbex, on the other hand, adopts a concise CLI with a focus on searching and filtering functionalities. The choice between the two tools depends on the user's preferences and specific requirements for Python code analysis.

📦 Similar Tools

In addition to Chasten and Symbex, several other tools offer unique capabilities for analyzing and searching through Python source code, each catering to specific use cases.

🧗 Improvement