facebook / pyre-check

Performant type-checking for Python.
https://pyre-check.org/
MIT License

Pyre / Pysa unable to analyse files with same name in different folders #823

Closed giusepperaffa closed 4 months ago

giusepperaffa commented 5 months ago

Description

I have been trying to use Pysa (Ubuntu 20.04 + virtual environment + Python 3.8) to perform the data flow analysis of a repository with three different folders (see diagram below). Each folder contains a handler.py source code file.

graph TD;
    Repo-->Folder_A;
    Repo-->Folder_B;
    Repo-->Folder_C;

As shown by the two experiments described below, both Pyre and Pysa analyse only one of the handler.py files. Folders appear to be sorted alphabetically first, and then only the first detected handler.py file is processed.

As a result, fewer security vulnerabilities than expected are detected, because two of the three handler.py files are ignored.

Note: This issue seems to be loosely connected with #731.

Experiment 1 - Pyre automated type inference

Prior to the actual execution of the data flow analysis, I first performed an automated type inference with Pyre by executing the following command:

Pyre was able to type-annotate only one of the handler.py files. To solve this issue, I had to rename all the source code files, so that all the names were different.

Experiment 2 - Pysa data flow analysis

I attempted to perform the analysis of the original repository with the usual command:

Pysa was able to analyse (i.e., detect the expected vulnerability) only one of the handler.py files. To detect all the expected vulnerabilities, I had to rename all the source code files, so that all the names were different. Note: the different source code files contained different types of vulnerability.

Configuration

The folders containing the handler.py files were all explicitly included in the configuration file .pyre_configuration. The relevant portion of this file is shown below:

{
    "source_directories": [
        "./Repo/Folder_A",
        "./Repo/Folder_B",
        "./Repo/Folder_C"
    ]
}

Conclusion

Any help would be greatly appreciated. Perhaps this unexpected behaviour can be rectified by adding / changing a configuration option in .pyre_configuration or by executing Pyre / Pysa with a specific command-line option. Thank you very much.

ebrahimsofi123 commented 4 months ago

Hey! It sounds like you're facing an issue with Pyre and Pysa where only one of the handler.py files in your repository is being analyzed when you run your security checks. This limitation seems to be due to the tools' handling of files with identical names across different directories. Despite specifying the directories in your .pyre_configuration, the tools might still be reading and processing only the first handler.py they encounter, presumably due to how they manage file paths internally.
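One plausible reading of "how they manage file paths internally", not confirmed anywhere in this thread, is that a module name is derived from each file's path relative to its source directory. Under that assumption, all three handler.py files would resolve to the same module name, handler, and only one could win. The sketch below models that idea so you can check a configuration for clashes. The helpers module_name and find_collisions are hypothetical names written for illustration and are not part of Pyre or Pysa.

```python
from collections import defaultdict
from pathlib import Path


def module_name(source_root: Path, file_path: Path) -> str:
    """Dotted module name of file_path relative to source_root (illustrative model)."""
    relative = file_path.relative_to(source_root).with_suffix("")
    parts = list(relative.parts)
    # A package's __init__.py stands for the package itself.
    if parts and parts[-1] == "__init__":
        parts.pop()
    return ".".join(parts)


def find_collisions(source_roots):
    """Return {module name: [paths]} for names claimed by more than one file."""
    by_module = defaultdict(list)
    for root in source_roots:
        root = Path(root)
        for path in sorted(root.rglob("*.py")):
            by_module[module_name(root, path)].append(path)
    return {name: paths for name, paths in by_module.items() if len(paths) > 1}
```

Called with the three folders from the configuration above as separate roots, find_collisions reports handler three times. Called with only ./Repo as the root, the same files get distinct names in this model (Folder_A.handler, Folder_B.handler, Folder_C.handler). Whether listing the parent folder as the single source directory is acceptable for your Pysa setup is something you would need to test.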

As you noted, renaming the files to have unique names across the folders resolves the issue by ensuring each file is recognized as distinct. However, this might not be the most scalable solution, especially in larger projects or environments where such renaming could disrupt existing workflows or systems.

A potential workaround without renaming could involve adjusting how Pyre and Pysa are configured or invoked, such as by specifying file paths more granularly or using a script to run the analysis separately on each directory and then aggregating the results. Alternatively, reaching out to the Pyre community or their support might yield a configuration change or update that addresses this behavior more directly. If this behavior is tied to a known issue (as suggested by the reference to #731), there might be ongoing work or upcoming fixes that could also resolve your problem.
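The "run the analysis separately on each directory" idea could be sketched roughly as follows. This is a minimal sketch under assumptions: it assumes the pyre executable is on PATH, that the global --source-directory option overrides the source_directories entry in .pyre_configuration, and that pyre analyze accepts --save-results-to. Check all of these against pyre --help for your installed version. The function names build_commands and run_all and the ./pysa-results folder are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_commands(source_dirs, results_root):
    """Build one Pysa invocation per source directory, each with its own output folder."""
    commands = []
    for source_dir in source_dirs:
        out_dir = Path(results_root) / Path(source_dir).name
        commands.append(
            [
                "pyre",
                "--source-directory", str(source_dir),  # assumed to override the config file
                "analyze",
                "--save-results-to", str(out_dir),
            ]
        )
    return commands


def run_all(commands):
    """Run each command in turn and return {output folder: exit code}."""
    return_codes = {}
    for command in commands:
        out_dir = command[-1]
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        completed = subprocess.run(command, check=False)
        return_codes[out_dir] = completed.returncode
    return return_codes
```

For the repository in this issue you would call run_all(build_commands(["./Repo/Folder_A", "./Repo/Folder_B", "./Repo/Folder_C"], "./pysa-results")) and then inspect each output folder separately. The sketch does not merge the results; it only keeps the three runs from overwriting each other.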

arthaud commented 4 months ago

Closing this since the answer above provides workarounds for the problem.