fkie-cad / FACT_core

Firmware Analysis and Comparison Tool
https://fkie-cad.github.io/FACT_core
GNU General Public License v3.0
1.24k stars 224 forks source link

Properly document plugin results & Some more proposals for refactoring plugins #799

Open maringuu opened 2 years ago

maringuu commented 2 years ago

Plugins set their result in the FileObject.processed_analysis[PLUGIN_NAME] Currently its contents are documented as follows:

        #: Analysis results for this file.
        #:
        #: Structure of results:
        #: The first level of this dict is a pair of ``'plugin_name': <result_dict>`` pairs.
        #: The result dict can have any content, but always has at least the fields:
        #:
        #: * analysis_date - float representing the time of analysis in unix time.
        #: * plugin_version - str defining the version of each plugin at time of analysis.
        #: * summary - list holding a summary of each file's result, that can be aggregated.

If I am not mistaken there are some plugins that dont adhere to this documentation. This should be verified and fixed.

As discussed with @jstucke we will change it so that a plugins writes its result to FileObject.processed_analysis[PLUGIN_NAME]["result"]. The reasoning behind this is that all keys of FileObject.processed_analysis[PLUGIN_NAME] should be the same for all plugins. In addition a plugin should document what keys it sets in result.

maringuu commented 2 years ago

Also: Many plugins plugins that depend on other plugins use their summary to parse the results. Imo summary should not be something that should be parsed but rather just a human readable string. My idea would be to properly document and define the result and document that only this should be used by dependencies. Summary can then be changed without breaking anything.

maringuu commented 2 years ago

Recap

A quick recap of the discussion that we (@jstucke @dorpvom @rhelmke ) had.

Other ideas

Versioning

  • Enforcing a proper structure of the results will probably be done with json schema

With a properly defined schema we could use SemVer for plugin versions.

Plugin code structure

Currently the plugin code structure is like the following:

.
├── __init__.py
├── code
│   ├── __init__.py
│   └── qemu_exec.py
├── docker (Everything related to docker)
│   ├── Dockerfile
│   └── start_binary.py
├── install.py
├── routes (Some more templates and flask routes)
│   ├── __init__.py
│   ├── ajax_view.html
│   └── routes.py
├── test (Directory that contains all tests and test data)
│   ├── __init__.py
│   ├── data
│   ├── test_plugin_qemu_exec.py
│   └── test_routes.py
├── view
│   └── qemu_exec.html (Template used to display the results)
├── internal (Code that is not in the code directory)
│   └── some_code.py
│   └── some_module
│               └── nested.py

The name of the directory is something related to the plugin name as defined by AnalysisPlugin.NAME. The proposal is to change it to std like this:

├── __init__.py (Contains some code that allows us to do import plugins.analysis.qemu_exec.Plugin)
├── docker (unchanged)
│   ├── Dockerfile
│   └── start_binary.py
├── installer.py (Renamed)
├── main.py (Contains the PluginClass, which must not anymore be named AnalysisPlugin but rather sth like QemuExecPlugin)
├── routes.py
├── some_code.py
├── some_module (Now custom modules must not anymore be put inside an internal directory)
│   └── nested.py
├── templates (
└── test (Unchanged)
    ├── __init__.py
    ├── data
    └── test_plugin_qemu_exec.py

The biggest change is removing the code and internal directorys. They are currently only needed for plugin discovery. All files in the code directory are scanned for a class AnalysisPlugin and then this class is imported. This is not only bad because the classes dont have descriptive names but also unnecessarily complicated.

Also when we are doing this we could finally name the directory like the actual name of the plugin.