Questions about REST - Githubissues

KauSaal commented 1 year ago

The FACT version you are using

1b67926aa2f9b06b2592b9f685e47f115dff3341

Your question

Hello, I am currently writing my bachelor thesis, and my task is to develop a system that scrapes firmware images from vendor websites and uploads them to a database to catalog them. I decided to use FACT because that way I don't need to build a firmware catalog from scratch and I get analysis tools in the same package. The scraping itself is working, but when I send a lot of PUT requests via REST, the firmware images are not uploaded properly, although I get a success response from REST. I assume this is due to an overload in FACT. My questions are:

- Is there a feasible way to check whether a PUT request is finished?
- Is there a way to postpone the unpacking until all firmware images are uploaded properly?
- Are there papers or similar about the architecture of FACT, since I can't get much info from the docs here on GitHub?

Thanks in advance

dorpvom commented 1 year ago

Hi @KauSaal,

we're happy to supply you with some tips for your work.

The first two points are connected: the PUT request is finished once it returns a response. The first log messages might take a while to appear, since unpacking the outermost container can take a few minutes, but the work is already running in the background. You can then use the /rest/status endpoint to check whether a specific analysis, or any analysis at all, is currently running. This way you can hold back the next PUT until the previous analysis is finished. You can try out the REST endpoint with our Swagger UI integration, or you can wait until next week (I'm out of office until then) and I can provide a snippet that does exactly that, since I do similar things regularly.

Regarding the third point: We have never published a paper directly based on FACT, though there are some presentation slides from technical conferences. I'm not sure, though, whether they provide more information than what is already in our GitHub Wiki or on the github.io page, both of which you have probably already found.

dorpvom commented 1 year ago

import hashlib
import json
import time
from base64 import b64encode
from pathlib import Path

import requests

TARGET_HOST = 'http://your.host:port'
UPLOAD_TIMEOUT = 30  # Seconds to wait after an upload before starting to poll if finished
POLL_TIMEOUT = 30  # Seconds between two polls
KEY = ''  # Leave the api key empty if not needed
PLUGINS = [
    'cpu_architecture', 'crypto_material', 'cve_lookup', 'device_tree', 'elf_analysis', 'exploit_mitigations',
    'hardware_analysis', 'information_leaks', 'interesting_uris', 'ip_and_uri_finder', 'kernel_config', 'known_vulnerabilities',
    'printable_strings', 'software_components', 'users_and_passwords'
]  # Change based on your needs. This is my default set. It has to be a list, since a set cannot be serialized to JSON.
META_PATH = '/path/to/your/firmware/meta/data'

def create_url(web_path):
    return f'{TARGET_HOST}{web_path}'

def check_progress(uid: str) -> bool:
    # Check if currently running analysis is finished
    response = requests.get(create_url('/rest/status'))
    status = response.json()
    recently_finished = status['system_status']['backend']['analysis']['recently_finished_analyses']
    current_analyses = status['system_status']['backend']['analysis']['current_analyses']
    return uid in recently_finished or current_analyses == {}

def upload_firmware(upload_dict: dict) -> str:
    response = requests.put(create_url('/rest/firmware'), json=upload_dict).json()
    try:
        return response['uid']
    except KeyError:
        print(f'[ERROR]\t{response}')
        exit(1)

def read_meta_data() -> list[tuple[str, dict]]:  # your code goes here
    # For each firmware you need some basic meta information before uploading. Most can be dummy values, but I'd give as much as possible for better reference later on.
    # I would put this in some json file that you can read here
    # The code in this function converts your meta data to a dict that can be used for FACT REST upload
    # ...Of course you can also directly store your meta data in FACT upload format to skip this transformation
    meta_data = json.loads(Path(META_PATH).read_text())
    fact_upload_data = []

    for firmware in meta_data:
        binary = Path(firmware['local_path']).read_bytes()  # Path to the firmware binary on your filesystem
        b64_binary = b64encode(binary).decode()  # FACT wants the binary in base64 (as string, not bytes) due to json not accepting bytes. b64encode comes from the base64 import above.
        file_sha = hashlib.sha256(binary).hexdigest()  # Best you calculate this here for later use and logging. hashlib is imported above.
        fact_upload_data.append(
            (
                f'{file_sha}_{len(binary)}',  # That's your FACT UID
                {
                    'binary': b64_binary,
                    'device_class': firmware['device_class'],
                    'device_name': firmware['device_name'],
                    'device_part': firmware['device_part'],
                    'file_name': firmware['file_name'],
                    'release_date': firmware['release_date'],
                    'requested_analysis_systems': PLUGINS,
                    'tags': '',
                    'vendor': firmware['vendor'],
                    'version': firmware['version']
                }
            )
        )
    return fact_upload_data

def analysis_already_done(uid: str) -> bool:
    return requests.get(create_url(f'/rest/firmware/{uid}')).status_code == 200

def main():
    counter = 1
    try:
        file_id = None
        firmware = read_meta_data()  # This you have to code yourself
        print(f'Transferring {len(firmware)} firmware images')
        for file_id, meta in firmware:
            if analysis_already_done(file_id):  # handy if you cancel the script and continue later (or if some error occurs)
                print(f'[{counter}/{len(firmware)}]\t Skipped analysis of {file_id}. Already present.')
                counter += 1
                continue
            uid = upload_firmware(meta)
            print(f'[{counter}/{len(firmware)}]\t Started analysis of {uid}')
            counter += 1
            time.sleep(UPLOAD_TIMEOUT)
            while not check_progress(uid):
                time.sleep(POLL_TIMEOUT)
    except requests.exceptions.ConnectionError:
        print('Connection failed. Is host up?')
    except json.decoder.JSONDecodeError:
        print(f'Bad response from host, check for authentication and proxy.\nid={file_id}')
    return 0

if __name__ == '__main__':
    exit(main())

dorpvom commented 1 year ago

I've not tested this, but adapted another script to roughly your purpose. Note the comments for additional explanation and for missing imports (I did not look for some library calls that were not imported yet).
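
For reference, read_meta_data() above expects the meta data file to hold a JSON list with one object per firmware image. The following is only a minimal sketch of that shape with an early completeness check. The key names are taken from the script; the helper names (`missing_keys`, `load_meta_data`) and all values in `EXAMPLE_ENTRY` are hypothetical placeholders, not something FACT prescribes.

```python
import json
import tempfile
from pathlib import Path

# Keys that read_meta_data() in the script above looks up for each firmware entry.
REQUIRED_KEYS = {
    'local_path', 'device_class', 'device_name', 'device_part',
    'file_name', 'release_date', 'vendor', 'version',
}


def missing_keys(entry: dict) -> set:
    """Return the required keys that are absent from one meta data entry."""
    return REQUIRED_KEYS - set(entry)


def load_meta_data(meta_path: Path) -> list:
    """Load the meta data file and fail early if an entry is incomplete."""
    entries = json.loads(meta_path.read_text())
    for index, entry in enumerate(entries):
        missing = missing_keys(entry)
        if missing:
            raise ValueError(f'entry {index} is missing {sorted(missing)}')
    return entries


# Hypothetical example entry; every value here is a placeholder.
EXAMPLE_ENTRY = {
    'local_path': '/tmp/firmware/example.bin',
    'device_class': 'Router',
    'device_name': 'Example Device',
    'device_part': 'complete',
    'file_name': 'example.bin',
    'release_date': '1970-01-01',
    'vendor': 'Example Vendor',
    'version': '1.0',
}

if __name__ == '__main__':
    # Write a one-entry meta data file to a temporary directory and load it again.
    with tempfile.TemporaryDirectory() as tmp_dir:
        meta_file = Path(tmp_dir) / 'meta.json'
        meta_file.write_text(json.dumps([EXAMPLE_ENTRY]))
        print(load_meta_data(meta_file))
```

Checking the entries before the first upload means a typo in the meta data shows up immediately instead of as a KeyError halfway through a long batch.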

KauSaal commented 1 year ago

Hi @dorpvom, thanks for the snippets. I adapted them and am now using "ready-to-upload" JSON objects that are created in my web scraper implementation. I have a few more questions:

1) When I upload a lot of firmware files in one go, my FACT instance either crashes after a while, or the extractor containers have processes killed by the Out-of-Memory Killer, and then FACT just hangs with a lot of pending unpackings and won't continue. What can I do about this?
2) About the pending unpackings: How can it be that I have e.g. 137895 pending items for extraction when the current firmware only contains 2932 files (based on the overview "Currently analyzed firmware" on the system_health page)?
3) Is it normal that firmware unpacking takes very long? I have a ~15MB firmware file and the unpacking process has been running for ~1h and 15 mins with no end in sight.

Thanks in advance for your help

jstucke commented 1 year ago

Hi @KauSaal,

1) You should make sure that the next firmware is only uploaded/posted when the analysis of the previous firmware is finished (the script that @dorpvom posted checks this in check_progress()). Otherwise you may run out of RAM (and it certainly sounds like you are running out of RAM, so you should keep an eye on that). If you run out of RAM, FACT will crash. There are some things you can do to lower the RAM usage, like reducing the number of plugin and unpacking workers in the configuration.
2) It sounds like you uploaded multiple firmware images at once. If not, I'm not sure what is going on.
3) It sounds like maybe the scheduling process crashed? It really shouldn't take that long for a ~15MB file. Is there any progress visible on the system page?
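
Since a crashed scheduler would keep a plain polling loop waiting forever, one option is to bound the wait. The following is only a sketch under assumptions: it reuses the status fields that check_progress() in the script above reads, while `wait_for_analysis`, `fetch_status`, `poll_interval` and `max_wait` are hypothetical names introduced here, and the one-hour default is an arbitrary placeholder.

```python
import time
from typing import Callable


def analysis_finished(status: dict, uid: str) -> bool:
    """Same check as check_progress() in the script above, on an already fetched status dict."""
    analysis = status['system_status']['backend']['analysis']
    return uid in analysis['recently_finished_analyses'] or not analysis['current_analyses']


def wait_for_analysis(
    uid: str,
    fetch_status: Callable[[], dict],
    poll_interval: float = 30.0,
    max_wait: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the analysis of `uid` is finished.

    Returns True once it is finished and False if roughly `max_wait`
    seconds of sleeping have passed without that happening.
    """
    waited = 0.0
    while True:
        if analysis_finished(fetch_status(), uid):
            return True
        if waited >= max_wait:
            return False
        sleep(poll_interval)
        waited += poll_interval
```

In the upload script, `fetch_status` could be something like `lambda: requests.get(f'{TARGET_HOST}/rest/status').json()`. If the function returns False, it is probably better to stop uploading and look at the system page than to queue further firmware images.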

fkie-cad / FACT_core

Questions about REST #1040
