fkie-cad / FACT_core

Firmware Analysis and Comparison Tool
https://fkie-cad.github.io/FACT_core
GNU General Public License v3.0

How to query for/find file from SHA1? #1272

Closed: ankien closed this issue 1 month ago

ankien commented 1 month ago

The FACT version you are using

No response

Your question

Hello,

I have the SHA1 values of some files (the value listed under 'sha1' in the 'file hashes' analysis) that I would like to find in my FACT database. What is the correct way to use the REST API to get those files?

I have tried using GET /rest/file_object with {"sha1":"a360673b2c7c9794d4d45e900485b4d32e6c9cf2"} as query:

curl -X 'GET' \
    'http://172.20.11.73:5000/rest/file_object?query=%7B%22sha1%22%3A%22a360673b2c7c9794d4d45e900485b4d32e6c9cf2%22%7D' \
    -H 'accept: application/json'

But this returns a large number of seemingly unrelated file UIDs:

{
  "request": {
    "limit": 0,
    "offset": 0,
    "query": {
      "sha1": "a360673b2c7c9794d4d45e900485b4d32e6c9cf2"
    }
  },
  "request_resource": "/rest/file_object",
  "status": 0,
  "timestamp": 1727294802,
  "uids": [
    "027905a7f435884bf3275cf29c0adcb8942664941641b7ff56264b3e28cd9a4c_24",
    "289dfeece6231ca555c452413bd6a6de7f71c5464ba8b1fca8723944e819b73b_112",
    "4fd955cfa334d9bca35a15a66d4ad2866f7625e461a00042b34c568609bc1abb_112",
    ...
  ]
}

Any advice on how to do this properly? If this is not possible, then I would like to at least understand where the numbers in the file UIDs after the SHA256 value are derived from.

Thank you.

jstucke commented 1 month ago

The reason you get so many results is that file_object does not have a field "sha1", so the filter does not restrict anything and you get all files as a result. The SHA1 is stored as part of the result of the "file_hashes" analysis plugin. To query for results of this plugin, you need to write it like this:

curl  -X GET 'http://localhost:5000/rest/file_object?limit=100' \
    -G --data-urlencode 'query={"processed_analysis.file_hashes.sha1": "a360673b2c7c9794d4d45e900485b4d32e6c9cf2"}'

So if you replace "sha1" with "processed_analysis.file_hashes.sha1" in your query, it should work. Sorry for the unintuitive syntax; it is pretty much an artifact of our old MongoDB database. We are currently working on adding GraphQL support, which should make it easier to write queries.
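
For scripting the same lookup, here is a minimal Python sketch using only the standard library. The function names `build_hash_query_url` and `find_uids_by_sha1` and the default base URL are hypothetical and chosen for illustration. The endpoint, the `limit` and `query` parameters, the field path, and the `uids` response key are taken from the curl command and the response shown in this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_hash_query_url(base_url, sha1, limit=100):
    """Build the /rest/file_object URL that filters on the file_hashes SHA1.

    Hypothetical helper: it mirrors the curl command above by nesting the
    hash under processed_analysis.file_hashes.sha1 and URL-encoding the
    JSON query string.
    """
    query = {"processed_analysis.file_hashes.sha1": sha1}
    params = urlencode({"limit": limit, "query": json.dumps(query)})
    return f"{base_url.rstrip('/')}/rest/file_object?{params}"


def find_uids_by_sha1(base_url, sha1, limit=100):
    """Send the request and return the list under the "uids" key.

    Assumes the response has the shape shown earlier in this thread.
    """
    with urlopen(build_hash_query_url(base_url, sha1, limit)) as response:
        return json.load(response)["uids"]


if __name__ == "__main__":
    # Assumes a FACT instance is reachable at this address.
    uids = find_uids_by_sha1(
        "http://localhost:5000",
        "a360673b2c7c9794d4d45e900485b4d32e6c9cf2",
    )
    print(uids)
```

Building the query as a dictionary and serializing it with `json.dumps` avoids hand-encoding the braces and quotes, which is what `--data-urlencode` does in the curl version.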

ankien commented 1 month ago

This worked, thank you so much for the support!