CybercentreCanada / assemblyline

AssemblyLine 4: File triage and malware analysis
https://cybercentrecanada.github.io/assemblyline4_docs/
MIT License

Full-text search in submission files #208

Open: kam193 opened this issue 3 months ago

kam193 commented 3 months ago

Is your feature request related to a problem? Please describe.
When analyzing a submission consisting of multiple linked files (e.g. a Python package), it's often really useful to search for a string or pattern in the other files. Currently, you have to know where to look and open each file in the submission separately.

Describe the solution you'd like
I'm thinking of a simple, client-side search for text files, without making the server index all files. I believe it's fine if the UI downloads the content of each file (as it already does in the file view, including the same size limits) and searches through it. The number of files to search should also be limited, so we don't kill the web browser.

Describe alternatives you've considered

  1. Server-side search: convenient for the client, but without indexing it would most likely be a way to kill the server.
  2. A somewhat more advanced search: filtering which files are searched by name pattern or file type, selecting specific files to search, regex support, binary/hex search, etc.
  3. Local caching: this is more of a thought, but after the first search we could cache the files on the client (encrypted?) for a while to avoid re-downloading them. This should, however, depend on the TLP, e.g. TLP:RED files should be excluded from caching.
  4. Some kind of on-demand service that would download the submission's files and search them for the requested pattern. This would be relatively complex and would spam the system with new submissions (if implemented using the currently existing mechanisms).
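Alternative 2 could be layered on top of the client-side search described above. The sketch below, again in Python and with hypothetical names (`find_offsets`, the `mode` values, `name_glob`), filters files by a glob on their names and supports literal, regex, and hex queries by always searching raw bytes.

```python
import fnmatch
import re
from typing import Dict, List, Optional, Tuple


def find_offsets(
    files: Dict[str, bytes],
    pattern: str,
    mode: str = "text",              # hypothetical modes: "text", "regex" or "hex"
    name_glob: Optional[str] = None,  # e.g. "*.py"; None searches every file
) -> List[Tuple[str, int]]:
    """Return (file name, byte offset) for every match, optionally filtering files by name."""
    if mode == "text":
        compiled = re.compile(re.escape(pattern.encode("utf-8")))
    elif mode == "regex":
        compiled = re.compile(pattern.encode("utf-8"))
    elif mode == "hex":
        # "4d 5a 90" -> b"MZ\x90"; bytes.fromhex skips whitespace between byte pairs.
        compiled = re.compile(re.escape(bytes.fromhex(pattern)))
    else:
        raise ValueError(f"unknown mode: {mode}")

    results: List[Tuple[str, int]] = []
    for name, content in files.items():
        # fnmatchcase keeps name filtering case-sensitive on every platform.
        if name_glob is not None and not fnmatch.fnmatchcase(name, name_glob):
            continue
        for match in compiled.finditer(content):
            results.append((name, match.start()))
    return results
```

Searching bytes rather than decoded text keeps one code path for all three query types; a real implementation would also need to guard against user-supplied regexes with pathological backtracking.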

Additional context
I'm not thinking of a high-performance search feature, but rather of something that gives a first overview and helps to understand the relations between files.

cccs-rs commented 3 months ago

Sounds like something the Retrohunt feature could be suitable for, if we can narrow its scope down to a specific submission rather than all files that exist in the system?

That way, you could write a YARA rule (which I understand to be quite expressive for searching) to target the data you want to look for within a file, ideally with a filter query on AL's side so the rule is only applied to files from a certain submission.

kam193 commented 2 months ago

That sounds nice, but I'd then focus on making it easily accessible: the idea is to quickly search within a submission, so if I had to use the API, manually find or filter for the submission in another tab, or write a YARA rule by hand, it would be much less useful. On the other hand, if it were implemented like this but I could just click in the submission view and, for example, paste a YARA rule for more advanced searches, or let the UI prepare a simple rule (search for a given string) for me, that would be very nice.
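As a sketch of the "let the UI prepare a simple rule" idea, the helper below turns a search string into a minimal YARA rule using a text string with escapes. The function name, the default rule name, and the `nocase` toggle are hypothetical; how such a rule would then be scoped to a single submission through Retrohunt is not shown, since that depends on the AL-side filtering discussed above.

```python
import re


def string_to_yara_rule(needle: str, rule_name: str = "ui_generated_search", nocase: bool = False) -> str:
    """Build a minimal YARA rule matching a literal string (hypothetical UI helper)."""
    # Basic identifier syntax check; this does not reject YARA keywords such as "rule".
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]{0,127}", rule_name):
        raise ValueError(f"invalid YARA rule identifier: {rule_name!r}")
    escaped = []
    for byte in needle.encode("utf-8"):
        ch = chr(byte)
        if ch in ('\\', '"'):
            # Backslash and double quote must be escaped inside a YARA text string.
            escaped.append("\\" + ch)
        elif 0x20 <= byte < 0x7F:
            escaped.append(ch)
        else:
            # Non-printable or non-ASCII bytes become \xHH escapes.
            escaped.append(f"\\x{byte:02x}")
    body = "".join(escaped)
    modifiers = " nocase" if nocase else ""
    return (
        f"rule {rule_name}\n"
        "{\n"
        "    strings:\n"
        f'        $needle = "{body}"{modifiers}\n'
        "    condition:\n"
        "        $needle\n"
        "}\n"
    )


if __name__ == "__main__":
    print(string_to_yara_rule('os.system("id")', nocase=True))
```

Encoding the needle as UTF-8 and escaping byte by byte keeps the generated rule plain ASCII, so analysts can paste arbitrary strings without worrying about quoting.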