CybercentreCanada / assemblyline

AssemblyLine 4: File triage and malware analysis
https://cybercentrecanada.github.io/assemblyline4_docs/
MIT License
211 stars 14 forks

Feature Request: Scheduled Jobs within Assemblyline UI #91

Open ociappara opened 11 months ago

ociappara commented 11 months ago

Background

Our team is seeing a growing number of use cases that require scheduling containers to run at specific intervals, for purposes such as automated data ingestion, scheduled results processing, and automated reporting. Currently, managing additional infrastructure for this is time-consuming and exposes Assemblyline (AL) to various systems, which complicates oversight.

Proposed Feature

We propose the addition of a new feature in AL that allows us to schedule containers to run at specific intervals, leveraging the existing infrastructure. This feature would be analogous to the process of adding services, but under a dedicated tab specifically for scheduled tasks.

These tasks should have direct access to the submission results, hence streamlining the process by eliminating certain API calls.

New Component: Jobs Tab

The new 'Jobs Tab' in the AL UI will expose the current Jobs process. Just as services are added now in the Services Tab, we can add scheduled jobs by providing a manifest with our ACR instance, credentials, and schedule. In the background, AL will create the Kubernetes Jobs.
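To make the idea concrete, here is a rough sketch of how a job definition collected by such a tab might be translated into a Kubernetes CronJob manifest. The `ScheduledJob` structure, the `to_cronjob_manifest` helper, the `al` namespace default, and the example image and secret names are all hypothetical and do not exist in Assemblyline today; only the `batch/v1` CronJob fields are standard Kubernetes.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ScheduledJob:
    """Hypothetical job definition; not an existing Assemblyline structure."""
    name: str
    image: str          # image in a private registry, e.g. an ACR instance
    schedule: str       # five-field cron expression
    pull_secret: str    # name of an existing Kubernetes image pull secret
    env: dict = field(default_factory=dict)


def to_cronjob_manifest(job: ScheduledJob, namespace: str = "al") -> dict:
    """Translate the hypothetical job definition into a batch/v1 CronJob."""
    if len(job.schedule.split()) != 5:
        raise ValueError(f"expected a five-field cron expression, got {job.schedule!r}")
    container = {
        "name": job.name,
        "image": job.image,
        "env": [{"name": k, "value": str(v)} for k, v in job.env.items()],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": job.name, "namespace": namespace},
        "spec": {
            "schedule": job.schedule,
            # Skip a run if the previous one is still in progress.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "containers": [container],
                            "imagePullSecrets": [{"name": job.pull_secret}],
                            "restartPolicy": "Never",
                        }
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    # Example values are placeholders, not real registry or secret names.
    job = ScheduledJob(
        name="whois-refresh",
        image="example.azurecr.io/whois-refresh:1.0",
        schedule="0 */6 * * *",
        pull_secret="acr-credentials",
    )
    print(json.dumps(to_cronjob_manifest(job), indent=2))
```

The printed JSON could be applied to a cluster by an administrator; in the proposed feature, AL would presumably perform that step itself after the job is registered in the UI.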

Specialized Jobs and Access Management

Some jobs should be tailored to function like the current YARA and ConfigExtractor services, with an updater component that fetches from remote repositories like GitHub and executes scripts from there.

We also foresee scenarios where jobs need restricted data access (for instance, based on TLP restrictions). Therefore, we should be able to set access rules for each job.
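As an illustration of what a per-job access rule could look like, the sketch below filters results by a maximum TLP level assigned to a job. It is a deliberately simplified model: Assemblyline has its own configurable classification engine, and none of the names here (`TLP_ORDER`, `job_can_read`, `visible_results`, the `tlp` key on a result) come from it.

```python
# TLP 2.0 labels, ordered from least to most restrictive.
TLP_ORDER = ["TLP:CLEAR", "TLP:GREEN", "TLP:AMBER", "TLP:AMBER+STRICT", "TLP:RED"]


def tlp_rank(label: str) -> int:
    """Return the position of a TLP label; raises ValueError if unknown."""
    return TLP_ORDER.index(label.upper())


def job_can_read(job_max_tlp: str, result_tlp: str) -> bool:
    """A job may read a result only if the result is no more restrictive
    than the job's configured ceiling (hypothetical rule)."""
    return tlp_rank(result_tlp) <= tlp_rank(job_max_tlp)


def visible_results(results: list, job_max_tlp: str) -> list:
    """Keep only the results the job is allowed to see."""
    return [r for r in results if job_can_read(job_max_tlp, r["tlp"])]


if __name__ == "__main__":
    results = [
        {"id": 1, "tlp": "TLP:CLEAR"},
        {"id": 2, "tlp": "TLP:AMBER"},
        {"id": 3, "tlp": "TLP:RED"},
    ]
    # A job capped at TLP:AMBER sees results 1 and 2 but not 3.
    print([r["id"] for r in visible_results(results, "TLP:AMBER")])
```

A real implementation would more likely reuse the classification already attached to each submission than introduce a separate TLP field.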

Benefits

The key benefits of this feature include:

cccs-jp commented 11 months ago

At CCCS we use Apache Airflow for scheduling automation with Assemblyline and other systems. Is that an option for you?

https://airflow.apache.org

ociappara commented 11 months ago

Thanks @cccs-jp, Airflow is certainly an option and we'll be looking into it. This idea comes from the notion that, at least in an academic sense, you are never really "done" with an analysis; you have just reached the limit of your ability to add value at the moment. For instance, one use case would be to take all domains and periodically mine them for whois data, reverse IP, etc. Another could be resubmitting a sample based on certain indicators.

I understand the complexity of this ask, so please treat it as a nice-to-have. We'll be looking at how we can leverage Airflow.

jxb5151 commented 11 months ago

Also just adding here, much of this conceptually can be mapped back to what is known as a 'Blackboard System'.

cccs-rs commented 1 month ago

Status on issue? Is this still something we're interested in pursuing?

ociappara commented 1 month ago

We would still love to have the ability to do this through the UI, as it would enable analysts to run their scripts directly. I think the only way to do it right now is for an administrator to create an AKS job.