CybercentreCanada / assemblyline

AssemblyLine 4: File triage and malware analysis
https://cybercentrecanada.github.io/assemblyline4_docs/
MIT License
248 stars 15 forks source link

AL 4.5.0.27: updater cannot upgrade any service #230

Closed kam193 closed 5 months ago

kam193 commented 5 months ago

Describe the bug After upgrading to 4.5.0.27, AssemblyLine isn't able to upgrade any service. However, the upgrade looks successful if I tried to start a container with the new service image manually. I see a couple of errors from starting containers (about ODM schema), but I'm unable to say if they come from the update container (I guess they are not - I see errors from update and privileged services, even if I try to upgrade a regular service). My suspicious is, that for some reason the updating thread is silently failing before even starting the container.

Have you seen anything like this, or should I try to add some additional logs/try-except in the updater container?

To Reproduce Steps to reproduce the behavior:

  1. Upgrade AssemblyLine to v27
  2. Try o update any service.

This is an example log from the updater, here for Swiffer (I tried with a few different services):

{"@timestamp": "2024-05-28 20:27:47,142", "event": { "module": "assemblyline", "dataset": "assemblyline.service.updater" }, "host": { "ip": "172.16.0.7", "hostname": "feade7f10c47" }, "log": { "level": "INFO", "logger": "assemblyline.service.updater" }, "process": { "pid": "1" }, "message": "[CU] Service Swiffer is being updated to version 4.5.0.stable6..."}
{"@timestamp": "2024-05-28 20:27:47,143", "event": { "module": "assemblyline", "dataset": "assemblyline.service.updater" }, "host": { "ip": "172.16.0.7", "hostname": "feade7f10c47" }, "log": { "level": "INFO", "logger": "assemblyline.service.updater" }, "process": { "pid": "1" }, "message": "[CI] Installing all services marked for install..."}
{"@timestamp": "2024-05-28 20:27:47,146", "event": { "module": "assemblyline", "dataset": "assemblyline.service.updater" }, "host": { "ip": "172.16.0.7", "hostname": "feade7f10c47" }, "log": { "level": "INFO", "logger": "assemblyline.service.updater" }, "process": { "pid": "1" }, "message": "[CI] Done installing services, waiting 60 seconds for next install..."}
{"@timestamp": "2024-05-28 20:28:03,531", "event": { "module": "assemblyline", "dataset": "assemblyline.service.updater" }, "host": { "ip": "172.16.0.7", "hostname": "feade7f10c47" }, "log": { "level": "ERROR", "logger": "assemblyline.service.updater" }, "process": { "pid": "1" }, "message": "[CU] Service Swiffer has failed to update because resulting service key (Swiffer_4.5.0.6) does not exist. Update procedure cancelled..."}

If I tried a following command (for Sigma which also couldn't be updated by AL, & I reused the network created by AL for the Safelist):

docker run --env SERVICE_API_HOST=http://service-server:5003 --env SERVICE_API_KEY=***--network assemblyline_service-net-Safelist --name test_al <registry cache>/cccs/assemblyline-service-sigma:4.5.0.stable7 

It's successful and AL recognize the new version.

Expected behavior Services are upgraded

Screenshots

Environment (please complete the following information if pertinent):

Additional context Here an example from errors I see in logs, but I think they should be gone after service upgrade and are unrelated:

May 28 20:07:01 docker2 270d7f531c45[2288004]: Traceback (most recent call last):
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "<frozen runpy>", line 198, in _run_module_as_main
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "<frozen runpy>", line 88, in _run_code
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/opt/al_service/badlist/update_server.py", line 19, in <module>
May 28 20:07:01 docker2 270d7f531c45[2288004]:     from assemblyline_v4_service.updater.updater import ServiceUpdater
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline_v4_service/updater/updater.py", line 27, in <module>
May 28 20:07:01 docker2 270d7f531c45[2288004]:     from assemblyline_v4_service.common.base import SIGNATURES_META_FILENAME
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline_v4_service/common/base.py", line 20, in <module>
May 28 20:07:01 docker2 270d7f531c45[2288004]:     from assemblyline_v4_service.common.api import PrivilegedServiceAPI, ServiceAPI
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline_v4_service/common/api.py", line 5, in <module>
May 28 20:07:01 docker2 270d7f531c45[2288004]:     from assemblyline_core.badlist_client import BadlistClient
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline_core/badlist_client.py", line 7, in <module>
May 28 20:07:01 docker2 270d7f531c45[2288004]:     from assemblyline.datastore.helper import AssemblylineDatastore
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline/datastore/helper.py", line 29, in <module>
May 28 20:07:01 docker2 270d7f531c45[2288004]:     from assemblyline.odm.models.result import Result
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline/odm/models/result.py", line 32, in <module>
May 28 20:07:01 docker2 270d7f531c45[2288004]:     constants = forge.get_constants()
May 28 20:07:01 docker2 270d7f531c45[2288004]:                 ^^^^^^^^^^^^^^^^^^^^^
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline/common/forge.py", line 123, in get_constants
May 28 20:07:01 docker2 270d7f531c45[2288004]:     config = get_config()
May 28 20:07:01 docker2 270d7f531c45[2288004]:              ^^^^^^^^^^^^
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline/common/forge.py", line 117, in get_config
May 28 20:07:01 docker2 270d7f531c45[2288004]:     config_cache[yml_config] = _get_config(yml_config=yml_config)
May 28 20:07:01 docker2 270d7f531c45[2288004]:                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline/common/forge.py", line 94, in _get_config
May 28 20:07:01 docker2 270d7f531c45[2288004]:     yml_data = yaml.safe_load(env_substitute(yml_fh.read()))
May 28 20:07:01 docker2 270d7f531c45[2288004]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/yaml/__init__.py", line 125, in safe_load
May 28 20:07:01 docker2 270d7f531c45[2288004]:     return load(stream, SafeLoader)
May 28 20:07:01 docker2 270d7f531c45[2288004]:            ^^^^^^^^^^^^^^^^^^^^^^^^
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/yaml/__init__.py", line 81, in load
May 28 20:07:01 docker2 270d7f531c45[2288004]:     return loader.get_single_data()
May 28 20:07:01 docker2 270d7f531c45[2288004]:            ^^^^^^^^^^^^^^^^^^^^^^^^
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/yaml/constructor.py", line 51, in get_single_data
May 28 20:07:01 docker2 270d7f531c45[2288004]:     return self.construct_document(node)
May 28 20:07:01 docker2 270d7f531c45[2288004]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/yaml/constructor.py", line 60, in construct_document
May 28 20:07:01 docker2 270d7f531c45[2288004]:     for dummy in generator:
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/yaml/constructor.py", line 413, in construct_yaml_map
May 28 20:07:01 docker2 270d7f531c45[2288004]:     value = self.construct_mapping(node)
May 28 20:07:01 docker2 270d7f531c45[2288004]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/yaml/constructor.py", line 218, in construct_mapping
May 28 20:07:01 docker2 270d7f531c45[2288004]:     return super().construct_mapping(node, deep=deep)
May 28 20:07:01 docker2 270d7f531c45[2288004]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/yaml/constructor.py", line 143, in construct_mapping
May 28 20:07:01 docker2 270d7f531c45[2288004]:     value = self.construct_object(value_node, deep=deep)
May 28 20:07:01 docker2 270d7f531c45[2288004]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/yaml/constructor.py", line 100, in construct_object
May 28 20:07:01 docker2 270d7f531c45[2288004]:     data = constructor(self, node)
May 28 20:07:01 docker2 270d7f531c45[2288004]:            ^^^^^^^^^^^^^^^^^^^^^^^
May 28 20:07:01 docker2 270d7f531c45[2288004]:   File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/yaml/constructor.py", line 427, in construct_undefined
May 28 20:07:01 docker2 270d7f531c45[2288004]:     raise ConstructorError(None, None,
May 28 20:07:01 docker2 270d7f531c45[2288004]: yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:assemblyline.odm.base.TypedMapping'
May 28 20:07:01 docker2 270d7f531c45[2288004]:   in "<unicode string>", line 451, column 15:
May 28 20:07:01 docker2 270d7f531c45[2288004]:           INGEST: !!python/object:assemblyline.odm ... 
May 28 20:07:01 docker2 270d7f531c45[2288004]:                   ^
cccs-rs commented 5 months ago

Hmm... I wonder if it has anything to do with: https://github.com/CybercentreCanada/assemblyline-base/blob/90ba51e38c3e0e3bf01c8b1f4541902a89208ed0/assemblyline/odm/models/config.py#L1903

At least based on the last line of the error and that was the only new thing added to the Config ODM in the new release. Surprised I haven't seen this..

cccs-rs commented 5 months ago

Could it be perhaps something related to your configuration? I just checked our logging stack and found no mention of the error, so I suspect it could be a parsing error for the metadata enforcement configuration that's causing the error to be raised.

I would suspect to see this error in the other core containers as well since they should all call forge.get_config()

kam193 commented 5 months ago

It may be, I have indeed tried to configure it, and for some reasons I didn't try the upgrade without configuring metadata. But the core containers didn't have any such problem, just privileged services and update containers. Anyway, I'll check it and come back, thanks for the tip

kam193 commented 5 months ago

The current state of debugging:

  1. I have removed the submission key from the config at all & re-updated AL, tried to update any service - no success.
  2. As so, I started adding logs to the updated, and I ended up with information from the L131 (https://github.com/CybercentreCanada/assemblyline-core/blame/167dd89cbc2299321b4c3a13abde9a73823abab5/assemblyline_core/updater/run_updater.py#L131) that container excited with an unsuccessful status code.
  3. After getting the ID of the launched container, I managed to connect issues from logs with it, and this indeed the issue from ODM.

Now I'm looking into what exactly causes the failure.

kam193 commented 5 months ago

And we have the winner - what is, indeed, the config for metadata. However, this is not what core services see. They see the original configuration I wrote, but all privileged services, including the container for updating, gets the configuration prepared by the scaler. It looks like the scaler doesn't understand dumping it well - here is what it produces:

 max_temp_data_length: 4096
  metadata:
    archive: {}
    ingest:
      INGEST: !!python/object:assemblyline.odm.base.TypedMapping     <----- Line 451, as in the traceback
        index: false
        sanitizer: !!python/object/apply:re._compile
        - ^[A-Za-z0-9_ -.]*$
        - 32
        store: false
        type: !!python/object:assemblyline.odm.base.Compound
          ai: true
          child_type: !!python/name:assemblyline.odm.models.config.Metadata ''
          copyto: []
          default: null
          default_set: false
          deprecation: null
          description: null
          getter_function: null
          index: false
          multivalued: false
          name: null
          optional: false
          parent_name: null
          setter_function: null
          store: false
    submit: {}

I'm not entirely sure, but it looks like a dump of the TypedMapping object itself, not the data. Setting any configuration or removing the INGEST key doesn't help either. It looks like one of differences between docker compose and kubernetes deployment, so you might not see it.

kam193 commented 5 months ago

It also explains why my manual update worked - I didn't mount the prepared configuration, and things necessary for the service registration are still equal to defaults (yeah, I know they shouldn't, but it's still a hobby setup ;)), so it worked without the real config file.

cccs-rs commented 5 months ago

Hmm... so it would stand to reason that if you were to shell into the Scaler container, then running:

from assemblyline.common import forge
forge.get_config().as_primitives()

Should yield the same error/garbage output?

kam193 commented 5 months ago

Not exactly, but it does not return really primitives. Let's have a look, all in the scaler container of the AL v27:

>>> c = forge.get_config().as_primitives()
>>> c["submission"]
{'default_max_extracted': 500, 'default_max_supplementary': 500, 'dtl': 30, 'emptyresult_dtl': 5, 'max_dtl': 0, 'max_extraction_depth': 6, 'max_file_size': 524288000, 'max_metadata_length': 4096, 'max_temp_data_length': 4096, 'metadata': {'archive': {}, 'submit': {}, 'ingest': {'INGEST': {}}}, 'sha256_sources': [], 'file_sources': [], 'tag_types': {'attribution': ['attribution.actor', 'attribution.campaign', 'attribution.exploit', 'attribution.implant', 'attribution.family', 'attribution.network', 'av.virus_name', 'file.config', 'technique.obfuscation'], 'behavior': ['file.behavior'], 'ioc': ['network.email.address', 'network.static.ip', 'network.static.domain', 'network.static.uri', 'network.dynamic.ip', 'network.dynamic.domain', 'network.dynamic.uri']}, 'verdicts': {'info': 0, 'suspicious': 300, 'highly_suspicious': 700, 'malicious': 1000}}
>>> # ^ Looks okay
>>> d = yaml.dump(c)
>>> "!!python/object:assemblyline.odm.base.TypedMapping" in d
True
>>> d
' ... \n  metadata:\n    archive: {}\n    ingest:\n      INGEST: !!python/object:assemblyline.odm.base.TypedMapping\n        index: false\n        sanitizer: !!python/object/apply:re._compile...'
>>> # But dumped wrongly
>>> type(c)
<class 'dict'>
>>> type(c["submission"])
<class 'dict'>
>>> type(c["submission"]["metadata"])
<class 'dict'>
>>> type(c["submission"]["metadata"]["ingest"])
<class 'dict'>
>>> type(c["submission"]["metadata"]["ingest"]["INGEST"])
<class 'assemblyline.odm.base.TypedMapping'>
>>> # ^ Because it's not a dict
cccs-rs commented 5 months ago

I think I know a solution to fix that 😁

cccs-rs commented 5 months ago

You can try testing with the 4.5.1.dev166 release to make sure there aren't any other issues that 4.5.0.27 introduced for the Docker deployments.

kam193 commented 5 months ago

I confirm with this release updating works and services look like functioning as well, thanks!

BTW I love the re-design, when does it come to the stable branch?

cccs-rs commented 5 months ago

I can include the patch in the next stable, once the PR is approved!

The update to the iconography was included in the 4.5.0.28 release we pushed yesterday 😁

cccs-rs commented 5 months ago

This should be patched in the 4.5.0.29 release.

kam193 commented 5 months ago

Thanks, I've just installed it and can confirm it works :)

kam193 commented 5 months ago

Hey, I think we both didn't test the service properly... I run it with the default configuration, but not with really configured metadata. When I played with metadata, the scaler service wasn't happy again:

{"@timestamp": "2024-06-04 16:53:41,417", "event": { "module": "assemblyline", "dataset": "assemblyline.scaler" }, "host": { "ip": "x.x.x.x", "hostname": "2a55612146dd" }, "log": { "level": "INFO", "logger": "assemblyline.scaler" }, "process": { "pid": "1" }, "message": "Found the service server at: b1d0d55f6dc96d9942b27fc5dfd5060e5e5cfaca41e17b24a58faf5c425c7e84 [assemblyline-service_server-1]"}
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline_core/scaler/run_scaler.py", line 6, in <module>
    with ScalerServer() as scaler:
         ^^^^^^^^^^^^^^
  File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline_core/scaler/scaler_server.py", line 365, in __init__
    yaml.dump(json.loads(self.config.json()), handle)
                         ^^^^^^^^^^^^^^^^^^
  File "/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline/odm/base.py", line 1403, in json
    return json.dumps(self.as_primitives())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Metadata is not JSON serializable

However, it looks to be a problem with the ingest metadata declaration (which I don't need anyway, so it's not a big issue for me), because the following configuration fails:

...
submission:
  metadata:
    submit:
      python.package_name:
        validator_type: text
        required: false
      python.uploader:
        validator_type: text
        required: false
    ingest:
      INGEST:
        python.package_name:
          validator_type: text
          required: false

but when I remove the ingest key entirely, the scaler service is working again. It looks like either I misunderstood the expected config syntax, or the as_primitives does not handle the ingest definition properly