OpenCTI-Platform / opencti

Open Cyber Threat Intelligence Platform
https://opencti.io

InternalFiles can be duplicated after index rollover #8235

Closed labo-flg closed 1 month ago

labo-flg commented 2 months ago

Description

When an index rollover happens (after 1 year or 75M objects), the platform ends up with more than one index. Once the rollover has occurred, a file that already exists in the first index can be re-uploaded and created again in the second index, which results in a duplicate.

Additional information

When a file is imported and an InternalFile object is created in the database (file-storage.js, upload()), we check whether the file already exists only if the errorOnExisting option is set.

Through the UI the option is correctly set to true, but apparently not in the python-client.
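
The mechanism can be illustrated with a toy in-memory model. This is only a sketch, not OpenCTI code: `RolledOverAlias`, `upload` and the `error_on_existing` parameter are hypothetical stand-ins for the alias, file-storage.js upload() and errorOnExisting. It relies on the fact that an Elasticsearch `_id` is unique only within a single index, and that writes through a rollover alias go to the current write index.

```python
class RolledOverAlias:
    """Toy stand-in for an alias backed by several indices (hypothetical)."""

    def __init__(self):
        # One dict per backing index; the last one is the write index.
        self.indices = [{}]

    def rollover(self):
        # New writes now land in a fresh, empty index.
        self.indices.append({})

    def search_by_id(self, doc_id):
        # Reads through the alias see every backing index.
        return [idx[doc_id] for idx in self.indices if doc_id in idx]

    def index(self, doc_id, doc):
        # Writes only touch the current write index, so a document with
        # the same _id in an older index is left in place.
        self.indices[-1][doc_id] = doc


def upload(alias, doc_id, doc, error_on_existing=False):
    """Hypothetical upload: looks for an existing file only when asked to."""
    if error_on_existing and alias.search_by_id(doc_id):
        raise FileExistsError(doc_id)
    alias.index(doc_id, doc)


alias = RolledOverAlias()
upload(alias, "import/pending/test.txt.json", {"size": 900})
alias.rollover()
# Without the check, the second upload silently creates a duplicate.
upload(alias, "import/pending/test.txt.json", {"size": 48592})
hits = alias.search_by_id("import/pending/test.txt.json")
```

Before the rollover, a second upload with the same id would simply overwrite the document in the single index, which is presumably why the missing check went unnoticed.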

If you access the duplicated file, you get an error:

{
  "category": "APP",
  "errors": [
    {
      "attributes": {
        "genre": "TECHNICAL",
        "hits": 2,
        "http_status": 500,
        "id": "import/pending/test.txt.json"
      },
      "message": "Id loading expect only one response",
      "name": "DATABASE_ERROR",
      "stack": "GraphQLError: Id loading expect only one response\n    at error (/opt/opencti/build/src/config/errors.js:7:10)\n    at DatabaseError (/opt/opencti/build/src/config/errors.js:57:48)\n    at elLoadById (/opt/opencti/build/src/database/engine.js:1416:11)\n    at processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at internalLoadById (/opt/opencti/build/src/database/middleware-loader.ts:583:10)\n    at storeLoadById (/opt/opencti/build/src/database/middleware-loader.ts:590:16)\n    at loadFile (/opt/opencti/build/src/database/file-storage.js:258:33)\n    at deleteFile (/opt/opencti/build/src/database/file-storage.js:116:14)\n    at deleteElement (/opt/opencti/build/src/manager/retentionManager.ts:31:5)\n    at executeProcessing (/opt/opencti/build/src/manager/retentionManager.ts:83:7)\n    at Object.retentionHandler [as handler] (/opt/opencti/build/src/manager/retentionManager.ts:107:7)\n    at cronHandler (/opt/opencti/build/src/manager/managerModule.ts:71:11)\n    at /opt/opencti/build/src/manager/managerModule.ts:129:11\n    at Ilt.#runHandlerAndScheduleTimeout (/opt/opencti/build/node_modules/set-interval-async/dist/set-interval-async-timer.cjs:36:13)\n    at Timeout._onTimeout (/opt/opencti/build/node_modules/set-interval-async/dist/set-interval-async-timer.cjs:29:13)"
    }
  ],
  "id": "import/pending/test.txt.json",
  "level": "error",
  "manager": "RETENTION_MANAGER",
  "message": "Id loading expect only one response",
  "source": "backend",
  "timestamp": "2024-08-30T15:37:08.005Z",
  "version": "6.2.17"
}

Looking at the Elasticsearch database, we find both versions of the same file with the same _id, one in each index:

{
  "took": 19,
  "timed_out": false,
  "_shards": {
    "total": 24,
    "successful": 24,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 21.41602,
    "hits": [
      {
        "_index": "opencti_internal_objects-000001",
        "_id": "import/pending/test.txt.json",
        "_score": 21.41602,
        "_source": {
          "name": "test.txt.json",
          "size": 900,
          "information": "",
          "lastModified": "2024-07-23T15:08:27.453Z",
          "lastModifiedSinceMin": 0,
          "metaData": {
            "version": "2024-07-23T15:08:27.378Z",
            "filename": "test.txt.json",
            "mimetype": "application/json",
            "encoding": "7bit",
            "creator_id": "e2fde643-6da6-44e6-83fc-78acdef65eb9",
            "messages": [],
            "errors": [],
            "file_markings": []
          },
          "uploadStatus": "complete",
          "internal_id": "import/pending/test.txt.json",
          "standard_id": "56f9312e-42cb-5dfd-be37-17880000e0b5",
          "entity_type": "InternalFile",
          "rel_object-marking.internal_id": []
        }
      },
      {
        "_index": "opencti_internal_objects-000002",
        "_id": "import/pending/test.txt.json",
        "_score": 13.052662,
        "_source": {
          "name": "test.txt.json",
          "size": 48592,
          "information": "",
          "lastModified": "2024-08-22T09:27:23.001Z",
          "lastModifiedSinceMin": 0,
          "metaData": {
            "version": "2024-08-22T09:27:22.815Z",
            "filename": "test.txt.json",
            "mimetype": "application/json",
            "encoding": "7bit",
            "creator_id": "e2fde643-6da6-44e6-83fc-78acdef65eb9",
            "entity_id": "c7d6b082-2f4a-4052-898d-abbd0fba1a53",
            "messages": [],
            "errors": [],
            "file_markings": []
          },
          "uploadStatus": "complete",
          "internal_id": "import/pending/test.txt.json",
          "standard_id": "56f9312e-42cb-5dfd-be37-17880000e0b5",
          "entity_type": "InternalFile",
          "rel_object-marking.internal_id": []
        }
      }
    ]
  }
}
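
To check whether other files are affected, a search response of this shape can be scanned for ids that appear in more than one index. The following is a minimal sketch that works on the parsed JSON only; `find_duplicate_ids` is a hypothetical helper, and the `response` literal is a trimmed copy of the result above.

```python
from collections import defaultdict


def find_duplicate_ids(response):
    """Map each _id found in more than one index to the indices holding it."""
    indices_by_id = defaultdict(list)
    for hit in response["hits"]["hits"]:
        indices_by_id[hit["_id"]].append(hit["_index"])
    return {
        doc_id: sorted(indices)
        for doc_id, indices in indices_by_id.items()
        if len(indices) > 1
    }


# Trimmed copy of the search response shown above.
response = {
    "hits": {
        "hits": [
            {
                "_index": "opencti_internal_objects-000001",
                "_id": "import/pending/test.txt.json",
            },
            {
                "_index": "opencti_internal_objects-000002",
                "_id": "import/pending/test.txt.json",
            },
        ]
    }
}

duplicates = find_duplicate_ids(response)
```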

labo-flg commented 2 months ago

We need to:

  1. solve the root issue and prevent duplicates
  2. find a way to clean up the platform database
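
For the cleanup step, one possible approach is to build a deletion plan from the duplicated hits before touching the database. The sketch below assumes that the most recently modified copy (by `lastModified`) is the one to keep; that policy and the `plan_cleanup` helper are assumptions for illustration, not a decision taken in this issue. It only produces a list of (index, _id) pairs and deletes nothing.

```python
from collections import defaultdict
from datetime import datetime


def parse_ts(value):
    # Timestamps look like "2024-07-23T15:08:27.453Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def plan_cleanup(hits):
    """Return (index, _id) pairs to delete, keeping the newest copy per _id."""
    copies_by_id = defaultdict(list)
    for hit in hits:
        copies_by_id[hit["_id"]].append(hit)

    to_delete = []
    for doc_id, copies in copies_by_id.items():
        copies.sort(key=lambda h: parse_ts(h["_source"]["lastModified"]))
        # Every copy except the most recently modified one is treated as stale.
        to_delete.extend((h["_index"], doc_id) for h in copies[:-1])
    return to_delete


hits = [
    {
        "_index": "opencti_internal_objects-000001",
        "_id": "import/pending/test.txt.json",
        "_source": {"lastModified": "2024-07-23T15:08:27.453Z"},
    },
    {
        "_index": "opencti_internal_objects-000002",
        "_id": "import/pending/test.txt.json",
        "_source": {"lastModified": "2024-08-22T09:27:23.001Z"},
    },
]

plan = plan_cleanup(hits)
```

Each stale copy would have to be deleted from its concrete index rather than through the alias, since both copies share the same _id.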