I'm requesting a feature for improved backup and restore functionality for user-specific data in OpenCTI.
Current Workaround
Currently, I back up the raw feed using the following environment settings:
CONNECTOR_SEND_TO_QUEUE=true
CONNECTOR_SEND_TO_DIRECTORY=true
CONNECTOR_SEND_TO_DIRECTORY_PATH=/data
CONNECTOR_SEND_TO_DIRECTORY_RETENTION=1
This setup works well for backing up data streams from various connectors, but it does not capture user-created data, and its behavior varies by connector. For example, the MITRE connector brings in the entire dataset on each run, while the Malware Bazaar connector ingests data directly into the process, bypassing data forking. This method is still better than using the backup connector, which often generated millions of 1 KB files and was prone to interruption if Redis failed, but it does not address backing up only user-specific data.
Proposed Solution
Backup File Enhancements
User-specific Backups: Provide the ability to back up data belonging to specific users, or groups of users, identified by UUID. This would allow targeted backups for individual users or teams, making it easier to manage user-specific changes without duplicating the entire dataset.
Bundled Backups: Enhance the backup connector to bundle backups into larger files rather than generating millions of small files. This would simplify data movement, compression, and storage, ultimately reducing the overhead associated with managing many small backup files.
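To illustrate the user-specific selection, here is a minimal Python sketch that filters exported objects by creator UUID. The `creator_ids` field name and the sample IDs are assumptions made for illustration; the field the backup connector would actually need to inspect may differ.

```python
from typing import Iterable

# Hypothetical field name holding the UUIDs of the users who created an object.
# This is an assumption for the sketch, not a confirmed OpenCTI export field.
CREATOR_KEY = "creator_ids"


def select_user_objects(objects: Iterable[dict], user_ids: set[str]) -> list[dict]:
    """Keep only the objects created by at least one of the given users."""
    selected = []
    for obj in objects:
        creators = obj.get(CREATOR_KEY, [])
        # An object qualifies if any of its creators is in the requested set.
        if user_ids.intersection(creators):
            selected.append(obj)
    return selected


if __name__ == "__main__":
    sample = [
        {"id": "report--1", "creator_ids": ["user-a"]},
        {"id": "indicator--2", "creator_ids": ["connector-x"]},
        {"id": "note--3", "creator_ids": ["user-b", "connector-x"]},
    ]
    kept = select_user_objects(sample, {"user-a", "user-b"})
    print([obj["id"] for obj in kept])
```

With a filter like this, connector-created objects (already covered by data forking) would be skipped and only analyst-created objects would reach the backup.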
Use Cases
User-specific Backups: I have 10 users manually inputting data and 30 feeds providing data into OpenCTI. Running the backup connector currently generates millions of small JSON files, making data management, movement, and compression cumbersome. A feature to back up only user data based on specific UUIDs or groups would greatly simplify identifying and restoring user-specific data.
Bundled Backups: The current backup method creates millions of 1 KB files, which makes it challenging to move, compress, and store data effectively. Bundling these files into larger, more manageable units would significantly reduce the complexity of backup and restoration tasks, especially for offline environments.
Offline System Support: My setup involves using OpenCTI to ingest data into an online system, and then forking that data to an offline system where users perform significant work. I would like to specifically back up the data created by users on this offline system, without duplicating the entire dataset, which I already back up via data forking.
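As a rough idea of what bundling could look like, the following Python sketch packs a directory of small JSON files into a single gzip-compressed JSON Lines file and reads it back. The function names and the `.jsonl.gz` layout are my own assumptions, not how the backup connector currently works.

```python
import gzip
import json
from pathlib import Path


def bundle_json_files(source_dir: Path, bundle_path: Path) -> int:
    """Write every *.json file under source_dir as one line of a gzip bundle."""
    count = 0
    with gzip.open(bundle_path, "wt", encoding="utf-8") as bundle:
        for path in sorted(source_dir.rglob("*.json")):
            with path.open(encoding="utf-8") as handle:
                record = json.load(handle)
            # json.dumps emits a single line, so one record maps to one line.
            bundle.write(json.dumps(record) + "\n")
            count += 1
    return count


def read_bundle(bundle_path: Path):
    """Yield the original JSON documents back from a bundle, in order."""
    with gzip.open(bundle_path, "rt", encoding="utf-8") as bundle:
        for line in bundle:
            if line.strip():
                yield json.loads(line)
```

A single compressed file like this is much easier to copy to an offline system than millions of individual files, and it can still be restored record by record.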
I believe that enhancing the existing backup connector or introducing a new mechanism to facilitate user-specific or connector-specific backups would greatly streamline data management and provide flexibility in mixed online/offline environments.
Example configuration illustrating the proposed options (BACKUP_USER_UUID, BACKUP_GROUP_NAME, and BACKUP_BUNDLE are suggested names):
version: "3"
services:
  connector-restore-files:
    image: opencti/connector-restore-files:6.4.1
    environment:
      - OPENCTI_URL=http://localhost # Local OpenCTI URL
      - OPENCTI_TOKEN=ChangeMe # Local OpenCTI token
      - CONNECTOR_ID=ChangeMe
      - CONNECTOR_NAME=RestoreFiles
      - CONNECTOR_SCOPE=restore
      - CONNECTOR_CONFIDENCE_LEVEL=15 # From 0 (Unknown) to 100 (Fully trusted)
      - CONNECTOR_LOG_LEVEL=error
      - BACKUP_PROTOCOL=local # Protocol for file copy (only `local` is supported for now).
      - BACKUP_PATH=/tmp # Path to be used to copy the data, can be relative or absolute.
      - BACKUP_USER_UUID=id1,id2,id3
      - BACKUP_GROUP_NAME=allAnalysts
      - BACKUP_BUNDLE=true
    restart: always
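For reference, a connector could read the three suggested variables from the configuration above roughly as follows. This is only a sketch under my own assumptions: `BackupOptions` and `load_backup_options` are hypothetical names, and the accepted boolean spellings are a guess.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class BackupOptions:
    user_uuids: list[str]       # parsed from BACKUP_USER_UUID (comma-separated)
    group_name: Optional[str]   # parsed from BACKUP_GROUP_NAME, None if unset
    bundle: bool                # parsed from BACKUP_BUNDLE


def load_backup_options(env: Mapping[str, str] = os.environ) -> BackupOptions:
    """Parse the proposed (hypothetical) backup variables from the environment."""
    raw_ids = env.get("BACKUP_USER_UUID", "")
    # Split the comma-separated list and drop empty or whitespace-only entries.
    user_uuids = [part.strip() for part in raw_ids.split(",") if part.strip()]
    group_name = env.get("BACKUP_GROUP_NAME") or None
    bundle = env.get("BACKUP_BUNDLE", "false").strip().lower() in {"true", "1", "yes"}
    return BackupOptions(user_uuids, group_name, bundle)
```

Leaving all three variables unset would yield an empty user list, no group, and bundling disabled, so existing deployments would keep their current behavior.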
Additional Information
The goal is to avoid backing up redundant data and reduce the overhead associated with managing and moving millions of tiny files. This feature would also help with tracking user activity more effectively and simplifying disaster recovery for user-specific inputs.
If the Feature Request is Approved, Would You Be Willing to Submit a PR?
Yes, I would be willing to contribute to a PR.