Closed by SamuelHassine 1 year ago
We do have some R&D being tested internally for this, but we are far from an open-source release, and we are not sure a release will be possible at all, since some commercial partners are involved in the project.
For reference https://trial.elemendar.com/
Our free to use trial AI engine translates your CTI uploads and Threat Intel from the Web from their human authored content into machine readable and actionable data in STIX 2.0 now incorporating MITRE ATT&CK™.
(Disappointing result after a rapid test)
Hi there, we noticed your comment about our AI for CTI READ application at trial.elemendar.com. We are sorry to hear you experienced a disappointing result. Our accuracy is always improving. Please do try more tests/documents and let us know if you have specific comments or errors to report. We will shortly be releasing an Open CTI connector, so all feedback is really important to us.
Thank you, Lee (Elemendar Open CTI Project Admin)
On this topic, TRAM v1.0.0 was released a few days ago at https://github.com/center-for-threat-informed-defense/tram
TRAM enables researchers to test and refine Machine Learning (ML) models for identifying ATT&CK techniques in prose-based cyber threat intel reports and allows threat intel analysts to train ML models and validate ML results.
Right now, it can only be used to train a model to recognize ATT&CK techniques. I'm not sure that entity/relationship extraction is on the roadmap.
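To make "train a model to recognize ATT&CK techniques" concrete, here is a toy sketch of the general idea: a tiny bag-of-words naive Bayes classifier that maps a sentence to a technique ID. This is not TRAM's code or API. The training sentences, the function names (`tokenize`, `train`, `predict`) and the choice of naive Bayes are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Made-up training sentences, labelled with ATT&CK technique IDs
# (T1566 = Phishing, T1059 = Command and Scripting Interpreter).
SAMPLES = [
    ("The group sent spearphishing emails with malicious attachments", "T1566"),
    ("Victims received a phishing email containing a malicious link", "T1566"),
    ("The malware executed PowerShell commands to download a payload", "T1059"),
    ("Attackers ran a PowerShell script from the command line", "T1059"),
]


def tokenize(sentence):
    """Lowercase the sentence and strip basic punctuation from each word."""
    words = (w.strip(".,;:()").lower() for w in sentence.split())
    return [w for w in words if w]


def train(samples):
    """Count words per label; returns (word_counts, label_counts, vocab)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for sentence, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(sentence))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, sentence):
    """Return the label with the highest naive Bayes log-score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        # Laplace (add-one) smoothing over the shared vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(sentence):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(SAMPLES)
print(predict(model, "The actor delivered a phishing email"))
```

A classifier like this only tags text with a technique. It says nothing about entities or the relationships between them, which is the gap mentioned above.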
Also on this topic: "Open-CyKG: An Open Cyber Threat Intelligence Knowledge Graph" https://www.sciencedirect.com/science/article/pii/S0950705121007863
Open-CyKG: an Open Cyber Threat Intelligence (CTI) Knowledge Graph (KG) framework that is constructed using an attention-based neural Open Information Extraction (OIE) model to extract valuable cyber threat information from unstructured Advanced Persistent Threat (APT) reports. More specifically, we first identify relevant entities by developing a neural cybersecurity Named Entity Recognizer (NER) that aids in labeling relation triples generated by the OIE model. Afterwards, the extracted structured data is canonicalized to build the KG by employing fusion techniques using word embeddings.
Notebook: https://github.com/IS5882/Open-CyKG
Is there a point in developing this connector? Extracting a STIX bundle from a PDF file is a pain in the ass. Wouldn't a feasible alternative be to simply ask the creator of the PDF file to ship the STIX bundle as JSON?
@nor3th : Agree, it's 100% a pain in the ass.
As I'm no longer a full-time CTI analyst who has to work with PDF or HTML files, it's no longer my problem. :grin:
However, I have a thought for all analysts who have to deal with unstructured documents (txt/pdf/html). :zipper_mouth_face:
And asking authors to provide STIX packages is IMHO a nice dream. The simplest use case is manual ingestion of public data from the blog posts of CTI companies. They provide STIX2 only to paying customers.
Covered by the import-report connector. It will also be part of ongoing work on full-text indexing and NLP in the core platform.
Problem to Solve
When uploading a PDF file corresponding to a report, this connector should be able to extract STIX knowledge from it using NLP.
Current Workaround
None.
Proposed Solution
Create a connector using NLP.
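As a rough illustration of the expected input and output, here is a minimal sketch that turns already-extracted report text into a STIX 2.1-style bundle. It uses plain regular expressions rather than NLP, assumes the PDF text has been extracted beforehand, and does not use the OpenCTI connector API. The names `PATTERNS` and `extract_bundle` are hypothetical, and the objects are simplified and not validated against the STIX specification.

```python
import json
import re
import uuid
from datetime import datetime, timezone

# Hypothetical patterns for this sketch. A real connector would need much more
# robust extraction (defanged indicators, named entities, relationships, ...).
PATTERNS = {
    "ipv4-addr": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "attack-pattern": re.compile(r"\bT\d{4}(?:\.\d{3})?\b"),
}


def extract_bundle(text):
    """Build a simplified STIX 2.1-style bundle (a plain dict) from text."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    objects = []

    # IPv4 addresses become cyber-observable objects.
    # Random UUIDs are used here for brevity.
    for ip in sorted(set(PATTERNS["ipv4-addr"].findall(text))):
        objects.append({
            "type": "ipv4-addr",
            "spec_version": "2.1",
            "id": f"ipv4-addr--{uuid.uuid4()}",
            "value": ip,
        })

    # ATT&CK technique IDs become attack-pattern objects.
    # The ID doubles as a placeholder name, since the text alone does not
    # give us the official technique name.
    for tid in sorted(set(PATTERNS["attack-pattern"].findall(text))):
        objects.append({
            "type": "attack-pattern",
            "spec_version": "2.1",
            "id": f"attack-pattern--{uuid.uuid4()}",
            "created": now,
            "modified": now,
            "name": tid,
            "external_references": [
                {"source_name": "mitre-attack", "external_id": tid},
            ],
        })

    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


if __name__ == "__main__":
    sample = "The actor used T1566.001 and beaconed to 10.0.0.1 and 10.0.0.1."
    print(json.dumps(extract_bundle(sample), indent=2))
```

The hard part this issue asks for, recognizing entities and the relationships between them in free prose, is exactly what the regex approach cannot do and where NLP would come in.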
Additional Information
None.