[RST Cloud - Report Hub] Several issues

Description

Many issues have been identified. As is, the connector can be deployed in production but its quality is very low.

Blocking problem

Some relationships are poorly modeled
- “related to” relationships between countries/sector and Intrusion sets/malware, instead of “targets” relationships
- “related to” relationships between Attack Patterns and Intrusion sets, instead of “uses” relationships
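
For illustration, here is a minimal sketch of the two relationship types asked for above, written as plain STIX 2.1 dictionaries using only the Python standard library. The `make_relationship` helper and the placeholder IDs are hypothetical and are not taken from the connector's code.

```python
import uuid
from datetime import datetime, timezone


def make_relationship(relationship_type, source_ref, target_ref):
    """Build a minimal STIX 2.1 relationship object (hypothetical helper)."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": "relationship",
        "spec_version": "2.1",
        "id": f"relationship--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "relationship_type": relationship_type,
        "source_ref": source_ref,
        "target_ref": target_ref,
    }


# Placeholder IDs, for illustration only.
intrusion_set_id = f"intrusion-set--{uuid.uuid4()}"
country_id = f"location--{uuid.uuid4()}"
attack_pattern_id = f"attack-pattern--{uuid.uuid4()}"

# Intrusion set -> country: "targets" rather than "related-to".
targets = make_relationship("targets", intrusion_set_id, country_id)
# Intrusion set -> attack pattern: "uses" rather than "related-to".
uses = make_relationship("uses", intrusion_set_id, attack_pattern_id)
```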

Screenshot 2024-10-07 165337

Non-blocking problem

The connector does not create the Observables associated with Indicators

Screenshot 2024-10-07 164537

An error "'NoneType' object is not subscriptable" is raised at each work:

External references are attached to all entities, but no reports are created. Ideally, the connector would:
1. Create a report
2. Attach the External Reference to the report
3. Create all entities and relationships (as currently done)
4. Put all entities in the report.
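
The four steps above could look roughly like this. It is only a sketch with plain STIX 2.1 dictionaries: `build_report_bundle` is a hypothetical helper, and the `source_name` value and the example URL are assumptions.

```python
import uuid
from datetime import datetime, timezone


def build_report_bundle(name, source_url, objects):
    """Wrap already-created STIX objects in a report (hypothetical helper)."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    # Step 1: create the report.
    report = {
        "type": "report",
        "spec_version": "2.1",
        "id": f"report--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "published": now,
        # Step 2: the external reference is attached to the report itself.
        # The source_name below is an assumed value.
        "external_references": [
            {"source_name": "RST Report Hub", "url": source_url}
        ],
        # Step 4: every entity and relationship is listed in the report.
        "object_refs": [obj["id"] for obj in objects],
    }
    return {
        "type": "bundle",
        "id": f"bundle--{uuid.uuid4()}",
        "objects": [report, *objects],
    }


# Step 3: entities and relationships are created as usual.
# This malware object is trimmed to a few fields for brevity.
malware = {"type": "malware", "id": f"malware--{uuid.uuid4()}", "name": "Malware A"}
bundle = build_report_bundle("Example report", "https://example.com/report", [malware])
```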

Other improvements

It does not put markings on the Notes and Organizations it creates:

Screenshot 2024-10-07 163944

It does not fill in the "Author" fields of the Attack Patterns and Notes it creates:

Screenshot 2024-10-07 163218

Environment

OCTI 6.3.5

Reproducible Steps

Steps to create the smallest reproducible scenario:

Deploy the connector

Hello @k1r10n: could you take a look at @Lhorus6's requests for improvements?

Hi @Lhorus6! Thanks for your comments!

Based on these comments and previous experience with clients, I believe you may still have the AlienVault connector enabled. When both connectors are enabled, and data from OTX arrives after reports are imported from the Report Hub (some reports we provide are also covered by OTX), this leads to a merge in OpenCTI, which creates the extra 'related-to' entries and makes everything look weird.

Answers to all comments follow. Feel free to reach out to support@rstcloud.net.

1) >“related to” relationships between countries/sector and Intrusion sets/malware, instead of “targets” relationships

We provide 'targets' relationships between intrusion sets/malware and countries/sector where this can be extracted from the context by the engine. Sometimes countries are mentioned in the context that a particular IP is located in 'Country A'. This does not mean that the intrusion set is 'related-to', 'originated-from', or 'targets' this country. All three of these relationships are supported by the engine, but they are not always set.

2) >“related to” relationships between Attack Patterns and Intrusion sets, instead of “uses” relationships

When possible the engine adds 'uses' between the 'intrusion-sets' and 'attack-patterns'. This is a standard feature. Attaching an example: 20241013_ctfiot_com_report_0x9e6be797.json

3) >Do not create Observables associated with Indicators

If a TI report contains indicators of compromise, we create indicators based on them. If the report includes 'noisy' indicators (such as an indicator for PuTTY or another tool that can be dropped by an attacker) or, for example, well-known domains/URLs used for geo-location (when, let's say, a stealer checks if it is allowed to operate in a certain country), we create these objects as Observables. So far this logic was ok for the clients and we did not have a request to duplicate each indicator with a corresponding observable.

4) >An error "'NoneType' object is not subscriptable" is raised at each work:

Please provide more information on which particular object is not being created, including the error description. This is not normal. There should be 0 errors.

5) >External references are attached to all entities, but no reports are created. What would be best would be:

1. Create a report
2. Attach the External Reference to the report
3. Create all entities and relationships (as currently done)
4. Put all entities in the report.

This is exactly how it is done. Each human-readable report is transformed into a STIX bundle, which includes a Report object with notes and all other associated objects. The only difference is that we also keep the reference on every object, so that you can see which reports contain information about which object. It’s possible that, in step 4, the actual Report object is not created.

6) >It does not put markings on Notes and Organization it creates:

We have added markings as per your request. Please re-download.

7) >It does not fill in the "Author" fields of the Attack Patterns and Notes it creates:

Thanks. We’ve added 'Author' to our unique patterns and Notes. For the T* patterns, clients usually sync them from MITRE anyway, and our values are only used for mapping with the MITRE definitions.

Hi @k1r10n, Thanks a lot for your detailed reply.

>This leads to a merge in OpenCTI, which creates the extra 'related-to' entries and makes everything look weird

I filtered on “Creator”, which means that all objects (entities and relationships), whether or not they were merged afterwards, have been created by your connector. The merge does not create extra objects; it simply deduplicates.

>Sometimes countries are mentioned in the context that a particular IP is located in 'Country A'. This does not mean that the intrusion set is 'related-to', 'originated-from', or 'targets' this country.

100% agree. If you're already creating 'originated-from' or 'targets' relationships when you have the information, that's perfect. The only change needed is to stop creating 'related-to' relationships in this case.

>When possible the engine adds 'uses' between the 'intrusion-sets' and 'attack-patterns'.

That's great. In this case, I'll give the same answer as before: just avoid creating 'related-to' relationships.

>So far this logic was ok for the clients and we did not have a request to duplicate each indicator with a corresponding observable.

A good practice is to always create an Observable when you create an Indicator (the reverse is not true). This comes from the fact that functionalities are linked to Observables. For example, many enrichment connectors run on Observables, and not on Indicators. Having only Indicators can limit users.

It's not a duplication. The two entities are closely related, but they represent different things and meet different needs.
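
As a rough sketch of what this pairing could look like in a bundle, assuming the Indicator is linked to its Observable with a 'based-on' relationship: the helper name is hypothetical, timestamps are omitted for brevity, and a random UUID is used for the observable ID even though STIX 2.1 recommends deterministic IDs for observables.

```python
import uuid


def indicator_with_observable(domain):
    """Return an indicator, its observable, and the link between them."""
    observable = {
        "type": "domain-name",
        "spec_version": "2.1",
        # Random ID for brevity; STIX 2.1 recommends a deterministic UUIDv5.
        "id": f"domain-name--{uuid.uuid4()}",
        "value": domain,
    }
    indicator = {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "name": domain,
        "pattern_type": "stix",
        "pattern": f"[domain-name:value = '{domain}']",
    }
    # The indicator points at the observable it was derived from.
    based_on = {
        "type": "relationship",
        "spec_version": "2.1",
        "id": f"relationship--{uuid.uuid4()}",
        "relationship_type": "based-on",
        "source_ref": indicator["id"],
        "target_ref": observable["id"],
    }
    return indicator, observable, based_on


indicator, observable, based_on = indicator_with_observable("example.com")
```

Enrichment connectors that only run on Observables would then have something to work with, while the Indicator keeps its detection pattern.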

>Please provide more information on which particular object is not being created, including the error description.

Indeed, sorry, I should have provided it in the first place. On rechecking, it appears that I no longer have any errors. When I look in the logs of my container at the time I created this issue, I find this log (perhaps unrelated) for 4 days in a row:

Failed to download and save entry 20241009_rt-solar_ru_report_0xde561eb0 as PDF. 404 Client Error: Not Found for url: https://api.rstcloud.net/v1/reports?id=20241009_rt-solar_ru_report_0xde561eb0&format=pdf

I think we can consider this point resolved ✅

>This is exactly how it is done.

You're right, I now have reports with all the information. Indeed, it's possible that it's linked to point 4.

Each object is linked to a report, so the External Reference could be applied to the report only. Both ways are acceptable.

Screenshot 2024-10-19 152506

Solved too ✅

>We have added markings as per your request.

Thank you!

Solved ✅

>We’ve added 'Author' to our unique patterns and Notes

Thank you!

Solved ✅

TLDR

All that remains are points 1, 2 and 3. Points 1 and 2 are blocking because they produce relationships that add "noise" to the database, can mislead users, and can distort dashboard values.

@Lhorus6 Regarding points 1 and 2, I believe I need to elaborate more. The 'related-to' relationship is not just noise; it serves a purpose. It just so happens that many integrations overuse 'related-to' everywhere.

Which relationship would you choose for a text like 'Malware A avoids infecting Russia, Belarus, and other CIS countries'? This might lead you to conclude that the authors of the malware are possibly from the CIS, but in doing so, you lose the connection to the named countries and the fact that the malware does not attack these particular entities in this region. I'm sure you can think of several other cases where neither 'originated-from' nor 'targets' would be appropriate. Also, 'targets' and 'originated-from' are basic types. We will further expand our engine capabilities to automatically extract relationships like 'authored-by', 'variant-of', 'downloads', 'drops', etc., but for now, we have recorded them as 'related-to'. Additionally, the STIX 2.1 taxonomy cannot fully cover all types of relationships, so 'related-to' serves as a fallback in some cases. We could create custom types (and the standard allows for that), but then we would face interoperability issues, as different Threat Intel Platforms would need to interpret these new 'custom' relationships. It would be nice to get your view on this.
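
To make the trade-off concrete, here is a minimal sketch of a fallback rule with a switch to suppress it. The table is a small, non-exhaustive subset, the function name and the `allow_fallback` flag are hypothetical, and this is not how the RST engine is implemented. Note that STIX 2.1 spells the origin relationship 'originates-from'.

```python
# Specific relationship types accepted per (source type, target type) pair.
# A small, non-exhaustive subset for illustration.
SPECIFIC_TYPES = {
    ("intrusion-set", "location"): {"targets", "originates-from"},
    ("malware", "location"): {"targets", "originates-from"},
    ("intrusion-set", "attack-pattern"): {"uses"},
    ("malware", "attack-pattern"): {"uses"},
}


def choose_relationship_type(source_type, target_type, extracted, allow_fallback=True):
    """Return the extracted type if it is valid for the pair, else a fallback.

    With allow_fallback=False the caller gets None and can skip the
    relationship instead of emitting 'related-to'.
    """
    allowed = SPECIFIC_TYPES.get((source_type, target_type), set())
    if extracted in allowed:
        return extracted
    return "related-to" if allow_fallback else None


# 'avoids' has no standard counterpart, so it falls back or is dropped.
print(choose_relationship_type("malware", "location", "targets"))
print(choose_relationship_type("malware", "location", "avoids"))
print(choose_relationship_type("malware", "location", "avoids", allow_fallback=False))
```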

Our RST Report Hub engine isn't perfect, as it is a machine that parses nearly all public threat intelligence articles and reports, and there are fluctuations. However, if the same task were assigned to 5 analysts, I doubt they would avoid making any mistakes or complete the task at the same speed (if they could even complete it at all, given there are more than 40,000 articles on the topic annually). Additionally, the cost of having the machine pre-parse the data is about 30 times cheaper than hiring 5 people. So, we currently aim to handle the heavy lifting and leave really detailed refinements to the user.

Regarding point 3, I think we can add a configuration option for the connector, create_observables: True|False, in the next release. It hasn't been a priority for our clients so far, but it’s cheap to add, and I believe there are cases where people may want it.
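
A minimal sketch of how such a flag could be read, assuming it is passed as an environment variable. The variable name `RST_REPORT_HUB_CREATE_OBSERVABLES` and the `read_bool` helper are hypothetical; the real option name and loading mechanism may differ.

```python
import os


def read_bool(name, default=False, env=os.environ):
    """Parse a boolean flag from an environment-style mapping."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Hypothetical variable name; defaults to False when unset.
create_observables = read_bool("RST_REPORT_HUB_CREATE_OBSERVABLES")
```

Defaulting to False would keep the current behaviour for existing deployments, and users who want an Observable for each Indicator can opt in.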
