MISP / PyMISP

Python library using the MISP Rest API
Other
430 stars 279 forks source link

Reconcile attribute count from pymisp.search and MySQL select #490

Closed github-germ closed 3 years ago

github-germ commented 4 years ago

Interesting difference in attribute count when using pymisp.search while validating with a direct MySQL select. Can you help us understand how to get the two to match, e.g. is there some additional args needed with the search?

(Note: this was observed as we are implementing an Event/Attribute purge process as discussed in https://github.com/MISP/MISP/issues/5300: Best method to "purge" old MISP data)

Thanks!!

Our Test Case

MariaDB [misp]> select count() from attributes where timestamp >= 1262304000 and timestamp <= 1540857600 and deleted=1; +----------+ | count() | +----------+ | 0 | +----------+ 1 row in set (0.00 sec)

MariaDB [misp]> select count() from attributes where timestamp >= 1262304000 and timestamp <= 1540857600 and deleted=0; +----------+ | count() | +----------+ | 49154 | +----------+ 1 row in set (3 min 4.41 sec)

MariaDB [misp]>

github-germ commented 4 years ago

Btw, the same search and select count comparison for events matches fine.

Rafiot commented 4 years ago

Are you 100% certain that the user the query in PyMISP can see all these attributes? If not, the difference makes sense.

Same if you have orphaned attributes in your MISP instance.

Please find out a bunch of attributes that aren't returned by the search query, and check if the same user you're using to do the query can see them on the web interface.

github-germ commented 4 years ago
Rafiot commented 4 years ago
github-germ commented 4 years ago

The user is admin.

DING DING DING... I think the diff are objects. So I will have the PyMISP script also grab search(controller='objects',...). I'll report back.

Rafiot commented 4 years ago

I don't think controller='objects' exists. In order to count the attributes, you can also only select attributes with object_relation empty or null, that will skip the object attributes.

github-germ commented 4 years ago

Yes, I see now...

elif controller == 'objects':
                raise PyMISPNotImplementedYet('Not implemented yet')

OK, I'll play more.

github-germ commented 4 years ago

OK... So if my use case is to purge all ingested indicator things that have not be revised since some date (e.g. 365d), then are there just 3 things: events, attributes, objects? I forgot about objects (we do have them). Any others?

Does this basic plan make sense?

  1. delete_event with old timestamps
  2. delete_attribute with old timestamps
  3. delete_object with old timestamps

The outcome is all events that remain have only attributes and objects that are newer than the purge date. This will be run nightly (the first batch being the largest to process).

I will also keep a JSON file for each archiving what gets deleted each night and age-out the archive, e.g. every 30 days (as a safety).

Rafiot commented 4 years ago

Yes, that should work, except that we do not have a way to search for objects directly. What you will need to do is either to delete a whole event, or delete the attributes and afterwards search for empty objects, and then delete them.

github-germ commented 4 years ago

Let me see if I am following you...

  1. Are you saying that the set of attributes retrieved by search(controller='attributes',...) includes the attributes that are members of objects?
  2. If so, then deleting that attribute set will yield those empty objects.
  3. In that case, how do I then find those empty objects to then delete?
    • You wrote

      afterwards search for empty objects, and then delete them

Rafiot commented 4 years ago
  1. yes
  2. yes
  3. I don't think we have an easy way to do so yet, but in theory, that's something we could trigger on MISP side. @iglocska what do you think?

In your case, it is not blocking for you, as it just mean you have empty objects in your database, but we should have a way to trigger that cleanup at the instance level.

github-germ commented 4 years ago

Does it make sense to close this issue and spawn a new one requesting a way to access objects, at the very least delete empty objects? Thanks.

Rafiot commented 4 years ago

yes, please open an issue on MISP to have a call that removes empty objects. I'm not sure how it is doable, but that's the way to go.

github-germ commented 4 years ago

Closing this issue since we now have #491.

Hmmm, I think you wanted this new one on MISP/MISP, not MISP/PyMISP. Can you move it over?

github-germ commented 4 years ago

There were still non object attributes that pymisp.search were not returning within a timestamp (from, to) range. Turns out they all had distribution=5, i.e. inherit event.

So is it true that search does not return attributes that are inherited? And if so, then if we are looking to purge attributes that are old (e.g. a year), then is there any way to delete those via PyMISP?

Rafiot commented 4 years ago

They should be returned (inherit is the default). What was the distribution of the event?

github-germ commented 4 years ago

They should be returned (inherit is the default). What was the distribution of the event?

Thx.

Rafiot commented 4 years ago

I'm talking of the distribution of the event the attributes are part of. The attributes inherit the distribution of the events they're part of, and that may (or may not) be the problem. It's just a data point I'd like to have.

github-germ commented 4 years ago

the event the attributes are part of

OK, the 27K attributes older than 365 days, which search is not returning, are children of 94 events. Those events all have distribution, sharing_group_id, locked set to 0, the same local org. However, interesting values found in orgc:

iglocska commented 4 years ago

ok, there seems to be a lot of confusion in here.

Role != orgc.

select * from roles where id = 32

This will return the description of role ID 32 (/roles/index via your UI to see what roles are).

Roles control the ACL when it comes to functionalities a user has access to. For example, can a user with a given role attach tags to data? Can they get access to the audit logs? Can they administer users of their own organisation?

orgc is the creator organisation of an event. It is an alias for the organisations (you can view these via /organisations/index in your UI).

if you wanted to see the organisation that corresponds to an event's creator org ID (event.orgc.id), you have to run this query:

select name from organisations where id=32;

github-germ commented 4 years ago

Of course I know the diff. Multitasking error: pretty embarrassing. SORRY that I made you lecture me on the diff when I knew that!! Let's try that again...

Let me know what else you'd like to see please.

iglocska commented 4 years ago

Non local organisations can have events on your instance, that is absolutely no issue and the events shouldn't behave any differently than the other events. If they do, something really funky is going on.

Basically if I create an event on the CIRCL MISP instance and you pull it from there / via our OSINT feed / via a 3rd party, then MISP will receive an event that was originally created by an organisation that your instance doesn't know. Assuming that you are org.id = 1, if you pulled an event created by us in any of the above mentioned ways, MISP would do the following:

  1. Try to save the event, it knows that it is your user executing the pull, so it automatically assigns you as the local handler of the event (org_id = 1).
  2. It notices that CIRCL is an organisation that is unknown to your instance so far, so it creates a new organisation in your organisations table with a new ID (for example 33). Since MISP knows that you did not take any action to create this organisation, you probably don't intend to give access to your instance to this organisation, so to even prevent the possibility of you accidentally using this organisation when creating users, it creates the organisation as a "known remote organisation" (local = 0).
  3. MISP still wants to track who created the event in the first place, so it assigns orgc_id = 33 (orgc stands for creator organisation - horrible naming choice, I must admit).
iglocska commented 4 years ago

As for the difference between the SQL and the restSearch counts: Something that can cause this is orphaned attributes. Those will never be returned by restSearch. Could this be the issue? Did you manually delete stuff from the DB?

github-germ commented 4 years ago

Cool. That makes sense.

Our Use Case is a Purge Process that deletes all events, attributes, and objects that have a timestamp older than a year (for example).

I've identified that after deleting all the candidate aged-out events and attributes, that there are still 27k attributes remaining that were not returned by pymisp.search which appear to fall into two categories:

  1. Empty MISPObjects See https://github.com/MISP/MISP/issues/5380 where I've requested a method to delete those via PyMISP.
  2. These non-local events and their attributes Can they be deleted locally?
github-germ commented 4 years ago

As for the difference between the SQL and the restSearch counts: Something that can cause this is orphaned attributes. Those will never be returned by restSearch. Could this be the issue? Did you manually delete stuff from the DB?

iglocska commented 4 years ago

Empty MISP objects can't be deleted all that easily (if you're after a purge tool that is), so I think I misunderstood the other issue.

non local events can absolutely be deleted locally.

Very curious, will see if I can reproduce it tomorrow, it's time to hit the sack over here soon ;)

github-germ commented 4 years ago

Thanks. Catch you manana... Will be interested to see how you can delete the non-local content when my search cannot find them. Sleep well !!

iglocska commented 4 years ago

OK, I've been playing around with this over the weekend:

If I set a timestamp on attributes/restSearch it will add two separate where clauses:

the timestamp range will be checked against the attribute timestamp as expected but also against the event timestamp

What this means with a simple example:

So indeed this will cause the discrepancy that you're seeing - it's inline with the intent of the timestamp filter's functionality, but it indeed shows that we're lacking a filter here. So what I'll do next:

Add a new filter option (attribute_timestamp) that will purely search based on the attribute timestamp ignoring the event level. We already have this on the event level via event_timestamp anyway.

I've also added a new tool in MISP that will allow you to inspect the queries that MISP is executing so that you can compare it more easily to direct SQL queries - I highly encourage you to use this and report back anything weird you're seeing in there.

The way it works:

Simply pass the sql:1 parameter along with your API request to get the last 200 SQL queries included in the output. Passing sql:2 will omit the actual results and just return the SQL queries.

For example:

POST to /attributes/restSearch BODY:

{
    "returnFormat": "json",
    "timestamp": [1567296000, 1573222784],
    "sql": 2
}
github-germ commented 4 years ago

Yes! This does explain what we are experiencing. Really appreciate you working it through. Let me me know how we can get the new filter option and MISP SQL query tool here to try. Many Thanks!!

iglocska commented 4 years ago

They should both be there now 😉

On Mon, Nov 11, 2019, 16:29 github-germ notifications@github.com wrote:

Yes! This does explain what we are experiencing. Really appreciate you working it through. Let me me know how we can get the new filter option and MISP SQL query tool here to try. Many Thanks!!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MISP/PyMISP/issues/490?email_source=notifications&email_token=AA37VQAYYN4GBL47TFE4X6LQTF26JA5CNFSM4JHJZQA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDXFQZI#issuecomment-552491109, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA37VQDXDBXBHX6UF4SED7DQTF26JANCNFSM4JHJZQAQ .

github-germ commented 4 years ago

We're on 2.4.117. It's there? :-)

Rafiot commented 4 years ago

No, it's not ;)

Afaik, it will be available on 2.4.119 (118 was released last friday).

github-germ commented 4 years ago

We will be upgrading to 2.4.118 in the next week or in our Dev Lab (Production lags behind that). Once .119 is out, we can resume this one. Thanks again!!