MISP / PyMISP

Python library using the MISP Rest API

Search Attributes - Long Execution Time #583

Closed nbyt3 closed 4 years ago

nbyt3 commented 4 years ago

Environment: 2097 events, 45640 attributes
Start time: 2020-05-21T09:28:22.585154-04:00
End time: 2020-05-21T09:47:18.436166-04:00
Total: 19 minutes
Resources: 32 GB, 10 CPUs
PHP: memory_limit: 16GB, execution_time: 1200

I was wondering if it is possible to speed up the restSearch process on events. It's currently taking 19 minutes for the environment listed above. Are there any other configurations I can adjust to speed this up?

Rafiot commented 4 years ago

Yes, you can pass the page and limit parameters and iterate.
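A minimal sketch of that iteration, not taken from the thread. `iter_pages` and `fetch_page` are hypothetical names, and the stop condition (an empty or short page means there is nothing left) is an assumption, since nothing here confirms how the API signals the last page.

```python
from typing import Callable, Iterator, List


def iter_pages(fetch_page: Callable[[int, int], List[dict]],
               limit: int = 10) -> Iterator[dict]:
    """Yield results page by page until a page comes back empty or short."""
    page = 1  # assumption: pages are numbered from 1
    while True:
        batch = fetch_page(page, limit)
        if not batch:
            break  # an empty page is assumed to mean "no more results"
        yield from batch
        if len(batch) < limit:
            break  # a short page is assumed to be the last one
        page += 1
```

With PyMISP this might be wired up as `iter_pages(lambda page, limit: misp.search(controller='events', page=page, limit=limit))`, where `misp` is an already configured `PyMISP` instance. Only one page of results is held in memory at a time.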

nbyt3 commented 4 years ago

@Rafiot Thank you for the quick response! I will try to work that into my process. I looked through the documentation and couldn't find whether a query using the page/limit parameters returns the current page number and the number of pages left. Does it?

nbyt3 commented 4 years ago

If my math is correct: it takes 5 seconds to search and iterate through a single page of 10 events. Based on my 2097 events, it will still take around 17 minutes.
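The estimate above can be reproduced with a few lines of arithmetic. This sketch assumes the pages are fetched one after another and that every page costs the same 5 seconds; `estimated_minutes` is a hypothetical helper name.

```python
import math


def estimated_minutes(total_events: int, page_size: int,
                      seconds_per_page: float) -> float:
    """Estimate the wall-clock time of a sequential, paged export."""
    pages = math.ceil(total_events / page_size)  # 2097 / 10 -> 210 pages
    return pages * seconds_per_page / 60


print(estimated_minutes(2097, 10, 5))  # 17.5
```

Paging by itself therefore does not shorten the total run; it mainly bounds the size of each individual request.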

Rafiot commented 4 years ago

I cannot check right now, but I don't think it gives you the number of pages left.

That is a lot of attributes you're fetching; it will definitely take quite some time. Just checking: are you passing pythonify=True? If yes, try without it. Otherwise, I don't think you can do much better, besides narrowing down your query.

nbyt3 commented 4 years ago

@Rafiot Thanks again for the response. I'm not passing pythonify=True. I'm using the restSearch events functionality to export all events from MISP for 1) a backup and 2) post-processing the events and attributes for an internal project. The only other parameter I am passing is 'includeDecayScore'.

Rafiot commented 4 years ago

So in theory, you should not fetch all the events every time? The first pass will take a while, but you will be fine afterwards (by just getting the updated events)?
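One way the incremental approach could look, as a sketch under stated assumptions: the helper names and the state-file layout are hypothetical, and the `timestamp` filter is assumed to restrict results to events edited since the given time.

```python
import json
from pathlib import Path
from typing import Optional


def load_last_run(state_file: Path) -> Optional[int]:
    """Return the Unix time of the previous export, or None on the first run."""
    if not state_file.exists():
        return None
    return int(json.loads(state_file.read_text())["last_run"])


def save_last_run(state_file: Path, timestamp: int) -> None:
    """Remember when the current export started."""
    state_file.write_text(json.dumps({"last_run": timestamp}))


def build_search_kwargs(state_file: Path) -> dict:
    """Build search arguments: everything on the first run, a delta afterwards."""
    kwargs = {"controller": "events"}
    last_run = load_last_run(state_file)
    if last_run is not None:
        # Assumption: 'timestamp' limits results to events edited since then.
        kwargs["timestamp"] = last_run
    return kwargs
```

A run would record `int(time.time())` before searching, call `misp.search(**build_search_kwargs(state_file))` on a configured `PyMISP` instance, and pass the recorded time to `save_last_run` only after the export succeeds. Changes that do not update an event's edit time would be missed by this filter.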

nbyt3 commented 4 years ago

That's a really good idea, thank you, but due to our process we need to grab all events on every pass. We are heavily utilizing sightings for our attributes and would miss any new sightings and tags if we did not grab all events every time. There used to be an "Export" feature that is going to be deprecated in a future release. I could pull a full export in a few seconds compared to the restSearch. Is there a way to still utilize that functionality and get back the decay scoring?

Rafiot commented 4 years ago

Well, the export will take the same amount of time; it is just done asynchronously, so if you need to do the export every 10 minutes, it won't work. And it won't contain the decay scoring anyway.

Have you considered the ZMQ feed? It will give you updates on everything live.
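A rough sketch of a ZMQ consumer, assuming the `pyzmq` package is installed. The endpoint `tcp://localhost:50000`, the example topic name, and the "topic, space, JSON payload" frame layout are assumptions about a default MISP ZeroMQ setup and should be checked against the instance's plugin settings.

```python
import json
from typing import Tuple


def parse_misp_zmq_message(raw: bytes) -> Tuple[str, dict]:
    """Split an assumed 'topic json-payload' frame into topic and payload."""
    topic, _, payload = raw.decode("utf-8").partition(" ")
    return topic, json.loads(payload)


def listen(endpoint: str = "tcp://localhost:50000") -> None:
    """Print the topic and top-level keys of every message (blocks forever)."""
    import zmq  # pyzmq, imported lazily so the parser works without it

    socket = zmq.Context().socket(zmq.SUB)
    socket.connect(endpoint)
    socket.setsockopt_string(zmq.SUBSCRIBE, "")  # empty prefix: all topics
    while True:
        topic, data = parse_misp_zmq_message(socket.recv())
        print(topic, list(data))
```

A consumer like this would apply each update to a local copy of the data instead of re-exporting every event on each pass.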

I'm just trying to find an alternative, because there will be no way for you to scale up with more events/attributes and export everything every time.

I'm also adding @mokaddem in the loop, maybe he has an idea on how you can export a view of the decaying score for all the data regularly.

mokaddem commented 4 years ago

Unfortunately, I don't see another way to export it along with the score. The decay score is computed on the fly for each individual attribute. Just out of curiosity, how much time do you gain by not asking for the decay score? Maybe performance could be improved for that part.

nbyt3 commented 4 years ago

@mokaddem It looks like there is a delta of about 8 minutes between using "includeDecayScore" and not using that parameter. Our use case is pulling the events and attributes from our MISP instance to post-process the intel into some other formats, and decayed indicators are very important in that process. Also, due to memory constraints, is there a method to stream the results of the events restSearch to disk rather than storing the results in memory?

mokaddem commented 4 years ago

8 minutes is a lot. There is surely a way to decrease this overhead. I'll try to have a look at some point... As for your question, MISP is actually doing that. If the response is too large to fit in memory, it will build a file on disk and stream it back to the user once the search completes.
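On the client side, a similar effect can be approximated by writing the response body to disk chunk by chunk instead of decoding it in one piece. This is only a sketch: `write_chunks` is a hypothetical helper, and it bypasses PyMISP by talking to the REST endpoint directly.

```python
from pathlib import Path
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], destination: Path) -> int:
    """Write an iterable of byte chunks to a file and return the bytes written."""
    written = 0
    with destination.open("wb") as out:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                out.write(chunk)
                written += len(chunk)
    return written
```

With the `requests` library, the chunks could come from `response.iter_content(chunk_size=65536)` on a request made with `stream=True`, so that only one chunk is held in memory at a time. The URL, API key header, and search body for such a request depend on the instance and are not shown here.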

imidoriya commented 4 years ago

I'd second the request for including a total page count in the return and, if possible, an option to return results in reverse order. Often I want to start with the most recent. If I'm doing this by page / limit, I have no idea what the last page is, so I can't work back from it or use reversed(return).

mokaddem commented 4 years ago

@imidoriya Unfortunately MISP does not offer this feature (yet?)

Rafiot commented 4 years ago

(feel free to open an issue on MISP for this one)