Open AndreC10002 opened 3 years ago
Sorry, I missed this issue.
PyMISP cannot control how MISP handles the query, but I see that you're using a very old version of MISP, and I wouldn't be surprised if it has been solved in the newer versions.
Generally, using the page and limit parameters will also avoid huge blobs of data being returned.
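As a rough illustration of the page/limit suggestion, here is a minimal sketch that walks through results one page at a time instead of requesting everything in a single call. The helper name iter_events and the page_size default are hypothetical, not part of PyMISP; it only assumes that misp.search() accepts page and limit and returns a list of events when pythonify=True (and a dict when the server reports an error). Whether this reduces memory use on the MISP side depends on the server version.

```python
from typing import Any, Iterator


def iter_events(misp: Any, page_size: int = 500, **filters: Any) -> Iterator[Any]:
    """Yield events page by page (hypothetical helper, not a PyMISP function).

    `misp` is expected to be a PyMISP client, or any object exposing a
    compatible search() method.
    """
    page = 1
    while True:
        batch = misp.search(
            controller="events",
            metadata=True,
            page=page,
            limit=page_size,
            pythonify=True,
            **filters,
        )
        # Assumption: on failure the client hands back a dict instead of a list.
        if isinstance(batch, dict):
            raise RuntimeError(f"MISP search failed: {batch}")
        if not batch:
            break  # no more results
        yield from batch
        if len(batch) < page_size:
            break  # a short page means this was the last one
        page += 1
```

With the filters from settings.py, usage would look like `for event in iter_events(misp, page_size=500, **filters): ...`, so only one page of events is held on the client at a time.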
I'm using PyMISP's misp.search() to return records from a MISP instance containing 25mi+ events. Filters for misp.search() are imported from a settings.py file like this:
filters = {'published':'true','date':'2021-06-01','tags':['my_tag']}
and it is called like this:
events = misp.search(metadata=True, limit=entries, **filters, pythonify=True)
I get the records I'm looking for, but as the database grows I started to see more and more 'out of memory' errors. Investigating, I've found this query is executed in the database, regardless of the filters applied to misp.search():
SELECT
    Event.id, Event.org_id, Event.date, Event.info, Event.user_id,
    Event.uuid, Event.published, Event.analysis, Event.attribute_count,
    Event.orgc_id, Event.timestamp, Event.distribution,
    Event.sharing_group_id, Event.proposal_email_lock, Event.locked,
    Event.threat_level_id, Event.publish_timestamp,
    Event.sighting_timestamp, Event.disable_correlation, Event.extends_uuid
FROM misp.events AS Event
WHERE Event.user_id = (2)
That is not scalable as it basically loads the whole 'events' table to memory - as we use user_id = 2 to import everything into MISP.
What I don't understand is why MISP is running that query without any of the filters I've applied via misp.search(). It looks like it applies the filters after retrieving the results and putting them in memory.
MISP 2.4.133, Python 7.2.24, PyMISP latest (just updated it)
Perhaps I'm not applying the filters correctly? Can anybody reproduce this situation?