MISP / PyMISP

Python library using the MISP Rest API
Other
429 stars 278 forks source link

Why is this PyMISP.search(controller='attributes') throwing an exception #308

Closed github-germ closed 5 years ago

github-germ commented 5 years ago

Thanks in advance for your help on this...

MISP 2.4.97 + PyMISP 2.4.96

I would like to do a PyMISP search to retrieve all attributes up to a specified date_to matching a list of tags where the match uses AND, i.e. must match all in the tag list.

With this test, I do know there are many matching attributes in my MISP instance. Also, the exception is seen after about 5-6 minutes.

My attempt is below. Can you help me understand:

  1. Why is the exception thrown?
  2. Are matches in the search tags list ANDed?
  3. Is it OK to specify only the search date_to and leave out the date_from? (note: my date_to is a datetime.datetime)
  4. If there's better or faster way for me to retrieve these matching attributes, please educate me.

Thx...

start search(controller="attributes", tags=['admiralty-scale:source-reliability="a"', 'admiralty-scale:information-credibility="1"', 'estimative-language:likelihood-probability="very-likely"'], date_to=2018-09-21 00:21:39.764394)
exception: Unknown error: the response is not in JSON.
Something is broken server-side, please send us everything that follows (careful with the auth key):
Request headers:
{'User-Agent': 'PyMISP 2.4.96 - Python 3.6.4', 'Accept-Encoding': 'gzip, deflate', 'Accept': 'application/json', 'Connection': 'keep-alive', 'Content-Length': '185', 'Authorization': '<REDACTED>', 'content-type': 'application/json'}
Request body:
{"tags": ["admiralty-scale:source-reliability=\"a\"", "admiralty-scale:information-credibility=\"1\"", "estimative-language:likelihood-probability=\"very-likely\""], "to": "2018-09-21"}
Response (if any):
github-germ commented 5 years ago

Here's what I see in the server debug.log when the exception is thrown back to the client:

root@misp-VM:/var/www/MISP/app/tmp/logs tail -f debug.log
==> debug.log <==
2018-12-05 01:11:12 Notice: Notice (8): Undefined index: Tag in [/var/www/MISP/app/Model/Event.php, line 3367]
Trace:
ErrorHandler::handleError() - APP/Lib/cakephp/lib/Cake/Error/ErrorHandler.php, line 230
Event::_edit() - APP/Model/Event.php, line 3367
EventsController::edit() - APP/Controller/EventsController.php, line 1934
ReflectionMethod::invokeArgs() - [internal], line ??
Controller::invokeAction() - APP/Lib/cakephp/lib/Cake/Controller/Controller.php, line 491
Dispatcher::_invoke() - APP/Lib/cakephp/lib/Cake/Routing/Dispatcher.php, line 193
Dispatcher::dispatch() - APP/Lib/cakephp/lib/Cake/Routing/Dispatcher.php, line 167
[main] - APP/webroot/index.php, line 92
github-germ commented 5 years ago
Rafiot commented 5 years ago

Before going any further, can you please update MISP and PyMISP to the latest releases?

github-germ commented 5 years ago

@Rafiot -- Hi... In my controlled corporate environment, updating is not so simple. Do you think there is something in 2.4.98 that will allow this to work? If that's the case, I can make the upgrade request. Thx...

github-germ commented 5 years ago

@Rafiot -- i was able to test my simple tag retrieval code on a VM with 2.4.98 and it doesn't throw that exception. however, the dataset is totally different. i've requested that our Lab host be upgraded to .98 -- and we'll have to schedule that for our production.

let's assume that this is a bug in 2.4.97. with that, can you please let me know the following:

  1. Are matches in the search tags list ANDed? If not, is there a way to retrieve all attributes that have all tags in a specified tag list?
  2. Is it OK to specify only the search date_to and leave out the date_from? (note: my date_to is a datetime.datetime)
  3. If there's better or faster way for me to retrieve these matching attributes, please educate me.

thanks!

iglocska commented 5 years ago

AND is not implemented. The new format still only supports the same logical operators as before:

OR, NOT

Rafiot commented 5 years ago

And one more thing regarding the new format: there is a new WiP interface you an use this way (Python 3.6+ only):

from pymisp import ExpandedPyMISP
misp_url = '<URL)'
misp_key = '<key>'

misp = ExpandedPyMISP(misp_url, misp_key)
complex_query = complex_query = misp.build_complex_query(not_parameters=['osint:source-type="blog-post"'], or_parameters=['tlp:white'])

r = misp.search(tags=complex_query)

Later on, you will also be able to pass not_parameters to build_complex_query.

Regarding date_from / date_to, if you're looking for data starting at some point in the past to now, you need to use the date_from key. Passing date_to only should give you data from anytime in the past until that date. If that's not the case, let us know./

Passing date_to as datetime will work.

github-germ commented 5 years ago

@Rafiot -- thanks for all this good info!

I was able to get my Lab MISP instance upgraded to 2.4.99 this morning, and my test with the data in that instance, I still get the exception. could it be the size of the response triggering an issue on the server side? might there be something to tune, e.g. in php.ini?

$ python3 -c 'import pymisp; print(pymisp.__version__)'
2.4.99
$ 

code snippet:

    77 try:
    78     resultDict = misp.search(
    79         controller = 'attributes',
    80         tags       = tags,
    81         date_to    = date_to)
    82 except Exception as e:
    83     sys.stderr.write(
    84         'ABORT: search failed after %s secs with exception: %s\n'
    85         % (str(datetime.now() - timeStart), str(e)))
    86     sys.stderr.flush()
    87     sys.exit(1)

exception occurs after ~5min 48sec:

ABORT: search failed after 0:05:48.471058 secs with exception: Unknown error: the response is not in JSON.
Something is broken server-side, please send us everything that follows (careful with the auth key):
Request headers:
{'User-Agent': 'PyMISP 2.4.99 - Python 3.6.4', 'Accept-Encoding': 'gzip, deflate', 'Accept': 'application/json', 'Connection': 'keep-alive', 'Content-Length': '185', 'Authorization': '<REDACTED>', 'content-type': 'application/json'}
Request body:
{"tags": ["admiralty-scale:source-reliability=\"a\"", "admiralty-scale:information-credibility=\"1\"", "estimative-language:likelihood-probability=\"very-likely\""], "to": "2018-09-22"}
Response (if any):
github-germ commented 5 years ago

Later on, you will also be able to pass not_parameters to build_complex_query.

Did you mean and_parameters? If so, is there a PR issue for this so I'll know when it's ready for prime time?

github-germ commented 5 years ago

@Rafiot -- for a bit more comparative results, i ran my code only looking for one tag via pymisp.search and that still threw the same exception as above. i then ran an attribute tag search for that one tag in the WebUI, and that returned successfully in less than a minute with about 22k attributes (note: this MISP has about 2.1 million attr).

so why does the PyMISP client script fail while the WebUI succeeds?

github-germ commented 5 years ago

I also reduced the date_to to only go back 7 days vs. 75, and the same exception recurs.

Rafiot commented 5 years ago

Can you please enable debug so I have an idea of what you're sending in practice, because it seems you're testing many things at once and it is very difficult to figure out what is working/not working.

github-germ commented 5 years ago

@Rafiot -- Thanks so much for working on this with me!!

i'm really only need to retrieve attributes that have certain tags within a window of time. Debug doesn't show much more.

DEBUG [api.py:154 - _prepare_request() ] GET - https://localhost/servers/getPyMISPVersion.json
DEBUG [api.py:173 - _prepare_request() ] {'User-Agent': 'PyMISP 2.4.99 - Python 3.6.4', 'Accept-Encoding': 'gzip, deflate', 'Accept': 'application/json', 'Connection': 'keep-alive', 'Authorization': '<REDACTED>', 'content-type': 'application/json'}
DEBUG [api.py:263 - _check_response() ] {
    "version": "2.4.98"
}
INFO [api.py:107 - __init__() ] The version of PyMISP recommended by the MISP instance (2.4.98) is older than the one you're using now (2.4.99). If you have a problem, please upgrade the MISP instance or use an older PyMISP version.
DEBUG [api.py:154 - _prepare_request() ] GET - https://localhost/attributes/describeTypes.json
DEBUG [api.py:173 - _prepare_request() ] {'User-Agent': 'PyMISP 2.4.99 - Python 3.6.4', 'Accept-Encoding': 'gzip, deflate', 'Accept': 'application/json', 'Connection': 'keep-alive', 'Authorization': '<REDACTED>', 'content-type': 'application/json'}
DEBUG [api.py:263 - _check_response() ] {
    "result": {
        "sane_defaults": {
<... REMOVED TO SAVE SPACE ...>
}
start search(controller="attributes", tags=['admiralty-scale:source-reliability="a"'], date_to="2018-11-30 15:01:05.425108") ...
DEBUG [api.py:154 - _prepare_request() ] POST - https://localhost/attributes/restSearch/download
DEBUG [api.py:156 - _prepare_request() ] {"tags": ["admiralty-scale:source-reliability=\"a\""], "to": "2018-11-30"}
DEBUG [api.py:173 - _prepare_request() ] {'User-Agent': 'PyMISP 2.4.99 - Python 3.6.4', 'Accept-Encoding': 'gzip, deflate', 'Accept': 'application/json', 'Connection': 'keep-alive', 'Content-Length': '74', 'Authorization': 'CtmJMWppkG7l481Lwd1sO48XoO1PhGufhOJntWFW', 'content-type': 'application/json'}
ABORT: search failed after 0:07:13.358120 secs with exception: Unknown error: the response is not in JSON.
Something is broken server-side, please send us everything that follows (careful with the auth key):
Request headers:
{'User-Agent': 'PyMISP 2.4.99 - Python 3.6.4', 'Accept-Encoding': 'gzip, deflate', 'Accept': 'application/json', 'Connection': 'keep-alive', 'Content-Length': '74', 'Authorization': 'CtmJMWppkG7l481Lwd1sO48XoO1PhGufhOJntWFW', 'content-type': 'application/json'}
Request body:
{"tags": ["admiralty-scale:source-reliability=\"a\""], "to": "2018-11-30"}
Response (if any):

real    5m14.522s
user    0m0.249s
sys 0m0.032s
github-germ commented 5 years ago

As another debug comparison, I've tried to recreate what my pymisp.search() code is doing in curl and that does return successful data.

curl --header 'Authorization: <REDACTED>' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--insecure \
--include \
--write-out '\nHTTP-code=%{http_code} Time-total=%{time_total}\n' \
-X GET 'https://localhost/attributes/restSearch/download/null/null/null/null/admiralty-scale;source-reliability="a"/2018-12-06/null/null/null/null/null'

HTTP/1.1 200 OK
Date: Fri, 07 Dec 2018 16:42:43 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.1e-fips mod_fcgid/2\
.3.9 mod_wsgi/4.6.2 Python/3.6
X-Powered-By: PHP/7.1.8
X-result-count: 8328
Set-Cookie: CAKEPHP=fp72u3ol8l1r07jd127hpat53h; expires=Sat, 08-Dec-2018 16:42:\
43 GMT; Max-Age=86400; path=/; secure; HttpOnly
Content-Length: 13355240
Content-Type: application/json; charset=UTF-8

{"response": {"Attribute": [{"id":"2680085","event_id":"33328","object_id":"0",\
"object_relation":null,"category":"Network activity","type":"domain","to_ids":t\
...
HTTP-code=200 Time-total=119.982
github-germ commented 5 years ago

OK, I now believe the exception thrown is due to the size of the dataset in the response. My Python code works fine if I limit the response size by requesting date_from to be only the last one day. However, I need to request larger datasets, e.g. get me all attributes with specified tag(s) older than 90 days. This can be a very large list, e.g. 100s of K and perhaps over 1M the first time I run this app (and then will slim down when run daily form that point forward).

So, if this hypothesis is true, I come back to an earlier question I posed in https://github.com/MISP/PyMISP/issues/308#issuecomment-444955587:

might there be something to tune, e.g. in php.ini?

(btw, my bad in https://github.com/MISP/PyMISP/issues/308#issuecomment-445016636 as that didn't really test a smaller dataset. glad i revisited this :-)

github-germ commented 5 years ago

One more piece of evidence: I ran the same curl as above but requesting all attributes with that tag with only a to date of 75, i.e. all attr older than 75 days (2018-09-23). And that also fails, i this case with HTTP 500.

https://localhost/attributes/restSearch/download/null/null/null/null/admiralty-scale;source-reliability="a"/null/2018-09-23/null/null/null/null

Dataset of response too big?

HTTP/1.1 500 Internal Server Error
Date: Fri, 07 Dec 2018 18:22:07 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.1e-fips mod_fcgid/2.
3.9 mod_wsgi/4.6.2 Python/3.6
X-Powered-By: PHP/7.1.8
Set-Cookie: CAKEPHP=7f88jvunhc60v8krd8g4ef4dtu; expires=Sat, 08-Dec-2018 18:22:0
7 GMT; Max-Age=86400; path=/; secure; HttpOnly
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

HTTP-code=500 Time-total=362.225
github-germ commented 5 years ago

I ran the curl iterating over to dates to see with my MISP content where it starts to fail. The last successful response is with to set to 2018-08-31 (which is98 days before today), i.e. retrieving all attributes with the tag that are older than 98 days.. This has about 214k attributes with about 1.1M tags and about 300MB of JSON data.

Let me know what else I can capture and share. Hoping this is not a big one to fix.

Thanks @Rafiot !!

iglocska commented 5 years ago

You can use page and limit to paginate the data. {"limit": 10000, "page": 1} as additional parameters should work fine.

github-germ commented 5 years ago

{"limit": 10000, "page": 1}

@iglocska -- Thanks for the suggestion. Not sure this is available in PyMISP? I guess I could call curl from my python app if not (not ideal).

In parallel, I had this thought yesterday... if indeed size is the root of this issue, then I was considering segmenting my pymisp.search() calls by iterating overdate_from - date_to ranges (vs. only specifying date_to, simulating pagination. Then act on the concat dataset when all retrieved.

iglocska commented 5 years ago

Afaik, but @Rafiot correct me if I'm wrong - the API will just pass whatever parameters you set on to MISP. So just try to set the limit/page.

On Sat, Dec 8, 2018 at 4:01 PM github-germ notifications@github.com wrote:

{"limit": 10000, "page": 1}

@iglocska https://github.com/iglocska -- Thanks for the suggestion. Not sure this is available in PyMISP? I guess I could call curl from my python app if not (not ideal).

In parallel, I had this thought yestrday... if indeed size is the root of this issue, then I was considering segmenting my pymisp.search() calls by iterating overdate_from - date_to ranges (vs. only specifying date_to, simulating pagination. Then act on the concat dataset when all retrieved.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MISP/PyMISP/issues/308#issuecomment-445465471, or mute the thread https://github.com/notifications/unsubscribe-auth/ADf6wPkPSIBY73645he80FoZ1wrnAwVDks5u29RPgaJpZM4ZByA8 .

github-germ commented 5 years ago

Hey @iglocska ... Thanks. I added limit=10000, page=1 to my call to pymisp.search() and that threw an exception: Unused parameter: limit, page

I will try my idea of what I'll call date paging while awaiting more feedback from @Rafiot -- that might just be my solution if it reduces the dataset for each search to succeed and perform reasonably, i.e. the end-to-end time to perform all the search segments.

github-germ commented 5 years ago

OK, implemented my date paging solution. However, still getting an exception with search of a small date range: 5 day span. Here's the usual exception.

  1. How do I dig further to determine root cause and a fix?
  2. Could there be something fishy in the db (we have MariaDB) attributes or attribute_tags rows, or perhaps in events.
search() segment=11, daysSpan=5: 105-100, date_from="2018-08-25 20:27:35.765230", date_to="2018-08-30 20:27:35.765242" ...
...search() segment=11 took 0:01:22.140584 to retrieve 8384 attr

search() segment=12, daysSpan=5: 100-95, date_from="2018-08-30 20:28:57.905951", date_to="2018-09-04 20:28:57.905963" ...

ABORT: search() segment=12 failed after 0:01:22.140584 secs with exception: Unknown error: the response is not in JSON.
Something is broken server-side, please send us everything that follows (careful with the auth key):
Request headers:
{'User-Agent': 'PyMISP 2.4.99 - Python 3.6.4', 'Accept-Encoding': 'gzip, deflate', 'Accept': 'application/json', 'Connection': 'keep-alive', 'Content-Length': '141', 'Authorization': 'CtmJMWppkG7l481Lwd1sO48XoO1PhGufhOJntWFW', 'content-type': 'application/json'}
Request body:
{"tags": ["nist-cyber-framework:function=\"protect\"", "nist-cyber-framework:function=\"detect\""], "from": "2018-08-30", "to": "2018-09-04"}
Response (if any):
iglocska commented 5 years ago

First of all, could you replicate that with the REST API as a first step?

On Sat, Dec 8, 2018 at 10:12 PM github-germ notifications@github.com wrote:

OK, implemented my date paging solution. However, still getting an exception with search of a small date range: 5 day span. Here's the usual exception.

  1. How do I dig further to determine root cause and a fix?
  2. Could there be something fishy in the db (we have MariaDB) attributes or attribute_tags rows, or perhaps in events.

search() segment=11, daysSpan=5: 105-100, date_from="2018-08-25 20:27:35.765230", date_to="2018-08-30 20:27:35.765242" ... ...search() segment=11 took 0:01:22.140584 to retrieve 8384 attr

search() segment=12, daysSpan=5: 100-95, date_from="2018-08-30 20:28:57.905951", date_to="2018-09-04 20:28:57.905963" ...

ABORT: search() segment=12 failed after 0:01:22.140584 secs with exception: Unknown error: the response is not in JSON. Something is broken server-side, please send us everything that follows (careful with the auth key): Request headers: {'User-Agent': 'PyMISP 2.4.99 - Python 3.6.4', 'Accept-Encoding': 'gzip, deflate', 'Accept': 'application/json', 'Connection': 'keep-alive', 'Content-Length': '141', 'Authorization': 'CtmJMWppkG7l481Lwd1sO48XoO1PhGufhOJntWFW', 'content-type': 'application/json'} Request body: {"tags": ["nist-cyber-framework:function=\"protect\"", "nist-cyber-framework:function=\"detect\""], "from": "2018-08-30", "to": "2018-09-04"} Response (if any):

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MISP/PyMISP/issues/308#issuecomment-445490320, or mute the thread https://github.com/notifications/unsubscribe-auth/ADf6wItzv3dQbpHbZ61RemUiHIwxLZrvks5u3CsogaJpZM4ZByA8 .

github-germ commented 5 years ago

OK... I've isolated the day of trouble :-)

ABORT: search() segment=5 failed after 0:05:37.077723 with exception: Unknown error: the response is not in JSON. Something is broken server-side, please send us everything that follows (careful with the auth key): Request headers: {'User-Agent': 'PyMISP 2.4.99 - Python 3.6.4', 'Accept-Encoding': 'gzip, deflate', 'Accept': 'application/json', 'Connection': 'keep-alive', 'Content-Length': '141', 'Authorization': '', 'content-type': 'application/json'} Request body: {"tags": ["nist-cyber-framework:function=\"protect\"", "nist-cyber-framework:function=\"detect\""], "from": "2018-08-31", "to": "2018-09-01"} Response (if any):

- `curl`

$ curl --header 'Authorization: ' --header 'Accept: application/json' --header 'Content-Type: application/json' --insecure --include --write-out '\nHTTP-code=%{http_code} Time-total=%{time_total}\n' -X GET 'https://localhost/attributes/restSearch/download/null/null/null/null/nist-cyber-framework;function="protect"&&nist-cyber-framework;function="detect"/2018-08-31/2018-09-01/null/null/null/null'

HTTP/1.1 500 Internal Server Error Date: Tue, 11 Dec 2018 12:47:10 GMT Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.1e-fips mod_fcgid/2.3.9 mod_wsgi/4.6.2 Python/3.6 X-Powered-By: PHP/7.1.8 Set-Cookie: CAKEPHP=f2h04u9l7qrsbb1a9q265hk27n; expires=Wed, 12-Dec-2018 12:47:10 GMT; Max-Age=86400; path=/; secure; HttpOnly Connection: close Transfer-Encoding: chunked Content-Type: text/html; charset=UTF-8

HTTP-code=500 Time-total=366.539

- how many attributes in db from-to:

$ date --date='08/31/2018 00:00:00' +"%s" 1535673600 $ date --date='09/01/2018 23:59:59' +"%s" 1535846399 $ mysql -p misp MariaDB [misp]> select count() from attributes where timestamp >= 1535673600 and timestamp <= 1535846399; +----------+ | count() | +----------+ | 10534 | +----------+ 1 row in set (1.07 sec)

MariaDB [misp]> MariaDB [misp]> select count() from attributes -> where timestamp >= 1535673600 and timestamp <= 1535846399 -> and -> id in (select attribute_id from attribute_tags -> where tag_id in (select id from tags -> where name like '%="protect"' or name like '%="detect"')); +----------+ | count() | +----------+ | 9296 | +----------+ 1 row in set (1.16 sec)

MariaDB [misp]>

github-germ commented 5 years ago

and btw, using pagination with curl yielded the HTTP 500 also -- same for limit set to 10000 or 1000

curl -d '{"limit": 10000, "page": 1}' --header 'Authorization: <REDACTED>' --header 'Accept: application/json' --header 'Content-Type: application/json' --insecure --include --write-out '\nHTTP-code=%{http_code} Time-total=%{time_total}\n' -X GET 'https://localhost/attributes/restSearch/download/null/null/null/null/nist-cyber-framework;function="protect"&&nist-cyber-framework;function="detect"/2018-08-31/2018-09-01/null/null/null/null'

HTTP/1.1 500 Internal Server Error
Date: Tue, 11 Dec 2018 13:30:56 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.1e-fips mod_fcgid/2.3.9 mod_wsgi/4.6.2 Python/3.6
X-Powered-By: PHP/7.1.8
Set-Cookie: CAKEPHP=3bfatb283cjtpafge586rodaab; expires=Wed, 12-Dec-2018 13:30:56 GMT; Max-Age=86400; path=/; secure; HttpOnly
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

HTTP-code=500 Time-total=417.995
iglocska commented 5 years ago

Could you check your error.log right after the query? Also, that query looks really funky, I'd definitely move all the search parameters into the JSON input instead of using url parameters (those are deprecated)

github-germ commented 5 years ago
  1. I am only including the curl restSearch here to compare those results to my pymisp.search -- My clients are Python, and that's where this issue started, i.e. I need the PyMISP API to succeed. However, I will change

  2. I'll post what occurs in error.log tomorrow morning. debug.log for pymisp.search was included above in https://github.com/MISP/PyMISP/issues/308#issuecomment-444317524 -- wondering if this error in debug.log could be that an event or attribute has no tags, thus no Tag index (I ask as I've seen that occur in the JSON in some of my Python code an I add a test to ensure the Tag key exists before proceeding).

    2018-12-05 01:11:12 Notice: Notice (8): Undefined index: Tag in [/var/www/MISP/app/Model/Event.php, line 3367]
  3. Btw, any idea when ANDing of the tag list query items will be supported?

Thx for all your help.

github-germ commented 5 years ago

@iglocska -- I revised my python script to request from-to intervals starting at 30-day spans until i get to a date range where i need 5-day intervals, and if a pymisp.search() fails for the desired interval, i then reduce the 5-day span into a 1-day from-to span. and finally if one of those 1-day fails, i note the error and skip that interval moving forward the the next intervals. this seems to work (although is mroe code than i would have expected to write :-), except i really do need those attributes from the 1-day failed interval (so that i can age-out (i.e. delete) some tags), and at this point i'm not sure how to get it due this Unknown error exception being thrown. if there's a way other than using from-to, please let me know.

as per your request, i ran this script and captured output showing the from-to date spans being requested by each pymisp.search while i was tailing misp server logs...

search() segment=1, daysSpan=30: 120-90, from="2018-08-15", to="2018-09-14" ... ...search() segment=1 took 0:06:10.693873 and failed: retry with interval=5

search() segment=1 [retry_seg=1 of 5], daysSpan=5: 120-115, from="2018-08-15", to="2018-08-20" ... ...search() segment=1 took 0:01:05.067318 to retrieve 3600 attr

search() segment=1 [retry_seg=2 of 5], daysSpan=5: 115-110, from="2018-08-20", to="2018-08-25" ... ...search() segment=1 took 0:01:09.705145 to retrieve 4919 attr

search() segment=1 [retry_seg=3 of 5], daysSpan=5: 110-105, from="2018-08-25", to="2018-08-30" ... ...search() segment=1 took 0:01:09.937385 to retrieve 8384 attr

search() segment=1 [retry_seg=4 of 5], daysSpan=5: 105-100, from="2018-08-30", to="2018-09-04" ... ...search() segment=1 took 0:06:16.216046 and failed: retry with interval=1

search() segment=1 [retry_seg=1 of 5], daysSpan=1: 105-104, from="2018-08-30", to="2018-08-31" ... ...search() segment=1 took 0:01:09.091412 to retrieve 3814 attr

search() segment=2, daysSpan=1: 104-103, from="2018-08-31", to="2018-09-01" ... ...search() segment=2 took 0:06:11.879766 and failed: retry with interval=1

search() segment=2 [retry_seg=1 of 5], daysSpan=1: 104-103, from="2018-08-31", to="2018-09-01" ...


* **here's the misp log files tail output while my script was running:** note that there is no output in the `error.log`, only the `debug.log` as i indicated earlier in this issue.

tail --lines=0 -f [^r]*log ==> debug.log <==

==> error.log <==

==> exec-errors.log <==

==> mispzmq.error.log <==

==> mispzmq.log <==

==> debug.log <== 2018-12-13 15:31:35 Notice: Notice (8): Undefined index: Tag in [/var/www/MISP/app/Model/Event.php, line 3385] Trace: ErrorHandler::handleError() - APP/Lib/cakephp/lib/Cake/Error/ErrorHandler.php, line 230 Event::_edit() - APP/Model/Event.php, line 3385 EventsController::edit() - APP/Controller/EventsController.php, line 1946 ReflectionMethod::invokeArgs() - [internal], line ?? Controller::invokeAction() - APP/Lib/cakephp/lib/Cake/Controller/Controller.php, line 491 Dispatcher::_invoke() - APP/Lib/cakephp/lib/Cake/Routing/Dispatcher.php, line 193 Dispatcher::dispatch() - APP/Lib/cakephp/lib/Cake/Routing/Dispatcher.php, line 167 [main] - APP/webroot/index.php, line 92

2018-12-13 15:32:01 Notice: Notice (8): Undefined index: Tag in [/var/www/MISP/app/Model/Event.php, line 3385] Trace: ErrorHandler::handleError() - APP/Lib/cakephp/lib/Cake/Error/ErrorHandler.php, line 230 Event::_edit() - APP/Model/Event.php, line 3385 EventsController::edit() - APP/Controller/EventsController.php, line 1946 ReflectionMethod::invokeArgs() - [internal], line ?? Controller::invokeAction() - APP/Lib/cakephp/lib/Cake/Controller/Controller.php, line 491 Dispatcher::_invoke() - APP/Lib/cakephp/lib/Cake/Routing/Dispatcher.php, line 193 Dispatcher::dispatch() - APP/Lib/cakephp/lib/Cake/Routing/Dispatcher.php, line 167 [main] - APP/webroot/index.php, line 92

2018-12-13 15:35:07 Notice: Notice (8): Undefined index: Tag in [/var/www/MISP/app/Model/Event.php, line 3385] Trace: ErrorHandler::handleError() - APP/Lib/cakephp/lib/Cake/Error/ErrorHandler.php, line 230 Event::_edit() - APP/Model/Event.php, line 3385 EventsController::edit() - APP/Controller/EventsController.php, line 1946 ReflectionMethod::invokeArgs() - [internal], line ?? Controller::invokeAction() - APP/Lib/cakephp/lib/Cake/Controller/Controller.php, line 491 Dispatcher::_invoke() - APP/Lib/cakephp/lib/Cake/Routing/Dispatcher.php, line 193 Dispatcher::dispatch() - APP/Lib/cakephp/lib/Cake/Routing/Dispatcher.php, line 167 [main] - APP/webroot/index.php, line 92

^C



!!**THANKS**!!
github-germ commented 5 years ago

i also revised my curl to use json for the request (vs. the deprecated URL method) for the bad from-to interval experienced when my script does the same with pymisp.search.

in this case,

curl --header 'Authorization: <REDACTED>' --header 'Accept: application/json' --header 'Content-Type: application/json' --insecure --include --write-out '\nHTTP-code=%{http_code} Time-total=%{time_total}\n' --request GET --data '{"from": "2018-08-31", "to": "2018-09-01", "tags": ["nist-cyber-framework;function=\"protect\"", "nist-cyber-framework;function=\"detect\""]}' https://localhost/attributes/restSearch/download
HTTP/1.1 500 Internal Server Error
Date: Thu, 13 Dec 2018 16:15:01 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.1e-fips mod_fcgid/2.3.9 mod_wsgi/4.6.2 Python/3.6
X-Powered-By: PHP/7.1.8
Set-Cookie: CAKEPHP=vn94p2kmq03iloe13e1600p4nm; expires=Fri, 14-Dec-2018 16:15:01 GMT; Max-Age=86400; path=/; secure; HttpOnly
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

HTTP-code=500 Time-total=299.589
iglocska commented 5 years ago

Request should be post and the tags don't need to be sanitised ( restore the : )

iglocska commented 5 years ago

Also error.log should have a stack trace

github-germ commented 5 years ago
  1. error.log has nothing -- i ran the tail -f
  2. i will make changes to the curl but once again my real concern is the pymisp.search() exception. what do you see going on in https://github.com/MISP/PyMISP/issues/308#issuecomment-447024023 ?
github-germ commented 5 years ago

Here's the execution of the revised curl request with any and all *.log output:

$ curl \
--header 'Authorization: <REDACTED>' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--insecure \
--include \
--write-out '\nHTTP-code=%{http_code} Time-total=%{time_total}\n' \
--request POST \
--data '{"from": "2018-08-31", "to": "2018-09-01", "tags":
["nist-cyber-framework:function=\"protect\"",
"nist-cyber-framework:function=\"detect\""]}' \
https://localhost/attributes/restSearch/download
HTTP/1.1 500 Internal Server Error
Date: Fri, 14 Dec 2018 16:47:42 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.1e-fips
mod_fcgid/2.3.9 mod_wsgi/4.6.2 Python/3.6
X-Powered-By: PHP/7.1.8
Set-Cookie: CAKEPHP=891fttrpfshndehugdqb332imd; expires=Sat,
15-Dec-2018 16:47:42 GMT; Max-Age=86400; path=/; secure; HttpOnly
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

HTTP-code=500 Time-total=403.246
$ 
tail --lines=0 -f [^r]*log
==> debug.log <==

==> error.log <==

==> exec-errors.log <==

==> mispzmq.error.log <==

==> mispzmq.log <==

==> debug.log <==
2018-12-14 16:52:37 Notice: Notice (8): Undefined index: Tag in [/var/www/MISP/app/Model/Event.php, line 3385]
Trace:
ErrorHandler::handleError() - APP/Lib/cakephp/lib/Cake/Error/ErrorHandler.php, line 230
Event::_edit() - APP/Model/Event.php, line 3385
EventsController::edit() - APP/Controller/EventsController.php, line 1946
ReflectionMethod::invokeArgs() - [internal], line ??
Controller::invokeAction() - APP/Lib/cakephp/lib/Cake/Controller/Controller.php, line 491
Dispatcher::_invoke() - APP/Lib/cakephp/lib/Cake/Routing/Dispatcher.php, line 193
Dispatcher::dispatch() - APP/Lib/cakephp/lib/Cake/Routing/Dispatcher.php, line 167
[main] - APP/webroot/index.php, line 92

^C
github-germ commented 5 years ago

Hi @iglocska -- let me know what to collect next to push closer to determining root cause. I have a new feature that I am to complete by month end that depends on this. Really appreciate all your time and expertise.

Rafiot commented 5 years ago

I see you're using curl. Try passing limit/page parameters and it will just work.

github-germ commented 5 years ago

I'm not using curl. My app is PyMISP based. curl is just for comparison to collect info within this issue to possibly find root cause (see https://github.com/MISP/PyMISP/issues/308#issuecomment-446816181)

Are you suggesting that PyMISP will not work with large result data sets since it does not support limit/page args; and that I should call curl from my python process in order to retrieve that data I need?

Not sure that will work as I was getting HTTP 500 on previous tests. See https://github.com/MISP/PyMISP/issues/308#issuecomment-446207832 -- that used the deprecated URL path method. You want me to test that again with the request via JSON data to curl?

Is there a plan to support limit/page within PyMISP?

Can you refer me to an issue that talks about support for AND-ing tag lists in a pymisp.search() request?

Thanks!!

Rafiot commented 5 years ago

Right now, limit/page is only available if you use search in ExpandedPyMISP, which is only compatible with Python 3.6+.

I'll modify search in PyMISP so you can do the same, but it won't be in the package you can install with pip before the next release (should be this week).

Rafiot commented 5 years ago

Here is the change: https://github.com/MISP/PyMISP/commit/5de57816ddeac42cfee3780d3e0d68f023ede272

Install pymisp from github and it should work out of the box.

github-germ commented 5 years ago

Thx!! I will play with that. Questions...

  1. What's the use case? I simply loop through pages until I get a page where there is less than the limit?
  2. Before I invest in this, do you know why did my curl with paging throw HTTP 500 in https://github.com/MISP/PyMISP/issues/308#issuecomment-446207832?
Rafiot commented 5 years ago
  1. Yes.
  2. You're doing a mix of URL parameters and POST so POST ignored. As you can see in the response, the query times out.
github-germ commented 5 years ago

Being a newbie here, I didn't know the kosher way to Install pymisp from github from https://github.com/MISP/PyMISP/commit/5de57816ddeac42cfee3780d3e0d68f023ede272 -- So, I manually copied api.py and exceptions.py -- was there anything else?

(I'd like to learn the right way to do that)

Btw, see https://github.com/MISP/PyMISP/commit/5de57816ddeac42cfee3780d3e0d68f023ede272#r31724061

I'm running a test now (takes a long time as we have a fair bit of data in this instance), and will report back by the morning on whether this paginated method will do the trick for me. And if so, do we want to abandon wondering why the segmented date to-from method that was throwing the exception occurs?

THANKS!

github-germ commented 5 years ago

I am paging at limit=10000. each one takes about 30 seconds to complete. Do you have any recommendations on a better limit to use to avoid exceptions while maximizing performance for the overall data set I need to retrieve?

Rafiot commented 5 years ago

So regarding the installation, it is documented here: https://github.com/MISP/PyMISP#install-the-latest-version-from-repo

The recommended limit is totally depending on the system you're using in the backend, how fast it can return data, and how much data you can return. I'm afraid you will have to figure it out yourself by trying what work best for you.

github-germ commented 5 years ago

Re' limit -- I agree with your comment.

Test Results: pages=133, took 2h40m, no errors!!

I can now move forward to ensure the results are valid and implement the tag TTL age-out rules to be applied to those attributes' tags.

Thanks for all your help!!

I am still puzzled as to why the pymisp version I wrote that narrowed the from-to date window at run-time on segments that threw the exception didn't work, i.e. got down to one day that had less than 10k attributes, and still failed; whereas the paging method with a limit=10k never fails.

Since I can move forward with my app, are you comfortable leaving this mystery as is and closing this issue?

github-germ commented 5 years ago

I'm closing this issue as the workaround using pagination with search works. (note: the root cause and the behavior back to the client for the single search attempt was not determined).