Open pinae opened 7 years ago
Hi Johannes, thanks for your report. I've never encountered such a problem, so just to make sure:
sacredboard -m name_of_the_db
? Thanks a lot.
Hi, before I created the indices the error occurred every time. But I did not test that systematically because I thought I had done something wrong during the installation.
I can reproduce the error every time I change the sorting by clicking on "Experiment name", "Command" or "Hostname". I remember having the error when deactivating some of the statuses on Friday but I could not reproduce that today.
I tested with Firefox 54.0 and Chromium 59.0.3071.109 on Ubuntu 17.04.
The JavaScript console shows only errors for missing files and a server error. Here are some screenshots:
Thanks to your observation, I discovered another minor issue, but that was probably not causing your problem, which I was unable to reproduce.
Even if you now upgrade to the latest Sacredboard version (0.3.1), I think the issue will persist.
Nevertheless, when you run sacredboard -m your_db, the program should produce some output to the console, and I'm pretty sure there is a stack trace describing the cause of the problem.
Could you please copy it for me?
Sorry for the inconvenience.
Might this be related to stdout/stderr logging? For me, this happened when I had only a handful of runs stored, but with long outputs.
I get something like this:
pymongo.errors.OperationFailure: database error: Runner error: Overflow sort stage buffered data usage of 33836427 bytes exceeds internal limit of 33554432 bytes
Which looks to be related to this
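For reference, the limit in that message is exactly 32 MiB, and the sort exceeded it by only a small margin. A quick check of the arithmetic, using nothing but the two numbers from the error above:

```python
# Numbers copied from the OperationFailure message above.
used_bytes = 33836427
limit_bytes = 33554432

# The internal limit is exactly 32 MiB.
assert limit_bytes == 32 * 1024 * 1024

overflow = used_bytes - limit_bytes
print(overflow)                    # 281995
print(f"{overflow / 1024:.1f}")    # 275.4 (KiB over the limit)
```

So a few long captured outputs are enough to push an unindexed sort over the edge, even with only a handful of runs.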
Sorry for my late reply. I get this error on the console:
[2017-08-24 15:49:10,529] ERROR in app: Exception on /api/run [GET]
Traceback (most recent call last):
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/flask/app.py", line 1982, in wsgi_app
response = self.full_dispatch_request()
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/flask/app.py", line 1614, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/flask/app.py", line 1517, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/flask/_compat.py", line 33, in reraise
raise value
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/flask/app.py", line 1612, in full_dispatch_request
rv = self.dispatch_request()
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/flask/app.py", line 1598, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/sacredboard/app/webapi/routes.py", line 41, in api_runs
return get_runs()
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/sacredboard/app/webapi/runs.py", line 53, in get_runs
recordsFiltered=records_filtered),
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/flask/templating.py", line 134, in render_template
context, ctx.app)
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/flask/templating.py", line 116, in _render
rv = template.render(context)
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/jinja2/environment.py", line 1008, in render
return self.environment.handle_exception(exc_info, True)
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/jinja2/environment.py", line 780, in handle_exception
reraise(exc_type, exc_value, tb)
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/jinja2/_compat.py", line 37, in reraise
raise value.with_traceback(tb)
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/sacredboard/templates/api/runs.js", line 7, in top-level template code
{%- for run in runs -%}
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/jinja2/runtime.py", line 410, in __init__
self._after = self._safe_next()
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/jinja2/runtime.py", line 430, in _safe_next
return next(self._iterator)
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/pymongo/cursor.py", line 1132, in next
if len(self.__data) or self._refresh():
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/pymongo/cursor.py", line 1055, in _refresh
self.__collation))
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/pymongo/cursor.py", line 947, in __send_message
helpers._check_command_response(doc['data'][0])
File "/home/jme/Code/LSTM-Classification-CPU/env/lib/python3.5/site-packages/pymongo/helpers.py", line 210, in _check_command_response
raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Executor error during find command: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.
My outputs are also pretty long because my code displays progress bars.
Thanks for posting the output! It really seems to be related to the issue that @black-puppydog posted. This needs further analysis to see how to handle the problem: whether Sacredboard should try to add indices on the table columns automatically up front, or only after the exception is thrown. I'll have a look at it after I finish the feature I have been working on recently (deleting experiments). I'm sorry for the inconvenience until then.
@pinae I'm pretty sure it's a CORS error. Check out the datatables.net link they provide and read about it some more for different solutions. A quick way to test whether this is the case is to open the web app from a different computer on the same network (swapping 127.0.0.1 for xxx.x.x.xx, whatever your local IP is).
Hey there, I'm frequently stumbling upon this issue as well. I assume that it happens when the stored log output of the experiment is very long (e.g. training a model with Tensorflow for a couple of days).
Any updates on a potential fix?
@schroederen Try adding some indices. I added some and it fixed the problem. I neglected to write down exactly what I did, and only realized after reporting the issue that it would have been useful.
@pinae Thanks for the reply. For which keys did you create the indices? I.e., which parameters did you pass to db.collection.createIndex()?
In the meantime, one can increase the limit of the sort buffer, as described here. I increased it to 50 MB (from the default of 32 MB), and this fixes the issues I'm having. However, I expect to run out of buffer again eventually, so adding indices might be the more elegant solution.
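A minimal sketch of raising that limit from Python. It assumes a MongoDB server older than 4.4, where, as far as I know, the server parameter is called internalQueryExecMaxBlockingSortBytes; it also assumes a server on localhost:27017 and enough privileges to run setParameter. The helper names build_sort_limit_command and raise_sort_limit are made up for illustration.

```python
def build_sort_limit_command(megabytes):
    """Build the setParameter command document for the in-memory sort limit."""
    if megabytes <= 0:
        raise ValueError("megabytes must be positive")
    return {
        "setParameter": 1,
        # Assumed parameter name (MongoDB < 4.4); check your server version.
        "internalQueryExecMaxBlockingSortBytes": megabytes * 1024 * 1024,
    }


def raise_sort_limit(megabytes, uri="mongodb://localhost:27017"):
    """Send the command to the server. Needs pymongo and a running mongod."""
    # Imported lazily so the helper above can be used without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    return client.admin.command(build_sort_limit_command(megabytes))


command = build_sort_limit_command(50)
print(command["internalQueryExecMaxBlockingSortBytes"])  # 52428800
```

Calling raise_sort_limit(50) would apply it. A parameter set this way at runtime does not survive a restart of mongod, so this is a stopgap rather than a fix.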
I created an index for each column in the board and it fixed the problem for me:
To create an index see https://docs.mongodb.com/manual/indexes/
Did this and it fixed the issue for a while, but it has returned. Additionally, I'm getting other weird issues now: some experiments no longer show up, and sorting by ID does not work correctly anymore. Maybe it wasn't such a good idea to create indices for all columns, or I didn't do it correctly? :D
@chovanecm Is there any "official" fix incoming? Would be greatly appreciated!
For those that don't know how to create an index (like I didn't), you can use createIndex in the Mongo CLI. So to add a heartbeat index I did the following:
> use sacred
switched to db sacred
> db.runs.createIndex({ "heartbeat": -1 });
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
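The same thing can be done from Python for several columns at once. This is only a sketch: the field names below are my assumption about how Sacred stores the columns shown in the board (look at one document in db.runs before relying on them), the database name sacred and collection name runs are taken from the shell session above, and the function names are made up.

```python
# Assumed mapping of board columns to fields in the "runs" collection.
# Verify against your own documents before relying on it.
SORTABLE_FIELDS = [
    "experiment.name",
    "command",
    "host.hostname",
    "start_time",
    "stop_time",
    "heartbeat",
]


def index_specs(fields, direction=-1):
    """Return one single-field index specification per sortable column."""
    return [[(field, direction)] for field in fields]


def create_indices(db_name="sacred", uri="mongodb://localhost:27017"):
    """Create the indices. Needs pymongo and a running mongod."""
    from pymongo import MongoClient  # lazy import: only needed when connecting

    runs = MongoClient(uri)[db_name]["runs"]
    # create_index returns the index name and does nothing if the same
    # index already exists.
    return [runs.create_index(spec) for spec in index_specs(SORTABLE_FIELDS)]


print(index_specs(["heartbeat"]))  # [[('heartbeat', -1)]]
```

Calling create_indices("your_db") once should be enough. A single-field index can be used for sorting in either direction, so one index per column covers both ascending and descending sorts.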
I am considering letting Sacredboard automatically create indices for the displayed columns. I am just afraid of what happens if I implement #24 (adding custom columns).
I added indices for all possible entries, but I'm still getting this error on some columns. I've managed to get a trace from the console:
[2018-06-08 09:20:09,991] ERROR in app: Exception on /api/run [GET]
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1815, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1718, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 35, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/usr/local/lib/python3.5/dist-packages/sacredboard/app/webapi/runs.py", line 16, in api_runs
return get_runs()
File "/usr/local/lib/python3.5/dist-packages/sacredboard/app/webapi/runs.py", line 94, in get_runs
recordsFiltered=records_filtered),
File "/usr/local/lib/python3.5/dist-packages/flask/templating.py", line 135, in render_template
context, ctx.app)
File "/usr/local/lib/python3.5/dist-packages/flask/templating.py", line 117, in _render
rv = template.render(context)
File "/usr/local/lib/python3.5/dist-packages/jinja2/environment.py", line 1008, in render
return self.environment.handle_exception(exc_info, True)
File "/usr/local/lib/python3.5/dist-packages/jinja2/environment.py", line 780, in handle_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.5/dist-packages/jinja2/_compat.py", line 37, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.5/dist-packages/sacredboard/templates/api/runs.js", line 13, in top-level template code
"is_alive": {{run.heartbeat | default | timediff | detect_alive_experiment | tojson }},
File "/usr/local/lib/python3.5/dist-packages/sacredboard/app/config/jinja_filters.py", line 28, in timediff
diff = now - time
TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'NoneType'
This looks like a different error, but it happens when trying to sort entries by some of the columns.
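The TypeError at the bottom of that trace suggests that some run has no heartbeat value, so the template passes None into timediff. A sketch of a None-tolerant filter is below. It is a hypothetical replacement, not the actual Sacredboard code, and returning None for a missing heartbeat is my own choice.

```python
import datetime


def timediff(time, now=None):
    """Return the seconds elapsed since `time`, or None if `time` is missing."""
    if time is None:
        # A run without a heartbeat: there is nothing to compare against.
        return None
    if now is None:
        # Naive UTC timestamp, on the assumption that heartbeats are stored
        # as naive UTC datetimes.
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    return (now - time).total_seconds()


now = datetime.datetime(2018, 6, 8, 9, 20, 10)
print(timediff(None, now))                                      # None
print(timediff(datetime.datetime(2018, 6, 8, 9, 20, 0), now))   # 10.0
```

Whatever consumes the result (detect_alive_experiment in the template line above) would then also have to accept None, presumably by treating such a run as not alive.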
@anibali I had the same problem. Adding the index you described immediately resolved the problem. So probably letting sacredboard do this automatically is a good idea.
Same here, adding the indices worked like magic! So just to make it easier for copy pasting:
mongo (type this into the shell)
use <databasename>
Hi, I am getting this issue when using the FileObserver...
me:~/Projects/sacred_test$ sacredboard -F experiments/tests/
[2019-03-01 18:57:28,825] ERROR in app: Exception on /api/run [GET]
Traceback (most recent call last):
File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1815, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1718, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/anaconda3/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/anaconda3/lib/python3.6/site-packages/sacredboard/app/webapi/runs.py", line 16, in api_runs
return get_runs()
File "/anaconda3/lib/python3.6/site-packages/sacredboard/app/webapi/runs.py", line 94, in get_runs
recordsFiltered=records_filtered),
File "/anaconda3/lib/python3.6/site-packages/flask/templating.py", line 135, in render_template
context, ctx.app)
File "/anaconda3/lib/python3.6/site-packages/flask/templating.py", line 117, in _render
rv = template.render(context)
File "/anaconda3/lib/python3.6/site-packages/jinja2/asyncsupport.py", line 76, in render
return original_render(self, *args, **kwargs)
File "/anaconda3/lib/python3.6/site-packages/jinja2/environment.py", line 1008, in render
return self.environment.handle_exception(exc_info, True)
File "/anaconda3/lib/python3.6/site-packages/jinja2/environment.py", line 780, in handle_exception
reraise(exc_type, exc_value, tb)
File "/anaconda3/lib/python3.6/site-packages/jinja2/_compat.py", line 37, in reraise
raise value.with_traceback(tb)
File "/anaconda3/lib/python3.6/site-packages/sacredboard/templates/api/runs.js", line 7, in top-level template code
{%- for run in runs -%}
File "/anaconda3/lib/python3.6/site-packages/jinja2/runtime.py", line 435, in __init__
self._after = self._safe_next()
File "/anaconda3/lib/python3.6/site-packages/jinja2/runtime.py", line 455, in _safe_next
return next(self._iterator)
File "/anaconda3/lib/python3.6/site-packages/sacredboard/app/data/filestorage/rundao.py", line 41, in run_iterator
yield self.get(id)
File "/anaconda3/lib/python3.6/site-packages/sacredboard/app/data/filestorage/rundao.py", line 60, in get
run = _read_json(_path_to_run(self.directory, run_id))
File "/anaconda3/lib/python3.6/site-packages/sacredboard/app/data/filestorage/rundao.py", line 101, in _read_json
return json.load(f)
File "/anaconda3/lib/python3.6/json/__init__.py", line 299, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/anaconda3/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/anaconda3/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/anaconda3/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
SacredBoard shows:
DataTables warning: table id=runs - Ajax error. For more information about this error, please see http://datatables.net/tn/7
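"Expecting value: line 1 column 1 (char 0)" is what json.load raises for an empty file, so one of the JSON files under experiments/tests/ is probably empty or only partly written. A hedged sketch for finding such files, assuming the usual file observer layout of one sub-directory per run with a run.json inside; the function name is made up:

```python
import json
from pathlib import Path


def find_unreadable_runs(base_dir):
    """Return the run.json paths under base_dir that do not parse as JSON."""
    broken = []
    for run_file in sorted(Path(base_dir).glob("*/run.json")):
        try:
            with run_file.open() as f:
                json.load(f)
        except json.JSONDecodeError:
            # Empty or truncated file, e.g. a run that is still being written.
            broken.append(run_file)
    return broken


for path in find_unreadable_runs("experiments/tests"):
    print(path)
```

Moving the listed run directories out of the way should tell you whether they are what triggers the Ajax error.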
Sacredboard shows this error if I try to edit the filters:
This seems to be a problem with missing indexes in MongoDB (as far as I know). I originally got this error when starting Sacredboard, but after I created an index for the start and end dates it only shows up when I change the filters.
I'm using the default settings for a mongodb installation on Ubuntu 17.04. Memory usage for sorting without an index seems to be limited to 32MB in this configuration.
If this is not a bug in Sacredboard, please add some documentation for the correct settings.