esdalmaijer / PyGaze

an open-source, cross-platform toolbox for minimal-effort programming of eye tracking experiments
www.pygaze.org
GNU General Public License v3.0
671 stars 211 forks

PyGaze crashes in OpenSesame during EyeTribe calibration #70

Closed pashultz closed 7 years ago

pashultz commented 8 years ago

Hello,

We are encountering an error on the OpenSesame PyGaze template, during the calibration phase. The first calibration point is presented for about 1500ms, and then the experiment exits with a KeyError exception.

Unexpected error

item-stack: experiment[run].pygaze_init[run]
exception message: 'values'
time: Wed Jun 22 14:08:56 2016
exception type: KeyError

Traceback:
  File "dist\libqtopensesame\misc\process.py", line 140, in run
  File "dist\libopensesame\experiment.py", line 397, in run
  File "dist\libopensesame\item_store.py", line 95, in execute
  File "dist\libopensesame\item_store.py", line 110, in run
  File "dist\libopensesame\sequence.py", line 50, in run
  File "dist\libopensesame\item_store.py", line 110, in run
  File "C:\Program Files (x86)\OpenSesame\plugins\pygaze_init\pygaze_init.py", line 212, in run
  File "C:\Program Files (x86)\OpenSesame\pygaze\_eyetracker\libeyetribe.py", line 208, in calibrate
  File "C:\Program Files (x86)\OpenSesame\pygaze\_eyetracker\pytribe.py", line 1411, in pointend
KeyError: 'values'

We're using these versions:

smathot commented 8 years ago

I don't know what the direct cause of the error is, but it highlights a more fundamental problem: the code doesn't contain any safeguards. It assumes that the response object has a certain structure, and when it doesn't, things crash with an uninformative error message. Not good, not robust!

One possibility would be to wrap all the functions that parse responses in a decorator. This decorator would then catch Exceptions that occur in the function, and deal with these appropriately, either by ignoring the Exception (if it's not crucial) or by reporting (by raising another Exception) what exactly went wrong, that is, what kind of response was expected, and what kind of response was actually received.
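Something along these lines, perhaps (just a sketch; the helper names here are hypothetical, not actual PyGaze code):

```python
import functools

class MalformedResponseError(Exception):
    """Raised when a tracker response lacks the structure we expect."""

def expects_keys(*keys):
    """Decorator: turn a bare KeyError from response parsing into an
    informative error that says what was expected and what arrived."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(response, *args, **kwargs):
            try:
                return func(response, *args, **kwargs)
            except KeyError as e:
                raise MalformedResponseError(
                    "%s expected keys %s, but got response %r (missing %s)"
                    % (func.__name__, keys, response, e))
        return wrapper
    return decorator

# Example: a parsing function that assumes a 'values' key.
@expects_keys('values')
def parse_pointend(response):
    return response['values']
```

That keeps the parsing functions themselves clean, and the raised error tells you which function failed and what the actual response looked like.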

esdalmaijer commented 8 years ago

Can't check now (out of office until Monday), but will come back to the bug later.

As for @smathot's comment: What errors do you mean exactly? The thing is, Exceptions are reported when you run PyGaze-based Python scripts directly, whether they occur in the main thread (all of PyGaze) or in a sub-thread (only present in EyeTribe class). Did you mean that all Exceptions are not reported correctly when PyGaze is used through OpenSesame? Or are you referring specifically to the sub-thread errors?

Or did you mean that the Exceptions themselves are too uninformative, like the KeyError above? As an example: it stems from the fact that the response dict for a pointend request did not contain a 'values' key (which can happen if the EyeTribe sends something else back, so we should debug what that 'something else' is, and how to catch it). That's pretty informative if you know the underlying code, but the obvious caveat is that most OpenSesame users will not. But, then again, do you expect them to debug their own issues, or to post them to our fora and ask us to fix them?

(Re-reading this, I realise it might seem a bit passive aggressive, but the questions are genuine! I'm trying to get a good idea of what kind of problem you're flagging up, and in what direction you're expecting the solution to be in.)

smathot commented 8 years ago

Or did you mean that the Exceptions themselves are too uninformative, like the KeyError above?

Yes, exactly!

The issue is basically whether a function should validate its input. The downside is that validation takes time to program (though not that much) and time to run. The upside is that it makes the code easier to debug, because you catch invalid input sooner, before it actually causes the program to crash; it also makes the code more robust, because you can avoid unnecessary crashes when it's safe to ignore invalid input (say the EyeTribe sometimes sends invalid messages that can simply be skipped).

I think good code validates its input at least in cases where it comes from an external source, such as the EyeTribe in this case. And one way to do this is using decorators to catch Exceptions, and (to avoid delays) check the input only when an error occurs.

Of course, validation doesn't make the bugs go away, so it's perhaps not a top priority. But it does make the code more beautiful.

esdalmaijer commented 8 years ago

That's a fair point, maybe we should employ this a bit more.

As a side note to your example: I think the error here is likely in the parsing of one specific response, rather than in a lack of input validation. I did build in some safety measures to either validate the input, or to prevent wrong input from coming in:

1) The requests made to the tracker actually do check whether their category is matched in the response (pytribe.py, line 585). We should probably also add a check to see whether the request is matched as well.

2) At a higher level: the Threads that communicate with the EyeTribe acquire a lock before they make their request, and then release that lock only after a response is obtained and parsed. (Heartbeater in line 226, sample streamer in line 250.)

Anyway: it's still a good idea to improve the validation mentioned in point 1. I just wanted to point out that I didn't program this very irresponsibly :p

TODO following point 2: Acquire and release the communication lock in libeyetribe.EyeTribeTracker's calibration routine (e.g. in line 195).
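For reference, the locking pattern from point 2 boils down to something like this (a simplified sketch with stand-in send/receive methods, not the actual PyTribe implementation):

```python
import threading

class TrackerConnection:
    """Sketch of serialising request/response pairs with a lock, so the
    heartbeat and sample-streaming Threads can't interleave their
    traffic with a calibration request. (Hypothetical class.)"""

    def __init__(self):
        self._lock = threading.Lock()

    def request(self, category, req):
        # Hold the lock for the FULL request/response cycle: acquire
        # before sending, release only after the response is parsed.
        with self._lock:
            self._send(category, req)
            return self._receive()

    def _send(self, category, req):
        self._last = (category, req)  # stand-in for the socket send

    def _receive(self):
        # Stand-in for reading and parsing the socket response.
        return {'category': self._last[0], 'values': 'ok'}
```

The calibration routine in libeyetribe.py would then go through the same `request` method as the background Threads, instead of talking to the socket directly.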

esdalmaijer commented 8 years ago

Issue should be fixed with this commit. Could you confirm whether downloading the latest source from here solves your issue?

pashultz commented 8 years ago

Oh, thank you! We'll give this a try on Monday and report back. (Sorry, closed the issue by accident.)

smathot commented 7 years ago

I just tried it, and the crash is indeed gone. But another problem appears to occur now: I can finish the calibration, but afterwards I always get the text 'Calibration failed' without the actual calibration results, i.e. I don't get the overview with the dots and the gaze error for each calibration dot. When I then open the EyeTribe UI, it reports that the calibration was actually fine.

This is on Windows 7 with the latest EyeTribe SDK (0.9.77). I tried it within OpenSesame, using my own fork, but up-to-date with all the changes from here.

esdalmaijer commented 7 years ago

Thanks for the feedback! I had the same issue, which comes from incomplete messages being read from the socket. (Unfortunately, the EyeTribe spews out all kinds of responses on every request, including all samples in memory, so we can't know how large the incoming message is going to be.)

I did implement a memory for incomplete messages, though, which solved the issue for me. Could you confirm that you have the latest commit? The memory thing is around line 664.
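The idea behind the incomplete-message memory, in a simplified form (this sketch assumes newline-delimited JSON for illustration; the actual handling in pytribe.py is more involved):

```python
import json

class MessageBuffer:
    """Accumulate raw socket data and yield only complete JSON messages,
    remembering any trailing fragment for the next read. (Simplified
    sketch; the real wire format handling may differ.)"""

    def __init__(self):
        self._partial = ''

    def feed(self, data):
        # Append the new chunk to whatever was left over last time.
        self._partial += data
        messages = []
        # Emit every complete line; keep the unterminated tail around.
        while '\n' in self._partial:
            line, self._partial = self._partial.split('\n', 1)
            if line.strip():
                messages.append(json.loads(line))
        return messages
```

A message that arrives cut in half is simply held in `_partial` until the rest comes in on the next read.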

EDIT: I'm assuming you are using the most recent version, in which case it would be very helpful if you set the connection class' DEBUG property to True, and report on the output. See line 555 in pytribe.py.

smathot commented 7 years ago

I did indeed have that commit in there. Here's the output that I get when setting DEBUG to True. The Exception at the end is, I think, due to me aborting the experiment, and not related to the issue. The beginning of the output is truncated, because the git bash terminal doesn't have an infinite history.

I should mention that I'm using a pretty slow laptop. I think that there may be some overflowing going on in the way that messages are processed. However, I haven't looked at the code in enough detail to be sure. Could it be something along those lines?

The problem is very reliable though: It occurs every time.

esdalmaijer commented 7 years ago

Unfortunately, the issue happened before the truncation. What's left in the output is only frames that were processed in a different Thread, and no calibration results. So I'm going with my initial suspicion: the EyeTribe's response to the calibration request ends up being cut in half, and for some reason isn't joined up once the second half comes in.


To be honest, I think the only way to remedy this, is an overhaul of how we do stuff. We'll need to separate the requests to the EyeTribe and getting responses from the EyeTribe. The framework I'm thinking of: We keep a record of the most recent messages from the EyeTribe (one entry for every type of message). This is updated in a separate Thread (processing Thread), which simply reads all messages from the EyeTribe, processes them one-by-one (potentially storing half-formed messages), then reads again, processes again, etc. Functions send a request to the EyeTribe, clear the relevant record, wait until it is updated by the processing Thread, and return when a new record becomes available.
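Roughly, the record-keeping I have in mind looks like this (a heavily simplified sketch of the proposed design, with hypothetical names):

```python
import threading

class ResponseRecord:
    """Keep the most recent message per category. The processing Thread
    calls update() for every parsed message; request code calls clear()
    before sending, then wait_for() to block until a fresh message
    arrives. (Sketch, not the actual implementation.)"""

    def __init__(self):
        self._lock = threading.Lock()
        self._latest = {}   # category -> most recent message
        self._events = {}   # category -> "new message arrived" flag

    def _event(self, category):
        with self._lock:
            return self._events.setdefault(category, threading.Event())

    def update(self, category, message):
        # Called by the processing Thread for every message it parses.
        with self._lock:
            self._latest[category] = message
        self._event(category).set()

    def clear(self, category):
        # Called just before a request, so a stale record can't be
        # mistaken for the new response.
        self._event(category).clear()

    def wait_for(self, category, timeout=5.0):
        # Block until the processing Thread stores a fresh message.
        if not self._event(category).wait(timeout):
            raise RuntimeError(
                'No %r response within %s s' % (category, timeout))
        with self._lock:
            return self._latest[category]
```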

BACKGROUND

As you can see from your DEBUG output, the tracker just spams the server with all messages it has in memory. This means that if you request, for example, the calibration results, you also get all queued frames (even if you haven't actually asked for frames before, i.e. when you temporarily deactivate the sampling Thread in PyTribe). This means that we can't anticipate how much info will be sent over the connection, which means that we'll receive half-formed data, which means that we can't simply wait for a response from the EyeTribe after we sent it a request. The best solution would be to parse ALL spam from the EyeTribe as it comes in.

As far as I understand it, this kind of spamming behaviour should stop once the EyeTribe is set to pull mode. To do so, uncomment line 48 in pytribe.py. Unfortunately, this didn't work for me. (Not sure whether my implementation fails to set the tracker in pull mode, or whether this doesn't prevent the EyeTribe from still spamming us.)

smathot commented 7 years ago

That sucks. But it does work reliably for you then, does it? If so, then it might make sense to push out these patches anyway, as python-pygaze 0.6.0a16, and see whether that resolves the issue for at least some people. And then think about a better long-term solution. What do you think?

esdalmaijer commented 7 years ago

Would you mind giving the latest commit a test run? I implemented the solution I suggested, and it works on my end. (Then again, I didn't have the calibration issue in the first place, so I hope it'll also work on your end!)

pashultz commented 7 years ago

Thank you so much for investigating this. It does work now, and we can get through the template experiment! This is so exciting.

But I'm afraid there's some bad news too: if the calibration is unsuccessful, instead of alerting us or failing, it simply returns the results from the previous calibration attempt. For example, if I calibrate with an accuracy of 0.57, then retry the calibration and step away from the computer (so that it sees no eyes at all and is guaranteed to fail), the calibration will appear to succeed and return an accuracy of 0.57, with all the same points from the previous run.

Do you think this could be related to the new message handling—could PyGaze be missing a message that would alert it to the failed calibration? I see that the EyeTribe API specifies a Boolean attribute (calibresult['result']) that indicates whether the calibration was successful; is PyGaze using that?
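If that flag is reliable, the handling could be as simple as something like this (a hypothetical sketch built around the Boolean 'result' attribute, not actual PyGaze code):

```python
def finish_calibration(tracker_state, calibresult):
    """Accept a new calibration only if the tracker reports success;
    otherwise discard any previous result, so a failed attempt can't
    silently return the old accuracy numbers. (Hypothetical helper.)"""
    if calibresult.get('result', False):
        tracker_state['calibration'] = calibresult
        return True
    # Explicitly drop the previous calibration on failure.
    tracker_state.pop('calibration', None)
    return False
```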

Thanks again—I didn't realize how much new code this would entail and am very grateful for your work.

smathot commented 7 years ago

@esdalmaijer Bravo! This time it actually does work. I can reliably calibrate. I can also confirm the issue pointed out by @pashultz: If I mess up a calibration, then I get the results from the previous successful calibration attempt.

esdalmaijer commented 7 years ago

Thanks @pashultz and @smathot, for testing this, and flagging the calibration persistence up. That should be fixed with the latest commit, which makes sure that the existing calibration is cleared before performing a new one.

As before, I've tested this on my end, and after debugging some emerging issues, things work smoothly. Hope that's true on your ends as well!

smathot commented 7 years ago

Excellent--yes, it works! I had the impression that something quirky happened at one point, but I couldn't reproduce it. @pashultz Could you test the following scenarios and report back?

Start from scratch and then:

  1. Do a failed calibration;
  2. Repeat and do a successful calibration;
  3. Repeat and do another failed calibration;
  4. Repeat and do a successful calibration; and
  5. Continue and finish dummy experiment.

And also start from scratch and then:

  1. Do a successful calibration;
  2. Repeat and do failed calibration;
  3. Repeat and do another successful calibration; and
  4. Continue and finish dummy experiment.

That should cover things, right?

@esdalmaijer Assuming that this test is successful, shall I tag this as 0.6.0a16 and push it out? I would like to push a bugfix release of OpenSesame next week (3.1.1) and it would be great to include this patch as well.

esdalmaijer commented 7 years ago

@smathot Thanks for testing! And yes, would be great if you could include it in the bugfix for OpenSesame; thanks! (Could you also push it to PyPI, please? I'm assuming this is part of your OpenSesame-update-procedure anyway, but just to be sure.)

Very happy that this seems to finally be resolved. Was kinda dreading the implementation of that fancier response handling, but happy it's here now.

pashultz commented 7 years ago

Yes, hooray! We ran through @smathot's test suite, and everything worked as expected. Thank you so much.

smathot commented 7 years ago

I just updated python-pygaze to 0.6.0a16. To upgrade from a terminal, run:

pip install python-pygaze --upgrade

Or from the OpenSesame debug window:

import pip
pip.main(['install', 'python-pygaze', '--upgrade'])