EducationalTestingService / Confero

Eye-tracking, Screen and Event Capturing System for Windows. A web application running on a separate PC allows for real-time monitoring of the user's actions.
GNU General Public License v3.0

Add support for parsing eye sample stream into eye events #35

Closed. garyfeng closed this issue 10 years ago.

garyfeng commented 10 years ago

Algorithms for parsing fixations and saccades...

Are you aware of this paper in the latest Journal of Eye Movement Research? They claim their method is independent of the sampling rate of the eye-tracker. MATLAB code, though.

http://www.jemr.org/online/7/1/1

EyeMMV toolbox: An eye movement post-analysis tool based on a two-step spatial dispersion threshold for fixation identification

Authors: Vassilios Krassanakis, Vassiliki Filippakopoulou, and Byron Nakos (National Technical University of Athens)

Abstract: Eye movement recordings and their analysis constitute an effective way to examine visual perception. There is a special need for the design of computer software for the performance of data analysis. The present study describes the development of a new toolbox, called EyeMMV (Eye Movements Metrics & Visualizations), for post experimental eye movement analysis. The detection of fixation events is performed with the use of an introduced algorithm based on a two-step spatial dispersion threshold. Furthermore, EyeMMV is designed to support all well-known eye tracking metrics and visualization techniques. The results of fixation identification algorithm are compared with those of an algorithm of dispersion-type with a moving window, imported in another open source analysis tool. The comparison produces outputs that are strongly correlated. The EyeMMV software is developed using the scripting language of MATLAB and the source code is distributed through GitHub under the third version of GNU General Public License (link: https://github.com/krasvas/EyeMMV).
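For readers not familiar with dispersion-based identification, here is a minimal sketch of the generic dispersion-threshold (I-DT) idea; this is not EyeMMV's exact two-step algorithm, and the threshold and duration values are purely illustrative:

```python
import numpy as np

def idt_fixations(x, y, t, dispersion_thresh=1.0, min_duration=0.1):
    """Generic I-DT style fixation detection (illustrative only).

    x, y: gaze position arrays (e.g. in degrees); t: sample times (s).
    Returns a list of (start_time, end_time, centroid_x, centroid_y).
    """
    x, y, t = np.asarray(x), np.asarray(y), np.asarray(t)
    fixations = []
    i, n = 0, len(t)
    while i < n:
        # Grow an initial window spanning at least min_duration.
        j = i
        while j < n and t[j] - t[i] < min_duration:
            j += 1
        if j >= n:
            break
        disp = (x[i:j + 1].max() - x[i:j + 1].min()) + (y[i:j + 1].max() - y[i:j + 1].min())
        if disp <= dispersion_thresh:
            # Keep extending the window while dispersion stays below threshold.
            while j + 1 < n:
                wx, wy = x[i:j + 2], y[i:j + 2]
                if (wx.max() - wx.min()) + (wy.max() - wy.min()) > dispersion_thresh:
                    break
                j += 1
            fixations.append((t[i], t[j], x[i:j + 1].mean(), y[i:j + 1].mean()))
            i = j + 1
        else:
            i += 1
    return fixations
```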

isolver commented 10 years ago

I have not seen this one before.

I think if the main goal of the parser is for fixation detection and not as much for accurate saccades, then this type of algorithm is good, as you get less fixation breakup. If the goal of the parser is for accurate saccade events, with the idea being that anything that is not a saccade (or blink / missing data) is a fixation, then this type of algorithm is not the greatest.

It really depends on what events are of most interest for the research; I do not think one approach is 'right' and the other 'wrong'.

garyfeng commented 10 years ago

couldn't agree more. Would it be possible to have different parsers as options or plug-ins, for different purposes? I can imagine for most reading and usability type studies, a fixation-based algorithm can be the default. For gaze-contingent applications and particularly high-speed trackers, we can call in a saccadic parser.

For the ETS work, a fixation parser that works with 60Hz trackers is adequate.

For the SSSR workshop, we can probably use EyeLink's native parser for a gaze-contingent demo.

Does this make sense?

isolver commented 10 years ago

That is right along the lines I was thinking of: having a filtering framework added to iohub that provides a couple of initial filters, but makes it easy to add new filters by extending a provided base class and implementing a small set of required methods.

I would like, if possible, to make the filtering framework generic, in the sense that it can take in event streams and output filtered versions of the same event type (like eye samples), but also support taking in one or more event streams and outputting different event types (samples -> fixations, saccades, etc.). The only constraint would be that the output events are of one of the existing iohub event types (so they can automatically be saved to the datastore as well as streamed in real time).

I am still thinking it through (I started down this route a year ago and ended up throwing away a month's worth of work because it ended up being too messy). My current thinking is that an ioFilter could be a new iohub virtual device type. You could therefore use existing ones in your experiment by defining them within the iohub_config file, including the properties for what input event type(s) a given filter takes and what output event type(s) are produced. A filter could therefore take as an input the output of another filter, for example.
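As a rough sketch of the base-class idea being described here (the class names, attributes, and methods are hypothetical illustrations of the concept, not an existing iohub API):

```python
class EventFilterBase(object):
    """Hypothetical base class illustrating the plug-in filter idea;
    names and methods here are NOT an existing iohub API."""
    # Event types this filter consumes and produces.
    input_event_types = ()
    output_event_types = ()

    def process(self, events):
        """Receive a batch of input events; return a list of output events."""
        raise NotImplementedError


class FixationParserFilter(EventFilterBase):
    """Example plug-in: consumes eye samples, emits fixation events."""
    input_event_types = ('BinocularEyeSampleEvent',)
    output_event_types = ('FixationStartEvent', 'FixationEndEvent')

    def __init__(self):
        self._sample_buffer = []

    def process(self, events):
        self._sample_buffer.extend(events)
        # ... run a parsing algorithm over the buffer and return any
        # newly detected fixation events ...
        return []
```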

We'll see how much time we end up having, but this has been on my 'I wish I had time' priority list for a while now. ;)


garyfeng commented 10 years ago

all sounds good. Let's enable at least one filter, and then create the ioFilter device if we have time.

isolver commented 10 years ago

probably a realistic suggestion. ;)

garyfeng commented 10 years ago

are you aware of this paper? http://www.readcube.com/articles/10.3758/s13428-012-0245-6, and their project here: http://www.cbcd.bbk.ac.uk/research/grafix/description-1

I don't know if theirs is any good.

isolver commented 10 years ago

I have the article, maybe you sent it previously? I had not seen the software implementation for the auto / manual parsing process.

Not a big fan of the idea of the manual phase. If it turned out that the auto parser was good enough that the manual step was optional only, it could be nice. If you are OK with the offline only approach, then this could be tried. Only the script to create the eye data input format the app requires would be needed.

Should I write this script so their software can be tried with data collected from Confero? If it works well, I guess it saves needing to do a parser, which is next on my list, so it is a good time to decide how to proceed.

Thanks

Sol


garyfeng commented 10 years ago

I also want to avoid manual adjustments at all cost. Our system as a whole should work automatically with little or no manual operations (by default). So if we were to use their parser, it would be their auto parser. There are two questions in my mind, in that order:

  • Is their (auto) algorithm better than what we plan to do? If they are on a par with each other, then we should implement our own. If theirs is much better or has obvious advantages, we may not want to reinvent the wheel and may think about how to interface with theirs.
  • Do we need the option of manual editing? I feel the answer is NO in general, but there may be special cases where it may be helpful. Therefore this is not a must-have, just icing on the cake. But I was struck by the difference in the fixdur distribution between the automated and the manual versions, particularly with the low-sampling-rate trackers. And it made me think that we may want to be able to interface with their system.

Q2 is a low-priority consideration, so our decision hinges on Q1, which I can't answer. I would appreciate your professional judgment here. Thinking about interfacing with their system: this would mean being able to export our data to their format, do their black magic (including manual fixes), and export the fixation tags back so that we can combine them with our data (HDF5). As you said, exporting to theirs is probably not a big deal, but back-porting their parsed format to ours may be more trouble. They have a GUI, and I am not sure there is command-line access to their auto algorithm. These complications/risks should also be taken into account.

Thoughts?


isolver commented 10 years ago

I agree with your sentiment overall, for sure. Some of the questions you are asking will not really be answerable until after the fact, so there is a degree of unknown here (I could make some BS up if you want, but I don't think either of us wants that ;) ).

"Our system as a whole should work automatically with little or no manual operations (by default)."

I feel the same way. Then I start to think that maybe I am being too optimistic, because most of my actual eye tracker data processing experience is with the EyeLink, which is on the higher end of the data quality scale, I would say.

This raises the question of whether both EyeLink and Tobii data will be parsed externally, or just the Tobii data, since there is no other choice there. How you feel about this, and what percentage of the time the Tobii will be used vs. the EyeLink, is probably an important consideration in all of this.

Us having our own 'Data Viewer', so to speak, with the ability to adjust parameters and do manual parsing is well out of scope time-wise for the current project.

So, if the above is valid, that suggests at least trying to use their tools, since if they work as stated they should offer the best of both situations. However, a different workflow would be used compared to building the parser into iohub.

Using grafix:

  1. Eye data saved from Confero would be converted to the x, y, pupil_size (2x if binocular) format they need. This could be done automatically, of course (a sketch of such a conversion follows this list). The file could be saved in the experiment session folder along with all other data for that session.
  2. Then the experimenter would need to use grafix: loading the data file and going through the parsing process (auto only, or auto + manual, depending on the data).
  3. The experimenter will need to save out the parsed eye events to a file.
  4. A script would need to be run to read those events and add them to the hdf5 file.
  5. The existing frame -> hdf5 event matching script could really be used unchanged.
  6. I see no command-line tool reference, and the code is written in C++ using a C++ linear algebra library. Porting to Python is not likely.
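For step 1, the conversion would be little more than a column dump. A sketch along these lines, where the HDF5 table path and column names are assumptions about the iohub datastore layout, and the output column order is a guess at what grafix expects:

```python
import tables  # PyTables, which the iohub datastore is built on

# The table path and column names below are assumptions about the iohub
# HDF5 layout; the output column order is a guess at the grafix format.
SAMPLE_TABLE = '/data_collection/events/eyetracker/BinocularEyeSampleEvent'

def export_for_grafix(hdf5_path, out_path):
    with tables.open_file(hdf5_path, 'r') as h5:
        samples = h5.get_node(SAMPLE_TABLE)
        with open(out_path, 'w') as out:
            for s in samples.iterrows():
                out.write('%f %f %f %f %f %f\n' % (
                    s['left_gaze_x'], s['left_gaze_y'], s['left_pupil_measure1'],
                    s['right_gaze_x'], s['right_gaze_y'], s['right_pupil_measure1']))
```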

Using parser built into iohub:

  1. Events would be parsed online and saved to the hdf5 file. Since real-time event access is not important here, the online algorithm can be slow in event detection (delay-wise), with the benefit of fewer false alarms.
  2. The process would be much more automated and streamlined. If the data is good, then win-win. :)
  3. A script that uses matplotlib to plot the eye traces and parsed events should be available; changing data from the plot would not be possible. Again, this will be needed for development anyhow.
  4. The parser could be rerun on an hdf5 file using different settings to try to improve parsing performance. This is what would be done initially during development anyhow, so that the best default values for the system being used can be set.
  5. Replacing existing eye events with new post-hoc parsed events in the hdf5 file would be extra work in this situation as well, so grafix and this option are the same in that respect.
  6. No matter what algorithm is used, "crap in -> crap out" as they say.
  7. A Confero manual parser is out of current scope, but I suppose grafix could be used anyhow when necessary.
"Is their (auto) algorithm better than what we plan to do? If they are on a par with each other, then we should implement our own. If theirs is much better or has obvious advantages, we may not want to reinvent the wheel and may think about how to interface with theirs."

Wow, talk about a loaded question. ;) This is hard to answer right now, for two high-level reasons: we do not know how well grafix works at all, and we do not know how well our custom-implemented parser will work.

"But I was struck by the difference in the fixdur distribution between the automated and the manual versions, particularly with the low-sampling-rate trackers. And it made me think that we may want to be able to interface with their system."

Exactly. But this is a two-sided coin, is it not? Manually parsing events introduces all the biases that come along with it. Which is more accurate will likely depend on the person who is doing the manual parsing and the degree of total junk in the data.

No doubt, though, that in some cases having the ability to do what grafix does would be useful.

So here is my thinking on the approach we take:

  1. I think it would be nice to use the same parser for EyeLink and Tobii data. The custom-coded parser and the grafix approach both allow this.
  2. We should take a couple of days and evaluate grafix by actually using it. Writing the script to create a grafix-friendly input file is 1-2 hours of work, so I would think it is worth it.
  3. If it turns out grafix is all talk and no action, then our decision is made for us anyhow.
  4. If grafix is amazing, and you are OK with the type of workflow outlined above, then using grafix is probably the right choice. Not as 'fun' as writing a parser, but likely a better project decision.
  5. If grafix is just OK, a harder decision needs to be made. However, we can probably make this decision by the end of the week or early next week, if you also have some time to look at what I find and do some actual data recordings of your task, with the Tobii in particular.

Let me know what you think.


garyfeng commented 10 years ago

Don't know if mine is a great mind, but we do think alike. I agree completely with your analysis and the final recommendation. Let's proceed.

Regarding your question about EyeLink vs. Tobii: for ETS work it will be mostly Tobii, because (a) it's more portable and (b) we have two of them.

isolver commented 10 years ago

Hi Gary,

I have played around with grafix using the two test data files that are provided. The way multiple streams of information about the data are visualized is nice, but the tool is extremely limited in terms of really being able to look at the data. Both of the examples seem to rely on breaking up the data into quite short 'segments' of about 15 seconds each. You can only view one segment at a time, and there is no zoom / pan support, so if a segment were long it would be very hard to actually use the interface at all.

The filtering of samples and auto parsing is applied to all the segments loaded, so that is not bad. The quality of the auto parser is nothing special. I'm pretty sure we can do as good or better on that front.

To do manual adding / merging / deleting of fixations the GUI is somewhat cumbersome and does not allow precise enough manipulations of the fixations IMO.

So I think the tool is very 'so so' right now. It is a very early version so hopefully they will work on improving it. There is potential there for sure, but making it a dependency for this project would not be a good idea right now.

I'm starting to feel like we should just bite the bullet and do the integrated parser. We know we can always fall back on grafix if / when needed. The potential pluses of doing an integrated parser are too big compared to trying to use grafix as a front-line tool right now, IMO.

I am not sure if I ever sent you this before, but here is a PDF that is the 'data processing' section of the ECEM workshop material I wrote (code and text). The PDF contains code and output plots for:

  • plotting raw data
  • filtering the sample data using several different filter types
  • calculating velocity and acceleration
  • and finally, the last 2 pages are about the parser and where it was left

The data is Tobii T120 data.

The zip has the full source code.

The parser is incomplete, needing much polishing, but the potential is definitely there based on how far I took it already.

The plots that matplotlib generate are better for 'looking' at data than grafix is right now. You can zoom, pan, stretch the plots in the viewer.
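As a trivial sketch of the kind of interactive inspection meant here (the column names and the event-tuple format are placeholders, not the actual workshop code):

```python
import matplotlib.pyplot as plt

def plot_traces(time, x_deg, y_deg, velocity, events=()):
    """Plot position and velocity traces; the standard matplotlib toolbar
    then gives zoom / pan / stretch over the whole recording.

    events is an optional sequence of (start_time, end_time, label) tuples.
    """
    fig, (ax_pos, ax_vel) = plt.subplots(2, 1, sharex=True)
    ax_pos.plot(time, x_deg, label='x (deg)')
    ax_pos.plot(time, y_deg, label='y (deg)')
    ax_vel.plot(time, velocity, label='velocity (deg/s)')
    for start, end, label in events:
        # Shade fixation runs green, everything else red.
        ax_pos.axvspan(start, end, alpha=0.2,
                       color='green' if label == 'FIX' else 'red')
    ax_pos.legend()
    ax_vel.legend()
    ax_vel.set_xlabel('time (s)')
    plt.show()
```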

So my current thinking is to continue with the parser development I did last summer and see where it takes us.

FYI, the eye data is currently processed and parsed as follows, see the pdf for more details:

  1. The sample streams are processed for missing data periods, and the gaps are filled using linear interpolation between the eye positions at the two edges of a gap. This is done just for data processing; gaps are still flagged as such in the output data.
  2. Data is converted from pixels to visual angles.
  3. Velocity data is calculated.
  4. Position and velocity data are filtered.
  5. The sample data is parsed using a VT (velocity threshold) parser, so saccades are found, and non-saccade or missing-data regions default to fixations.
  6. The interesting thing about the parser is that the velocity threshold value is calculated based on the noise distribution of the velocity data. A starting VT is automatically selected that ensures the initial VT is too high. The VT is then adjusted iteratively, bit by bit, until it settles just above the noise level in the data (a sketch of this iteration follows the list). The algorithm is a modified version of the adaptive algorithm proposed in "An adaptive algorithm for fixation, saccade, and glissade detection in eyetracking data", Marcus Nyström and Kenneth Holmqvist, Behavior Research Methods (2010).
  7. Using the final VT, the data is parsed. You can see how it is doing in the 2 images. This is just the parsed output from the VT algorithm; no heuristics have been applied yet (minimum saccade / fixation durations, acceleration filtering, etc.).
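A minimal sketch of the iterative threshold selection in step 6, in the spirit of Nyström & Holmqvist (2010); the starting value, multiplier, and convergence tolerance below are illustrative defaults, not necessarily what this parser uses:

```python
import numpy as np

def adaptive_velocity_threshold(velocities, start_vt=100.0, k=6.0, tol=1.0):
    """Iteratively settle a velocity threshold (deg/s) just above the
    noise level of the velocity distribution.

    Start with a deliberately high VT, then repeatedly set
    VT = mean + k * std of all velocity samples below the current VT,
    until the change between iterations is smaller than tol.
    """
    v = np.asarray(velocities, dtype=float)
    vt = start_vt
    while True:
        below = v[v < vt]
        if below.size == 0:
            return vt
        new_vt = below.mean() + k * below.std()
        if abs(new_vt - vt) < tol:
            return new_vt
        vt = new_vt
```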

Anyhow, let me know what you think. Obviously this will not be done (regardless of what 'is' done) for Wednesday, but it will be a good time to discuss it further.

I also finished the first pass at the warning and error alert system, driven by the config file settings. That was a relatively large piece of functionality that is now, for the most part, done. Tweaking and bug fixes will come up as usual, I'm sure. ;)

I will send you software updates by end of tomorrow so you can have them for when we meet.

Thanks.

Sol

garyfeng commented 10 years ago

Great. This makes our decision easy. Let's proceed as we planned and implement our own parser. Sorry, I won't have much time before tomorrow to look at the data you sent, but I have skimmed the Holmqvist chapter before and thought it was very reasonable.

Talk later,


isolver commented 10 years ago

OK, well, I have been banging my head on this for a week now. The long and short of it is that online parsing is much harder to get right than offline. ;)

What I have done so far:

  1. Implemented general purpose device event filtering framework.
    • Filter is written as a python class that extends a base filter class.
    • Filter runs on iohub process.
    • Specifies what device it can work with and what device events it wants to get
    • Requested events are passed to filter when they are received by iohub.
    • Filter can output events; must be of an existing iohub device event type. These events are treated just like normal events by iohub (streamed and / or saved), but have a filter_id assigned.
    • When a user asks to get events, they can specify a filter_id for the events to get.
    • Filter is added to iohub from within the user's experiment script.
  2. Implemented an eyetracker sample event filter class:
    • Accepts samples.
    • Keeps a buffer of input samples.
    • Each time a sample is input, the filter runs a process function, and any new output events are issued.
    • The parser converts sample positions to visual degrees, then calculates velocity info. The filter can be applied to position data and/or velocity data. If position data is filtered, that is what velocity calculations are based on. Velocity values from samples that did not have missing eye data (valid samples) are stored in a buffer of a fixed size. Only when the velocity buffer becomes full does the parser actually start running.
  3. Parser steps:
    • Each sample is checked to see if it has position data or not.
    • If the sample is invalid, it is added to a list of invalid samples and that parser iteration stops.
    • If the sample is valid, the parser checks whether there are any invalid samples stored. If there are, interpolated position and velocity data fill in the missing samples (for parser use only). The velocity of the current valid sample is then also calculated and added to the velocity buffer.
      • If the velocity buffer is full, a velocity threshold is calculated from the velocity buffer data.
      • The sample is compared to the latest velocity threshold and tagged as a fixation sample if below the threshold, a saccade sample if above it, or a missing sample if the sample was invalid.
      • The tags of the current and previous samples are compared. If they differ, an event is created for the run of previous samples sharing the same tag.
      • If the event just created has a sample count below the minimum threshold defined for its event type, the event is not issued and is instead merged with the current event-type run.
      • If the event's sample count is above the threshold, the event is considered real and is issued (a sketch of this tagging and merging logic follows the list).
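As a rough sketch of the sample-tagging and run-merging logic in the parser steps above (a hypothetical simplification; class names, thresholds, and the event representation are not the actual implementation):

```python
# Tags assigned to each sample.
FIX, SACC, MISS = 'FIX', 'SACC', 'MISS'
# Illustrative minimum run lengths (in samples) for an event to be issued.
MIN_SAMPLES = {FIX: 6, SACC: 2, MISS: 2}

class OnlineSampleParser(object):
    """Simplified illustration of the run / merge bookkeeping only."""

    def __init__(self):
        self.current_tag = None
        self.current_run = []

    def add_sample(self, sample, velocity, vel_threshold, valid):
        """Feed one sample; returns a (tag, samples) event when a run of
        samples ends and meets its minimum length, otherwise None."""
        if not valid:
            tag = MISS
        else:
            tag = SACC if velocity >= vel_threshold else FIX

        if self.current_tag in (None, tag):
            # Same tag as the current run: just extend it.
            self.current_tag = tag
            self.current_run.append(sample)
            return None

        # Tag changed: decide whether the previous run becomes an event.
        if len(self.current_run) < MIN_SAMPLES[self.current_tag]:
            # Too short to be real; merge it into the new run instead.
            self.current_run.append(sample)
            self.current_tag = tag
            return None

        event = (self.current_tag, self.current_run)
        self.current_tag, self.current_run = tag, [sample]
        return event
```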

Issues:

Will work more on it over the weekend; if I get nowhere, I will do an offline parser just so something is done by the end of the contract. Online parsing can be looked at in the fall if you want to continue work on the app. An offline filter, using the same ideas as the online one but calculating the velocity threshold over all data within a video recording period, works very well IMO. So that is doable; it just needs to be implemented for the specifics of this project, with the necessary scripts in place to parse and look at the parsing results, and to reparse using different settings if needed.

I feel a bit frustrated right now. ;)

isolver commented 10 years ago

I thought I had updated the status on this, but I guess not. So, online sample field filtering is now running.

Implemented the general purpose device event filtering framework and an eyetracker sample event filter class; the parser steps are as described in the previous comment.

Status

The parser generates events at reasonable eye position transition points, but extra events are also generated.

In discussion, it was agreed that this was an OK trade-off for the current online parser. The issues mentioned above could be fixed by running an offline parser post-processing script; however, this feature has not been listed as needed at this time.

So basically what is needed now is real-use testing of the parser and input on how to proceed with it if needed. Assigning to Gary for the functionality review and feedback stage.

isolver commented 10 years ago

Moving to the June 25th final SW delivery date as the last date for feedback / change requests to try to fit in now. Anything else will need to be put in as a future work milestone.

isolver commented 10 years ago

Move / copy the dev parser script into the iohub module.