darraghdog opened this issue 4 years ago
Hi there Darragh,
I looked at this a few years ago, and at the time in our dataset a per-video offset was required to align the interaction events with the video. It might be that the synchronization at log time can be improved; I didn't look at this back when I wrote webgazerExtractServer.py because the data had already been collected.
From the docs: https://webgazer.cs.brown.edu/data/ "2. Watch a replay. As it processes, the system can show the interaction events against the screen recording. Note that only laptop participants have screen recording synchronization data, and only then are they roughly aligned. Use the text box to 'try some numbers' and find the sync offset. This varies per participant."
In '20180317_participant_characteristics.csv', there are two columns J & K which specify this alignment, labeled "Screen Recording Start Time (Unix milliseconds)" and "Screen Recording Start Time (Wall Clock UTC)". The alignment in the script is taken from there when the application is run, and the offset is determined from that time. Entering more accurate alignments into the .csv will remove the need for the offset.
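As a rough sketch of how those columns might be consumed (the start-time column label is as quoted above; the participant-ID column name and the json field name are assumptions, not confirmed by the dataset docs):

```python
# Rough sketch (not the actual extraction code): read the coarse
# "Screen Recording Start Time (Unix milliseconds)" values and compute a
# per-participant offset against the recording start epoch taken from the
# participant's screen .json metadata.
import csv

def load_screen_recording_starts(csv_path):
    starts = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            raw = (row.get("Screen Recording Start Time (Unix milliseconds)") or "").strip()
            if raw:  # blank for participants without screen-recording sync data
                starts[row["Participant ID"]] = int(float(raw))  # column name assumed
    return starts

def alignment_offset_ms(screen_start_ms, json_recording_start_ms):
    # Positive: the screen recording started after the logged recording start.
    return screen_start_ms - json_recording_start_ms
```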
Actually finding those offsets is probably manual work : / Obviously, we'd be happy to add them to the dataset if you went down that path!
Best wishes, James
Thank you James, this is very helpful.
I did not use the `participant_characteristics.csv` file so far. Note that the version I have from a recent download does not have a datestamp in the filename, but I guess it is the same file.
In my version, columns J & K are only filled for 25 of the participants; they are blank for the others.
Nevertheless, I will redo my analysis, and try to enter a start epoch for every participant (even if manual work is needed here).
I will let you know how it works out...
Darragh.
`participant_characteristics.csv`: Yes, it is the same file - sorry about that confusion.
The 25 participants should be the laptop participants; we were missing even coarse alignment data for the others.
Good luck! Happy to hear how you get on : )
Hi James,
I have worked out a rough way to get the alignment of the videos. I am working with another solution for gaze prediction (MPIIGaze). I extracted the MPIIGaze face-gaze yaw prediction for all frames, and then looked at the correlation between that value and the Tobii X-direction gaze prediction for each frame (rolling mean, averaged to 16 FPS).
So I mapped the Recording Start epoch of the video (from the `.json` files) to the epoch in the Tobii tracker.
In this link you have a file of offsets (some of them below), where `frameshift` is the offset time (the gap between the `.webm` file start and the json recording start time). `corrOptimal` is the correlation, after applying this offset, between the MPIIGaze yaw values at 16 FPS and the Tobii tracker X-direction prediction at 16 FPS. `corrOrig` is the correlation we got before applying the offset (simply matching Recording Start in the json to the `.webm` file start).
Also, I excluded videos where the time shift was >= 1 second or where the achieved correlation was under 0.8 - it seems these were the minority. I did not include instruction videos either; I plan to rework this and can include them if needed.
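For illustration, here is a minimal sketch of this kind of offset search, assuming both per-frame series have already been resampled to a common 16 FPS grid. Pearson correlation is used as a placeholder (the exact correlation measure is not stated in this thread), and the function and variable names are made up for the example.

```python
# Minimal sketch of a correlation-based offset search (illustrative only):
# slide the MPIIGaze yaw series against the Tobii X series in 1/16 s steps
# and keep the shift that maximises the (Pearson) correlation.
import numpy as np

def best_frameshift(mpii_yaw, tobii_x, max_shift_s=2.0, fps=16):
    """Return (shift_seconds, correlation) for the best-aligning shift."""
    best_shift, best_corr = 0.0, -1.0
    max_steps = int(max_shift_s * fps)
    for step in range(-max_steps, max_steps + 1):
        if step >= 0:
            a, b = mpii_yaw[step:], tobii_x[:len(tobii_x) - step]
        else:
            a, b = mpii_yaw[:step], tobii_x[-step:]
        n = min(len(a), len(b))
        if n < fps:  # require at least one second of overlap
            continue
        r = np.corrcoef(a[:n], b[:n])[0, 1]
        if r > best_corr:
            best_shift, best_corr = step / fps, r
    return best_shift, best_corr
```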
If you have questions, let me know - maybe it helps. Feel free to close the issue, and thanks for all your work.
Best, Darragh.
```
(base) dhanley@Darraghs-MacBook-Pro WebGazer % head webGazerAlignment.csv
video,frameshift,corrOptimal,corrOrig
P_01/1491423217564_10_-study-benefits_of_running.webm,-0.125,0.9490191845081442,0.37318958902769983
P_01/1491423217564_17_-study-benefits_of_running.webm,-0.375,0.8521885643856908,-0.06181019400814246
P_01/1491423217564_18_-study-benefits_of_running_writing.webm,-0.3125,0.9524443413560993,0.7596014370434377
P_01/1491423217564_24_-study-educational_advantages_of_social_networking_sites_writing.webm,-0.3125,0.9503417061171937,0.8165690282856981
P_01/1491423217564_26_-study-where_to_find_morel_mushrooms.webm,-0.125,0.921692339391273,0.2709434776451592
P_01/1491423217564_28_-study-where_to_find_morel_mushrooms.webm,-0.125,0.8522271771762648,0.040775224395260955
P_01/1491423217564_29_-study-where_to_find_morel_mushrooms_writing.webm,-0.3125,0.8765626475316824,0.03909061950662973
P_01/1491423217564_31_-study-tooth_abscess.webm,-0.375,0.987180075796582,0.6494914114143876
P_01/1491423217564_33_-study-tooth_abscess_writing.webm,-0.3125,0.7448085560814153,0.530272906943214
```
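As a quick usage note, one way to consume this file (assuming the column layout shown above and that `frameshift` is in seconds, as the 1/16 s steps suggest) might be:

```python
# Load the per-video offsets from the alignment file shown above.
import pandas as pd

align = pd.read_csv("webGazerAlignment.csv")
# The file already excludes shifts >= 1 s and correlations < 0.8, per the
# comment above; map each video path to its frameshift in seconds.
shift_by_video = dict(zip(align["video"], align["frameshift"]))
```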
@darraghdog Hi Darragh, thanks for doing this. I'm currently trying to add your alignment offsets to the dataset. How did you end up doing so on your end? Did you modify the scripts or the .json files?
@darraghdog Hi Darragh, I ended up changing the scripts so they factor in the offsets when matching the frame timestamp to the Tobii prediction timestamp, i.e. `while ... frameTimeEpoch + frameTimeEpochCorrection - p.tobiiList[p.tobiiListPos].timestamp > 0`.
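For context, a simplified sketch of what that matching step might look like (illustrative only, not the actual `webgazerExtractServer.py` code; timestamps are assumed to be Unix milliseconds and the `frameshift` from the alignment file to be in seconds):

```python
# Illustrative sketch only: advance an index through the Tobii predictions
# until the first one at or after the corrected frame timestamp.
def advance_to_frame(tobii_timestamps, pos, frame_time_epoch, frameshift_s):
    correction = int(frameshift_s * 1000)  # per-video offset, converted to ms
    while (pos < len(tobii_timestamps) - 1 and
           frame_time_epoch + correction - tobii_timestamps[pos] > 0):
        pos += 1
    return pos
```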
To clarify, would `frameshift=-0.125` mean that we add -125 ms to the epoch at recording start to get the corresponding Tobii timestamp, or would we subtract -125 ms? Also, what type of correlation did you calculate between the MPIIGaze and Tobii predictions? Thanks, and happy new year!
I am doing some checks on the data before loading, using the same methodology as the `webgazerExtractServer.py` script. Below is one example where I see the annotations in the json and the video out of sync: the metadata json indicates the video is 35.672910 seconds long, while FFMPEG reports a recording time of 35.23 seconds. I know it is not a big difference, but it is almost half a second, which could cause frames to be out of sync with the annotations. Also, the last frame is shown at `pts_time:35.16`; do you have any idea where this should be matched to in the json annotations, given that the video appears to be recorded as 35.672 seconds from recording start to recording stop? Any ideas if I am missing something when joining the screen json metadata to the frames in the video?
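A minimal sketch of this kind of duration check, assuming the json exposes the recording start and stop as Unix milliseconds (the field names below are guesses) and that `ffprobe` is available on the path:

```python
# Sketch of a duration sanity check: compare the duration implied by the
# screen-capture json with the container duration reported by ffprobe.
# The json field names ("recordingStart", "recordingStop") are assumptions.
import json
import subprocess

def json_duration_s(json_path):
    with open(json_path) as f:
        meta = json.load(f)
    return (meta["recordingStop"] - meta["recordingStart"]) / 1000.0

def ffprobe_duration_s(video_path):
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", video_path],
        capture_output=True, text=True, check=True)
    return float(out.stdout.strip())

def report_mismatch(json_path, video_path, tolerance_s=0.25):
    diff = json_duration_s(json_path) - ffprobe_duration_s(video_path)
    if abs(diff) > tolerance_s:
        print(f"{video_path}: json vs ffprobe duration differ by {diff:+.3f} s")
```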