brownhci / WebGazer

WebGazer.js: Scalable Webcam EyeTracking Using User Interactions
https://webgazer.cs.brown.edu

Video and annotations out of sync in webgazerExtractServer.py #126

Open darraghdog opened 4 years ago

darraghdog commented 4 years ago

I am doing some checks on the data before loading, using the same methodology as the webgazerExtractServer.py script. Below is one example where the annotations in the JSON and the video are out of sync: the metadata JSON indicates the recording is 35.672910 seconds long, while FFmpeg reports a duration of 35.23 seconds. I know it is not a big difference, but it is almost half a second, which could cause frames to be out of sync with the annotations.

Also, the last frame is shown at pts_time:35.16. Do you have any idea where this should be matched in the JSON annotations, given that the JSON records 35.672 seconds from recording start to recording stop?

Any ideas on whether I am missing something when joining the screen JSON metadata to the frames in the video?

import glob
import os
import subprocess

import cv2
import pandas as pd

def dicttoDf(fpath):
    # Load one session's JSON and tag each row with the user and session name
    df = pd.read_json(fpath)
    _, df['user'], df['json'] = fpath.replace('.json', '').split('/')
    return df

# Read the metadata for every session
metadf = pd.concat([dicttoDf(fpath) for fpath in glob.glob('./*/*.json')]).reset_index(drop=True)

# Sample video to check
video = 'P_01/1491423217564_3_-study-dot_test.webm'
sessId = os.path.basename(video).replace('-', '/').replace('.webm', '')

# Extract the frames with FFmpeg, keeping the showinfo log from stderr
outputPrefix = '../FramesDataset/'
outDir = outputPrefix + '/' + video + '_frames' + '/'
if not os.path.isdir(outDir):
    os.makedirs(outDir)
completedProcess = subprocess.run('ffmpeg -i "./' + video + '" -vf showinfo "' + outDir + 'frame_%08d.png"',
                                  stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                  universal_newlines=True, shell=True)

# Sort by time and compute vidtime, the time since the 'recording start' event
vdf = metadf[metadf.sessionId.str.contains(sessId)].sort_values('time').reset_index()
vdf['vidtime'] = vdf['time'] - vdf[vdf['type'] == 'recording start'].time.iloc[0]

# Inspect the tail of the FFmpeg log and of the session events
completedProcess.stderr.split('\n')[-10:]
vdf.filter(regex='type|time|client').tail(10)
# FFmpeg output
['[Parsed_showinfo_0 @ 0x7fa20b006ac0] color_range:tv color_space:bt470bg color_primaries:unknown color_trc:unknown',
 '[Parsed_showinfo_0 @ 0x7fa20b006ac0] n:1051 pts:  35100 pts_time:35.1    pos:  4783388 fmt:yuv420p sar:1/1 s:640x480 i:P iskey:0 type:P checksum:AC3C8157 plane_checksum:[56778F54 B47D13E2 DB98DE12] mean:[135 124 133 \x08] stdev:[80.9 4.6 5.1 \x08]',
 '[Parsed_showinfo_0 @ 0x7fa20b006ac0] color_range:tv color_space:bt470bg color_primaries:unknown color_trc:unknown',
 '[Parsed_showinfo_0 @ 0x7fa20b006ac0] n:1052 pts:  35160 pts_time:35.16   pos:  4792168 fmt:yuv420p sar:1/1 s:640x480 i:P iskey:0 type:P checksum:335F7A6A plane_checksum:[4C3D8967 38EF12D2 039FDE22] mean:[135 124 133 \x08] stdev:[80.9 4.6 5.1 \x08]',
 '[Parsed_showinfo_0 @ 0x7fa20b006ac0] color_range:tv color_space:bt470bg color_primaries:unknown color_trc:unknown',
 '[Parsed_showinfo_0 @ 0x7fa20b006ac0] n:1053 pts:  35160 pts_time:35.16   pos:  4794885 fmt:yuv420p sar:1/1 s:640x480 i:P iskey:0 type:P checksum:7DC46667 plane_checksum:[AFDF7652 61531277 4338DD8F] mean:[135 124 133 \x08] stdev:[80.9 4.6 5.1 \x08]',
 '[Parsed_showinfo_0 @ 0x7fa20b006ac0] color_range:tv color_space:bt470bg color_primaries:unknown color_trc:unknown',
 'frame= 1057 fps=273 q=-0.0 Lsize=N/A time=**00:00:35.23** bitrate=N/A dup=10 drop=7 speed=9.11x    ',
 'video:326457kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown',
 '']

# Metadata output
          time            type  clientX  clientY    vidtime
293  34952.530       mousemove   1398.0    639.0  34488.755
294  35009.930       mousemove   1397.0    640.0  34546.155
295  35130.655      mouseclick   1397.0    640.0  34666.880
296  35736.285       mousemove   1398.0    640.0  35272.510
297  35801.995       mousemove   1399.0    640.0  35338.220
298  35870.690       mousemove   1400.0    639.0  35406.915
299  35952.370       mousemove   1401.0    638.0  35488.595
300  36136.255      mouseclick   1401.0    638.0  35672.480
301  36136.685  recording stop      NaN      NaN  **35672.910**
302  36137.615             NaN      NaN      NaN  35673.840
jamestompkin commented 4 years ago

Hi there Darragh,

I looked at this a few years ago, and at the time our dataset needed a roughly per-video offset to align the interaction events with the video. It might be that the synchronization at log time can be improved; I didn't look into this when I wrote webgazerExtractServer.py because the data had already been collected.

From the docs: https://webgazer.cs.brown.edu/data/ "2. Watch a replay. As it processes, the system can show the interaction events against the screen recording. Note that only laptop participants have screen recording synchronization data, and only then are they roughly aligned. Use the text box to 'try some numbers' and find the sync offset. This varies per participant."

In '20180317_participant_characteristics.csv', there are two columns, J and K, which specify this alignment, labeled "Screen Recording Start Time (Unix milliseconds)" and "Screen Recording Start Time (Wall Clock UTC)". When the application is run, the script takes the alignment from there, and the offset is determined from that time. Entering more accurate alignments into the .csv will remove the need for the offset.
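For illustration, a minimal sketch of reading that alignment column with pandas; the CSV path and the final differencing step against the JSON's 'recording start' epoch are assumptions based on the description above:

import pandas as pd

pc = pd.read_csv('participant_characteristics.csv')
# Column label as quoted above (column J); blank for participants
# without screen recording synchronization data
start_ms = pc['Screen Recording Start Time (Unix milliseconds)'].dropna()
# The per-participant offset would then be the difference between this
# wall-clock start time and the 'recording start' epoch in the JSON log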

Actually finding those offsets is probably manual work : / Obviously, we'd be happy to add them to the dataset if you went down that path!

Best wishes, James

darraghdog commented 4 years ago

Thank you James, this is very helpful. I had not used the participant_characteristics.csv file so far. Note that the version I have from a recent download does not have a datestamp in its filename, but I guess it is the same file. In my version, columns J & K are only populated for 25 of the participants; they are blank for the others. Nevertheless, I will redo my analysis and try to enter a start epoch for every participant (even if manual work is needed here). I will let you know how it works out... Darragh.

jamestompkin commented 4 years ago

participant_characteristics.csv: Yes, it is the same file - sorry about that confusion. The 25 participants should be the laptop participants; we were missing even coarse alignment data for the others.

Good luck! Happy to hear how you get on : )

darraghdog commented 4 years ago

Hi James,

I have worked out a rough way to get the alignment of the videos. I am working with another gaze prediction solution (MPIIGaze). I extracted the MPIIGaze face-gaze yaw prediction for all the frames, and then looked at the correlation between that value and the Tobii x-direction gaze prediction for each frame (rolling mean, averaged to 16 FPS).

So I mapped the recording start epoch of the video (from the .json files) to the epoch in the Tobii tracker.

In this link there is a file of offsets (some of them below), where frameshift is the offset time, i.e. the gap between the .webm file start and the JSON recording start time. corrOptimal is the correlation, after applying this offset, between the MPIIGaze yaw values at 16 FPS and the Tobii tracker x-direction prediction at 16 FPS. corrOrig is the correlation we got before applying the offset (simply matching recording start in the JSON to the .webm file start). I excluded videos where the time shift was >= 1 second, or where the correlation achieved was under 0.8; these seem to be a minority. Also, I did not include instruction videos; I plan to rework this and can include them if needed.
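In case it helps anyone reproducing this, here is a minimal sketch of the offset search under stated assumptions: mpii_yaw and tobii_x are hypothetical pandas Series already resampled to 16 FPS on a common index, the scan step of 1/16 s matches the granularity of the frameshift values below (multiples of 0.0625 s), and the correlation type (Pearson, pandas' default) is a guess, since it is asked about later in the thread:

import numpy as np
import pandas as pd

def best_offset(mpii_yaw, tobii_x, max_shift_frames=16):
    # Scan integer shifts at 16 FPS and keep the one giving the highest
    # correlation between the two gaze signals
    best_k, best_corr = 0, -np.inf
    for k in range(-max_shift_frames, max_shift_frames + 1):
        corr = mpii_yaw.corr(tobii_x.shift(k))  # Pearson by default
        if corr > best_corr:
            best_k, best_corr = k, corr
    return best_k / 16.0, best_corr  # offset in seconds, and corrOptimal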

If you have questions, let me know - maybe it helps. Feel free to close the issue, and thanks for all your work.

Best, Darragh.

(base) dhanley@Darraghs-MacBook-Pro WebGazer % head webGazerAlignment.csv 
video,frameshift,corrOptimal,corrOrig
P_01/1491423217564_10_-study-benefits_of_running.webm,-0.125,0.9490191845081442,0.37318958902769983
P_01/1491423217564_17_-study-benefits_of_running.webm,-0.375,0.8521885643856908,-0.06181019400814246
P_01/1491423217564_18_-study-benefits_of_running_writing.webm,-0.3125,0.9524443413560993,0.7596014370434377
P_01/1491423217564_24_-study-educational_advantages_of_social_networking_sites_writing.webm,-0.3125,0.9503417061171937,0.8165690282856981
P_01/1491423217564_26_-study-where_to_find_morel_mushrooms.webm,-0.125,0.921692339391273,0.2709434776451592
P_01/1491423217564_28_-study-where_to_find_morel_mushrooms.webm,-0.125,0.8522271771762648,0.040775224395260955
P_01/1491423217564_29_-study-where_to_find_morel_mushrooms_writing.webm,-0.3125,0.8765626475316824,0.03909061950662973
P_01/1491423217564_31_-study-tooth_abscess.webm,-0.375,0.987180075796582,0.6494914114143876
P_01/1491423217564_33_-study-tooth_abscess_writing.webm,-0.3125,0.7448085560814153,0.530272906943214
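As a usage sketch (not part of the original scripts): the offsets file could be joined back onto frame times like this, where frame_time_s is a hypothetical per-frame time since the .webm file start, and the sign convention for frameshift is exactly what the follow-up question below asks about:

import pandas as pd

# Assumes webGazerAlignment.csv has the header shown above
align = pd.read_csv('webGazerAlignment.csv').set_index('video')
shift = align.loc['P_01/1491423217564_31_-study-tooth_abscess.webm', 'frameshift']
frame_time_s = 35.16  # hypothetical frame time, e.g. a pts_time from showinfo
# Whether to add or subtract the shift is clarified in the comments below
annotated_time_s = frame_time_s + shift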
xanderkoo commented 3 years ago

@darraghdog Hi Darragh, thanks for doing this. I'm currently trying to add your alignment offsets to the dataset. How did you end up doing so on your end? Did you modify the scripts or the .json files?

xanderkoo commented 3 years ago

@darraghdog Hi Darragh, I ended up changing the scripts so they factor in the offsets when matching the frame timestamp to the Tobii prediction timestamp, i.e. while ... frameTimeEpoch + frameTimeEpochCorrection - p.tobiiList[p.tobiiListPos].timestamp > 0.
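For context, here is a self-contained sketch of the loop that condition might sit in; the identifiers frameTimeEpoch, frameTimeEpochCorrection, p.tobiiList, and p.tobiiListPos come from the quoted condition, while the setup values and the loop body are assumptions rather than the actual webgazerExtractServer.py code:

from types import SimpleNamespace

# Hypothetical stand-ins for the script's state (timestamps in ms)
p = SimpleNamespace(
    tobiiList=[SimpleNamespace(timestamp=t) for t in (1000.0, 1033.0, 1066.0)],
    tobiiListPos=0,
)
frameTimeEpoch = 1040.0            # hypothetical frame timestamp
frameTimeEpochCorrection = -125.0  # e.g. a frameshift of -0.125 s, in ms

# Advance through the Tobii predictions until reaching the first one at or
# after the corrected frame timestamp
while (p.tobiiListPos < len(p.tobiiList) - 1 and
       frameTimeEpoch + frameTimeEpochCorrection
           - p.tobiiList[p.tobiiListPos].timestamp > 0):
    p.tobiiListPos += 1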

To clarify, would frameshift=-0.125 mean that we add -125ms to the epoch at recording start to get the corresponding Tobii timestamp, or would we subtract -125ms? Also, what type of correlation did you calculate between the MPIIGaze and Tobii predictions? Thanks, and happy new year!