CSDL-UMD / Rockwell

Rockwell uses the Twitter authentication workflow to render a Twitter-like feed in order to collect information about users' interactions with their feeds. It also has an attention-check feature to ensure that users are paying attention to their feeds rather than simply scrolling through to finish quickly.

Some dwell time timestamps are missing or out of sequence #206

Open glciampaglia opened 1 year ago

glciampaglia commented 1 year ago

From the first pilot with the Rockwell endless feed (which we ran in August 2023), we noticed that one participant had a single missing timestamp. For now, we think this is due to data being dropped by the participant's browser (for example, if the browser has too many tabs open), so we will not do anything about it. If this issue appears again, we might discuss it.

saumyabhadani95 commented 1 year ago

Actually, around 6-7 participants have missing timestamps (based on manual analysis). The missing timestamps might also be causing the timestamps to not increase monotonically: the timestamp of the tweet with rank r is sometimes greater than the timestamp of the tweet with rank r + 1 (more description in Issue 2 in this link).

The most probable reason is that the rank of the tweet is not being calculated correctly on the frontend. For example, it's possible that when the tweet with rank r is in view, the frontend incorrectly calculates its rank as an off-by-one value (r+1 or r-1).

To solve this, we can reduce the computation done on the frontend and collect more raw data. The rank of the "tweet in view" is calculated from several heights, such as the total height of the feed and the heights of the individual tweet cards. So one way to get the most accurate data would be to record these heights on every scroll event.
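To make the idea concrete, here is a minimal sketch of how a rank could be recomputed on the server from raw heights logged on each scroll. It is not Rockwell's actual code: the `ScrollSample` schema, the function names, and the rule "the tweet in view is the card containing the vertical midpoint of the viewport" are all hypothetical assumptions for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass
from itertools import accumulate
from typing import List, Tuple


@dataclass
class ScrollSample:
    """Raw measurements captured on one scroll event (hypothetical schema)."""
    timestamp_ms: int
    scroll_top: float          # pixels scrolled from the top of the feed
    viewport_height: float     # visible height of the feed container, in pixels
    card_heights: List[float]  # rendered height of each tweet card, in rank order


def rank_in_view(card_heights: List[float], scroll_top: float, viewport_height: float) -> int:
    """0-based rank of the card containing the viewport's vertical midpoint (assumed rule)."""
    # bottoms[i] is the y offset (from the top of the feed) where card i ends.
    bottoms = list(accumulate(card_heights))
    probe = scroll_top + viewport_height / 2
    # bisect_right counts the cards that end at or before the probe point,
    # which equals the index of the card the probe falls into.
    rank = bisect_right(bottoms, probe)
    # Clamp so a probe past the last card still maps to the last card.
    return min(rank, len(card_heights) - 1)


def recompute_ranks(samples: List[ScrollSample]) -> List[Tuple[int, int]]:
    """Server-side pass: turn raw scroll samples into (timestamp_ms, rank) pairs."""
    return [
        (s.timestamp_ms, rank_in_view(s.card_heights, s.scroll_top, s.viewport_height))
        for s in samples
    ]
```

With three 100 px cards and a 100 px viewport, `rank_in_view([100, 100, 100], 60, 100)` returns 1 (the probe is at 110 px). If the first card's height is instead measured as 120 px, the same scroll position yields rank 0, which is exactly the kind of off-by-one error described above. Logging the raw heights lets the rank be recomputed and audited later instead of relying on a single value computed in the browser.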

glciampaglia commented 1 year ago

We discussed this issue internally within the UMD team and decided to report this finding to the rest of the team and discuss with them what to do. If the analysis we want to use these data for can tolerate some missing data, then we should be OK with having some missing or inaccurate timestamps. If instead we need all timestamps to be exactly correct, then we could discuss ways to reduce the chances of either problem, for example recording the raw position data used to determine the index of the tweet in view and/or moving some computation from the participants' browsers to the server.

glciampaglia commented 1 year ago

We discussed the issue again, this time with the full team. The consensus was that it is OK to have some missing or inaccurate data, as long as we are aware that when we analyze them, we need to "repair" the data. This can be done in two ways:

  1. For tweet ranks that have no timestamp, we simply take the timestamps of the two closest tweets and set the missing timestamp to the average of the two.
  2. For tweet ranks whose timestamp is out of sequence, we apply the same idea, but now the average of the neighboring timestamps is used in place of the out-of-sequence timestamp.
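The two repair rules above can be sketched as follows. This is a minimal illustration, not the team's analysis code: it assumes timestamps are stored in a list indexed by tweet rank, with `None` for a missing timestamp, and the function names are hypothetical. The rule for deciding which timestamp is "out of sequence" is also an assumption: an entry is flagged when it breaks the ordering with one of its nearest neighbors while those neighbors are themselves in order, so that only the offending entry is replaced. Entries at either end of the feed have only one neighbor and are left as-is.

```python
from typing import List, Optional, Set

Timestamps = List[Optional[float]]


def _nearest(ts: Timestamps, i: int, step: int) -> Optional[int]:
    """Index of the nearest non-missing entry from i in direction step (+1 or -1), or None."""
    j = i + step
    while 0 <= j < len(ts):
        if ts[j] is not None:
            return j
        j += step
    return None


def flag_out_of_sequence(ts: Timestamps) -> Set[int]:
    """Ranks whose timestamp breaks monotonic order (assumed detection rule)."""
    flagged = set()
    for i, t in enumerate(ts):
        if t is None:
            continue
        p, n = _nearest(ts, i, -1), _nearest(ts, i, +1)
        if p is None or n is None:
            continue  # boundary entry: only one neighbor, so we cannot judge it
        breaks_order = ts[p] > t or t > ts[n]
        neighbors_in_order = ts[p] <= ts[n]
        if breaks_order and neighbors_in_order:
            flagged.add(i)
    return flagged


def repair_timestamps(ts: Timestamps) -> Timestamps:
    """Replace missing and out-of-sequence timestamps with the average of the closest valid ones."""
    # Rule 2: treat out-of-sequence entries as missing.
    cleaned = list(ts)
    for i in flag_out_of_sequence(ts):
        cleaned[i] = None
    # Rule 1: fill each gap with the average of the nearest valid neighbors.
    repaired = list(cleaned)
    for i, t in enumerate(cleaned):
        if t is None:
            p, n = _nearest(cleaned, i, -1), _nearest(cleaned, i, +1)
            if p is not None and n is not None:
                repaired[i] = (cleaned[p] + cleaned[n]) / 2
    return repaired
```

For example, `repair_timestamps([1, 2, 5, 3, 4])` returns `[1, 2, 2.5, 3, 4]`: only the 5 is flagged, because removing it restores the order. Note that consecutive missing ranks all receive the same average, which follows rule 1 literally; linear interpolation would be a possible alternative if the analysis needs distinct values.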

Finally, we decided not to make any further modifications to Robert's code.