Open glciampaglia opened 1 year ago
Actually, around 6-7 participants have missing timestamps (from manual analysis). The missing timestamps might also be causing the timestamps to not increase monotonically: the timestamp of the tweet with rank r is sometimes greater than the timestamp of the tweet with rank r + 1 (more description in Issue 2 in this link).
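To make the two problems concrete, here is a minimal sketch of how they could be flagged for one participant. It assumes the timestamps are available as a list indexed by rank, with `None` for a missing value; the function name `find_timestamp_issues` and this data layout are hypothetical, not taken from the actual pipeline.

```python
from typing import List, Optional, Sequence, Tuple


def find_timestamp_issues(
    timestamps: Sequence[Optional[float]],
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Flag missing and out-of-order timestamps for one participant.

    timestamps[r] is the recorded time for the tweet with rank r,
    or None if no timestamp was recorded (assumed layout).
    """
    # Ranks for which no timestamp was recorded.
    missing = [r for r, t in enumerate(timestamps) if t is None]

    # Pairs (earlier_rank, later_rank) where the later rank has a
    # smaller timestamp than the closest earlier rank that has one.
    out_of_order = []
    prev_rank, prev_time = None, None
    for r, t in enumerate(timestamps):
        if t is None:
            continue
        if prev_time is not None and t < prev_time:
            out_of_order.append((prev_rank, r))
        prev_rank, prev_time = r, t

    return missing, out_of_order


if __name__ == "__main__":
    missing, out_of_order = find_timestamp_issues([1.0, 2.0, None, 1.5, 3.0])
    print("missing ranks:", missing)
    print("non-monotonic pairs:", out_of_order)
```

Missing values are skipped when checking order, so a gap does not by itself count as a monotonicity violation.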
The most probable reason for this would be that the rank of the tweet is not being calculated correctly on the frontend. For example, it's possible that when the tweet with rank r is in view, the frontend incorrectly calculates the rank of the tweet as r+1 or r-1.
To solve this, we can reduce the amount of calculation done on the frontend and collect more raw data instead. The rank of the "tweet in view" is calculated from several heights, such as the total height of the feed and the heights of the individual tweet cards. So one possible way to get the most accurate data would be to record these heights on every scroll.
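If the raw heights were recorded, the rank could be recomputed offline. The following is only a sketch under assumptions: it treats the tweet "in view" as the card containing the top of the viewport, and the names `rank_in_view`, `scroll_top` and `card_heights` are hypothetical. The actual frontend may define "in view" differently.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Optional, Sequence


def rank_in_view(scroll_top: float, card_heights: Sequence[float]) -> Optional[int]:
    """Return the 0-based rank of the card containing scroll_top.

    scroll_top is the vertical scroll offset of the feed in pixels and
    card_heights[r] is the rendered height of the card with rank r
    (both assumed to be logged on each scroll event).
    Returns None if the offset falls outside the feed.
    """
    if not card_heights or scroll_top < 0:
        return None

    # bottoms[r] is the pixel offset where card r ends.
    bottoms = list(accumulate(card_heights))
    if scroll_top >= bottoms[-1]:
        return None

    # Card r spans [bottoms[r-1], bottoms[r]); find the first card
    # whose bottom edge lies strictly below scroll_top.
    return bisect_right(bottoms, scroll_top)


if __name__ == "__main__":
    heights = [100, 200, 150]
    for offset in (0, 100, 299, 300):
        print(offset, rank_in_view(offset, heights))
```

Keeping the raw offsets and heights means the rank rule can be changed or corrected later without collecting new data.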
We discussed this issue internally within the UMD team, and we decided that we will report this finding back to the rest of the team and discuss with them what to do. If the analysis we want to use these data for can be performed with some missing data, then we should be OK with having some missing or inaccurate timestamps. If instead we need all timestamps to be exactly correct, then we could discuss ways to reduce the chances of either problem, for example recording the raw position data used to determine the index of the tweet in view and/or moving some computation from the participants' browsers to the server.
We discussed the issue again, this time with the full team. The consensus was that it is OK to have some missing or inaccurate data, as long as we are aware that when we analyze them, we need to "repair" the data. This can be done in two ways:
Finally, we decided that we will not make any further modification to Robert's code.
From the first pilot with the Rockwell endless feed (which we ran in August 2023), we noticed that one participant had a single missing timestamp. For now, we think this is due to data being dropped by the participant's browser (for example, if the browser has too many tabs open), so we will not do anything about it. If this issue appears again, we might discuss it.