citp / news-disinformation-study

A research project on how web users consume, are exposed to, and share news online.

Account for self-links and internal links when post-processing the LinkExposure data #35

Closed PranayAnchuri closed 4 years ago

PranayAnchuri commented 4 years ago

In the LinkExposure module, site navigation links show up in the data. These links are both visible and seen for the first time when the page loads. We may need to filter out these types of links, as they aren't very useful for the downstream studies.

[{"href":"https://www.nytimes.com/#site-content","size":{"width":25,"height":20}},
{"href":"https://www.nytimes.com/#site-index","size":{"width":25,"height":20}},
{"href":"https://www.nytimes.com/","size":{"width":60.383331298828125,"height":23}},
{"href":"https://www.nytimes.com/es/","size":{"width":62.383331298828125,"height":23}},
{"href":"https://cn.nytimes.com/","size":{"width":36.9666748046875,"height":23}}]

jonathanmayer commented 4 years ago

I think we should handle this in the post-processing for generating reports, rather than in the LinkExposure module. These are accurately recorded link exposures... just not exposures we happen to care about for the particular study we're currently conducting. In addition to self-links, we'll have to be careful about the broader class of internal links.
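
To make the post-processing idea concrete, here is a rough sketch of how a report script might separate self-links and internal links from the exposures we care about. It is only an illustration, not the project's actual pipeline: the function names (`naive_site`, `classify_link`, `external_exposures`) are hypothetical, and treating the last two host labels as the "site" is an assumption that breaks for suffixes like `co.uk` (a real implementation would likely use the Public Suffix List).

```python
from urllib.parse import urldefrag, urlsplit


def naive_site(hostname):
    """Approximate the registrable domain as the last two host labels.

    Assumption: this is wrong for suffixes such as "co.uk"; a real
    pipeline would consult the Public Suffix List instead.
    """
    return ".".join(hostname.lower().split(".")[-2:])


def classify_link(page_url, href):
    """Return "self", "internal", or "external" for one exposed link."""
    # A self-link points back at the page itself, ignoring any #fragment.
    if urldefrag(page_url).url == urldefrag(href).url:
        return "self"
    page_host = urlsplit(page_url).hostname or ""
    href_host = urlsplit(href).hostname or ""
    # An internal link stays on the same (approximate) site as the page.
    if naive_site(page_host) == naive_site(href_host):
        return "internal"
    return "external"


def external_exposures(page_url, exposures):
    """Keep only the exposure records whose href leaves the current site."""
    return [
        e for e in exposures
        if classify_link(page_url, e["href"]) == "external"
    ]
```

Run against the sample above with `https://www.nytimes.com/` as the page URL, the `#site-content` and `#site-index` entries would be classified as self-links and the `/es/` and `cn.nytimes.com` entries as internal, so none of them would reach the report. The raw LinkExposure records stay untouched, which keeps them available for studies that do care about internal navigation.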

PranayAnchuri commented 4 years ago

LinkExposure accounts for internal links and self-links.