documentcloud / documentcloud

The DocumentCloud platform
https://www.documentcloud.org
MIT License

Note/Page hits should not count towards a document's `detected_remote_url` #343

Open knowtheory opened 8 years ago

knowtheory commented 8 years ago

DocumentCloud infers the canonical location of a document on the basis of where views take place (so that traffic can be redirected to those locations).

Currently the detected_remote_url is calculated based on total detected hits, rather than just those from a document viewer.

knowtheory commented 8 years ago

For additional context and background...

DocumentCloud provides a set of embeddable components that journalists/reporters/producers can place on their pages. When a component loads, we drop a transparent 1x1 tracking pixel in order to tally the view.

We use the tallied view/page load data to infer a document's canonical embedded location (detected_remote_url). Then, DocumentCloud does its best to link users to that canonical embedded location. For example, if a journalist has embedded a note (for emphasis or to refer to a specific passage) on one page and has also embedded the full document on a separate page, the user should be sent to the page where the document is embedded when they click through on the note.

Unfortunately, we're not quite tallying this correctly: views from every embed type (notes and pages as well as document viewers) are currently being used to calculate the detected_remote_url for a document.

So, in the event that a journalist only embeds notes from a document, the platform tallies those note views, and incorrectly infers that the document is embedded on the page hosting the notes. Consequently, clicking through on the notes redirects the user back to the same page they are on.
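To make the difference concrete, here is a minimal sketch in Python rather than the platform's own code. The `Hit` record, the `embed_type` values, the `detect_remote_url` helper, and the example URLs are all hypothetical names chosen for illustration; the real hit-tracking schema may look quite different.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class Hit(NamedTuple):
    """One tracking-pixel hit (hypothetical record shape, not the real schema)."""
    document_id: int
    embed_type: str  # assumed values: "document", "page", "note"
    url: str         # page on which the embed was loaded


def detect_remote_url(
    hits: Iterable[Hit],
    document_id: int,
    counted_types: frozenset = frozenset({"document"}),
) -> Optional[str]:
    """Return the most-viewed URL among hits of the counted embed types."""
    tally = Counter(
        hit.url
        for hit in hits
        if hit.document_id == document_id and hit.embed_type in counted_types
    )
    if not tally:
        # No qualifying hits: nothing to infer, so no redirect target.
        return None
    return tally.most_common(1)[0][0]


hits = [
    Hit(1, "note", "https://news.example/story"),
    Hit(1, "note", "https://news.example/story"),
    Hit(1, "note", "https://news.example/story"),
    Hit(1, "document", "https://news.example/full-document"),
]

# Behavior described in this issue: every embed type counts, so the
# note-heavy page wins.
all_types = frozenset({"document", "page", "note"})
current = detect_remote_url(hits, 1, counted_types=all_types)

# Proposed behavior: only document viewer hits count.
proposed = detect_remote_url(hits, 1)

# Notes-only case: no document viewer hits, so no URL is inferred.
notes_only = detect_remote_url(hits[:3], 1)
```

In the notes-only case the sketch returns `None` instead of the note's own page, which would avoid sending a click-through back to the page the user is already on. What the platform should link to in that case is a separate decision that this sketch does not settle.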