Feature: Direct Linking / Link to quotes

blackforestboi commented 6 years ago

As discussed with @blahah and Charlie, we will pull forward the direct linking feature. It will enable users to link to specific quotes in an article, instead of just the URL. It will also provide us with an ideal viral loop to acquire users.

From my understanding today, there are 3 parts to this:

The selection and anchoring of the highlights in an article
The proxy server routing the traffic and injecting the code to highlight the specific text
The interruption for new users to expose them to Memex and its features
[Optional] the ability to search for those highlights > rank the words of the quote higher in search.

1. Selection and anchoring

Similar to how hypothes.is does it, we would have a modal pop up whenever a user selects a piece of text. It would show two options: "highlight" & "link to quote".

Also we could put these options in the context menu:

Technical details

@tilgovi, @bigbluehat & @Treora rebuilt the annotator library that powers the Hypothes.is annotation software. The code will enable us to create an anchor to a piece of text. It has yet to be published as an npm package, but AFAIK it is functional. @Treora also built this nifty demo to showcase its capabilities.
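
To make "an anchor to a piece of text" concrete, here is a minimal sketch of a text quote selector over a plain string. It is not the annotator library's API: `describeQuote`, `anchorQuote` and the 32-character context length are assumptions made up for illustration, and a real implementation would work on DOM ranges rather than a single string.

```typescript
// Hypothetical sketch, not the annotator library's API.
interface TextQuoteSelector {
  exact: string;  // the selected text itself
  prefix: string; // a few characters before the selection
  suffix: string; // a few characters after the selection
}

const CONTEXT_LENGTH = 32; // assumed amount of surrounding context to store

// Build a selector for the range [start, end) of `text`.
function describeQuote(text: string, start: number, end: number): TextQuoteSelector {
  return {
    exact: text.slice(start, end),
    prefix: text.slice(Math.max(0, start - CONTEXT_LENGTH), start),
    suffix: text.slice(end, end + CONTEXT_LENGTH),
  };
}

// Find the range the selector points at, or null if the quote is gone (orphaned).
function anchorQuote(
  text: string,
  selector: TextQuoteSelector,
): { start: number; end: number } | null {
  if (selector.exact.length === 0) return null;
  let from = 0;
  let fallback: { start: number; end: number } | null = null;
  while (true) {
    const start = text.indexOf(selector.exact, from);
    if (start === -1) return fallback;
    const end = start + selector.exact.length;
    const prefixOk = text.slice(0, start).endsWith(selector.prefix);
    const suffixOk = text.slice(end).startsWith(selector.suffix);
    if (prefixOk && suffixOk) return { start, end };
    // Remember the first textual match in case the surrounding context changed.
    if (fallback === null) fallback = { start, end };
    from = start + 1;
  }
}
```

The prefix and suffix are what disambiguate a quote that occurs more than once on the page; without them the first occurrence would always win.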

2. Proxy server

The way I understand it, the proxy server's role is to inject the JS code into the website a user is about to visit, so that the highlights can be displayed. Our proxy would host a DB with all the quotes that have been made, possibly already as a dat node that we can replicate among several instances.

Important here is that the link created shows the social preview data of the link destination, not the proxy. To get what I mean with that, try creating a post on social media with this link: https://hyp.is/d6o9PM67EeeGWFs6SrbDHg/www.nytimes.com/2011/10/19/nyregion/new-york-planting-a-million-treestoo-many-some-say.html

The post will show preview images of hypothes.is and the resulting quote, but nothing about the nytimes article.

Earlier in the product life cycle of hypothes.is this routing to the information about the original article was still in place. Now it shows the red preview of hypothes.is. I find this undesirable, as the preview on social media gives important information about the article. We already will inject some "ad like" thing (see point 3) with the proxy. No need to annoy users even more or hinder their UX.
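
A minimal sketch of how the link page could expose the destination's preview data instead of our own, assuming we already have the article's title, description and image. The `PageMetadata` shape and function names are hypothetical; the `og:*` property names are the standard Open Graph ones that social networks read.

```typescript
// Hypothetical sketch: render Open Graph tags describing the linked article, not the proxy.
interface PageMetadata {
  title: string;
  description: string;
  imageUrl: string;
  url: string; // URL of the original article
}

// Escape characters that would break out of a double-quoted HTML attribute.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function renderSocialPreviewTags(meta: PageMetadata): string {
  const tags: Array<[string, string]> = [
    ["og:title", meta.title],
    ["og:description", meta.description],
    ["og:image", meta.imageUrl],
    ["og:url", meta.url],
  ];
  return tags
    .map(([property, content]) => `<meta property="${property}" content="${escapeAttribute(content)}">`)
    .join("\n");
}
```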

Technical details:

This is the library built at Hypothes.is that facilitated the proxy: https://github.com/hypothesis/via Might offer some inspiration.

3. Interruption for new users to expose them to direct link features

Whenever a user clicks a direct link, there is a loading screen (hypothes.is has one too). It is an ideal place to expose users to the features they could have by installing Memex. We can use it to notify non-Memex users that they just received a direct link, and that they can do this as well, and get other features, by downloading the Memex extension. Existing Memex users should not see this in-between screen anymore and should instead directly see the highlight.

Design of this screen is still TBD, most importantly, we need the functionality to work.

4. Optional for later: Search highlights, show them in results list. @poltak

It would be useful to be able to search for the words of the highlights a person makes and rank those words higher. If we don't get this into the first iteration, we should definitely store the raw quotes somewhere, so we can re-index them later and also display them alongside the pages in the results overview.
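
As a rough illustration of "rank the words higher", here is a sketch of a scoring function that counts a search term in the page text and gives extra weight to occurrences inside stored highlights. This is not how the Memex search index works; the boost value and all names are assumptions.

```typescript
// Hypothetical sketch: boost pages whose highlights contain the search term.
const HIGHLIGHT_BOOST = 3; // assumed weight for a hit inside a highlight

// Count whole-word, case-insensitive occurrences of `term` in `text`.
function countTerm(text: string, term: string): number {
  const wanted = term.toLowerCase();
  return text
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word === wanted).length;
}

function scorePage(term: string, pageText: string, highlights: string[]): number {
  const inPage = countTerm(pageText, term);
  const inHighlights = highlights.reduce((sum, quote) => sum + countTerm(quote, term), 0);
  return inPage + HIGHLIGHT_BOOST * inHighlights;
}
```

Storing the raw quotes, as suggested above, is what makes it possible to compute `inHighlights` later, even if the first iteration ships without any ranking.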

Known challenges:

Hooking the highlights to the right URL, even if it changes. For example, the URL could contain a query string that changed, so the URL is no longer the same even though the article is.
We need to be persistent somehow. At the very early stage this means we need multiple servers that make sure the direct links are always reachable. Long-term, we need to think about a way for those links to work independently of WorldBrain providing the proxy, so that they can be resolved in any case. Maybe dat offers some things there as well, @blahah?
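
For the first challenge, one possible approach is to key highlights on a normalized URL. The sketch below is only an assumption about what "the same article" could mean: it drops the protocol, a leading `www.`, the fragment and a hand-picked list of tracking parameters, and sorts what remains. Some sites identify the article itself through a query parameter, so dropping the whole query string would not be safe.

```typescript
// Hypothetical sketch: derive a stable key for a page URL.
// The list of ignored parameters is an assumption, not an agreed-upon rule.
const IGNORED_PARAMS = [/^utm_/, /^fbclid$/, /^gclid$/, /^ref$/];

function normalizeUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  const kept: string[][] = [];
  url.searchParams.forEach((value, key) => {
    if (!IGNORED_PARAMS.some((pattern) => pattern.test(key))) kept.push([key, value]);
  });
  // Parameter order should not change the key.
  kept.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  const query = new URLSearchParams(kept).toString();
  const host = url.hostname.replace(/^www\./, "");
  const path = url.pathname.replace(/\/+$/, "") || "/";
  // Protocol and fragment are deliberately left out of the key.
  return host + path + (query ? "?" + query : "");
}
```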

ShishKabab commented 6 years ago

I've been doing some research on the Direct Linking feature and the code that already exists. So far we have the server created by @blahah, found at https://github.com/WorldBrain/direct-linking-server, and a direct linking client cloned and modified from the Apache incubator-annotator repo.

The current direct linking server consists of a Node.js/Express server that seems to be meant to be directly exposed to the internet. It stores and retrieves annotations from a LevelDB database, without validation, as a PoC. Upon access of a direct link, it's supposed to fetch the original page, inject some JS code to highlight the necessary text (the package for injection is present, but not implemented yet), and return this to the user.

This approach seems to be the most practical for now, since the alternatives I can think of have issues: 1) Embed external content in iFrame: not possible, since security policies would prevent us from modifying the target's HTML 2) Retrieve page client-side via XHR: would run into trouble with CORS policies.

However, this approach still has some issues to be resolved: 1) The content is served from our domain, meaning that linked resources like images, JS and CSS need to be fixed 2) Scripts may want to communicate with their server, which might mean we need to replace URLs in scripts 3) Back-ends of pages would have to be configured to allow requests from other sites, which will break pages that are 'too dynamic'
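
To illustrate issue 1, here is a rough sketch of rewriting relative `src` and `href` attributes to absolute URLs before serving a fetched page from another domain. It is regex-based and therefore not a robust HTML rewrite: it ignores `srcset`, CSS `url(...)` references and any URL assembled inside scripts (issue 2), and it also turns in-page `#fragment` links into absolute ones. The function name is hypothetical.

```typescript
// Hypothetical sketch: make relative src/href attributes absolute against the original page URL.
function absolutizeResourceUrls(html: string, pageUrl: string): string {
  return html.replace(
    /\b(src|href)=(["'])(.*?)\2/gi,
    (match: string, attr: string, quote: string, value: string) => {
      try {
        return `${attr}=${quote}${new URL(value, pageUrl).href}${quote}`;
      } catch {
        return match; // leave values the URL parser rejects untouched
      }
    },
  );
}
```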

One solution would be to serve a stripped, read-friendly version of the page, linking to the original website. However, this could introduce other problems: 1) The content owners would probably not be happy with us storing their content on our servers (see Google News problems in Spain.) 2) If content goes viral, content owners want ad revenue which requires users to directly land on their pages

The current direct linking server could relatively easily be adapted to run on AWS EB + S3 + Lambda + API Gateway, so that it can be easily deployed and dynamically scaled to handle peak traffic. Current things to do there would be: 1) Write script to be injected 1a) Write highlighting code to be shared between front- and back-end (#301) 1b) UI to encourage user towards Memex 2) Refactor to use pluggable back-end 3) Implement S3 backend for retrieving annotations 4) Write Lambda function to validate and upload annotation 5) Write deployment procedures

Before going further into the technical details however, let's discuss here how to solve these bigger design challenges :)

ShishKabab commented 6 years ago

After a number of discussions, we've come to the conclusion that building on the design of hypothes.is for direct linking is not feasible / wise at the time being. Reasons for this are:

Due to content security policies in browsers, we'd only be able to provide a limited UX that'll be potentially confusing for the user, and potentially not beneficial to content authors (it'd break many interactive parts of websites, which are the CTAs content authors rely on to build engagement and generate conversion, and potentially also the ad click registrations.)
Even when implementing this limited UX, we'd be faced with much higher operating costs than alternatives
We'd have to design, implement and maintain infrastructure to prevent our solution from being used to access illegal content

For these reasons, we've chosen the following design:

In the initial version, the user can share direct links to only ONE highlight in the page
When viewing a direct link, the user sees 1) only the highlighted text, 2) some metadata about the page like title and logo, 3) a CTA to install Memex and have the fully integrated UX, 4) a link to the original article.
Once Memex is installed, the user will see the text directly highlighted in the target page

In future versions, we'd like to:

Show the context of the annotation (paragraph or table highlight is located in) slowly fading to the background at the edges to indicate this is a preview of the original page
Show the intro of the article

On a technical level, this means:

When a direct link is created, a (composite?) selector of annotation (tech speak for 'the exact location of the annotation in the page') along with the URL is sent to the DL server to generate a link.
The DL server 1) fetches the page, 2) validates whether the annotation is valid / actually present in the page, 3) stores the page, 4) extracts metadata like title, description, logo from the page and stores it
Upon request, a client side app is fetched from S3 that fetches the annotation data from S3 and renders the preview page (allowing us to serve all data directly from S3)
TODO: Technical design of direct link with Memex installed
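
The link-creation steps above can be sketched roughly as follows. Everything here is an assumption for illustration: the quote is a plain-string stand-in for the (composite) selector, a `Map` stands in for S3, the ID scheme is a placeholder, fetching and storing the page itself are left out, and checking the raw HTML with `includes` would miss quotes that span several elements.

```typescript
// Hypothetical sketch of creating a direct link on the DL server.
interface DirectLinkRequest {
  url: string;
  quote: string; // simplified stand-in for the selector
}

interface StoredLink {
  url: string;
  quote: string;
  title: string | null;
  description: string | null;
}

// Extract title and description with simple patterns (no logo handling here).
function extractMetadata(html: string): { title: string | null; description: string | null } {
  const title = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
  const description = /<meta\s+name=["']description["']\s+content=["']([^"']*)["']/i.exec(html);
  return {
    title: title ? title[1].trim() : null,
    description: description ? description[1].trim() : null,
  };
}

function createDirectLink(
  request: DirectLinkRequest,
  pageHtml: string, // the already-fetched page
  store: Map<string, StoredLink>, // stand-in for S3
): string | null {
  // Validation: reject annotations whose quote is not present in the page.
  if (!pageHtml.includes(request.quote)) return null;
  const id = String(store.size + 1); // placeholder ID scheme
  store.set(id, { url: request.url, quote: request.quote, ...extractMetadata(pageHtml) });
  return id;
}
```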

Potential design issues:

The CTAs to install Memex and go to the original website may conflict. A clear visual hierarchy that prioritises Memex install over going to the original website may mitigate this. Or, we can let the user navigate to the original website in an iframe.

Design todo items:

When direct linking, we don't know whether the annotation was orphaned, so maybe we should include a small text telling that when you go to the page, we don't know if that text is still there. (Nope, not pretty, so think of nicer alternative.)
When viewing an annotated page from the extension, we need to tell the user in a nice way the annotation is orphaned when that happens
Since the direct links allow for only one annotation right now, we need a clear workflow to make that direct link
Create UI mockup for direct link page (relatively short-term @oliversauter)

Coordination todo items:

Communicate with the implementor of annotation feature (#301) to include direct linking UI

More detailed breakdown of tasks and technical design issues and choices to follow later.

ShishKabab commented 6 years ago

Dump of tasks:

DL server: POST endpoint
DL server: Retrieval of external page
DL server: Extraction of metadata (title, logo, description)
DL server: Validation of annotation selector
DL server: Storage of annotation and metadata
DL server: Storage of logo (once per website)
DL server: HTML & CSS for Memex Link page
DL server: Set up build procedure of client side app
DL server: Integrate fetching of annotation, metadata and logo in client side app
DL server: Set up redirecting all links to client side app
DL server: Set up deployment procedures
DL client: Detect we're trying to go to a direct link
DL client: Before going to a direct link, remember annotation ID, redirect to target, fetch annotation in background, apply annotation once target page has loaded
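
For the "detect we're trying to go to a direct link" task, a minimal sketch could look like the following. The link format is not defined anywhere in this thread, so the `/<annotation ID>/<target host and path>` layout is purely an assumption modelled on the hyp.is link quoted earlier, and the function name is hypothetical.

```typescript
// Hypothetical sketch: recognise a direct link and split it into annotation ID and target page.
// The host list and the path layout are assumptions, not the actual link format.
const DIRECT_LINK_HOSTS = ["memex.link", "staging.memex.link"];

interface ParsedDirectLink {
  annotationId: string;
  targetUrl: string;
}

function parseDirectLink(rawUrl: string): ParsedDirectLink | null {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return null; // not a URL at all
  }
  if (DIRECT_LINK_HOSTS.indexOf(url.hostname) === -1) return null;
  const match = /^\/([^/]+)\/(.+)$/.exec(url.pathname);
  if (!match) return null;
  return { annotationId: match[1], targetUrl: "https://" + match[2] + url.search };
}
```

The extension would remember `annotationId`, navigate to `targetUrl`, and apply the annotation once that page has loaded.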

blackforestboi commented 6 years ago

The DL server 1) fetches the page, 2) validates whether the annotation is valid / actually present in the page, 3) stores the page, 4) extracts metadata like title, description, logo from the page and stores it

Seems like this is meant to prevent abuse. Do we maybe want to do this as a later improvement rather than in the MVP? I don't assume that we will get people spamming us in the beginning. And if they do spam us, they cost us far more computing power, since we would be fetching the page and checking each time whether the annotation is valid, right?

ShishKabab commented 6 years ago

Yeah, let's keep this for later, good idea :)

blackforestboi commented 6 years ago

Seems to be going well :)

Some notes on things I discovered while testing that need to be done before release:

Demo

[x] use different paragraph that shows scrolling ability > now it is the top paragraph.
[x] Remove the word "left" in the left corner
[x] highlighting a piece of text shows the tooltip on the top of the page, not next to the highlight
[x] creating a second highlight in the demo shows the tooltip with the old link
[x] Highlight the sentences, not the whole block > stop at the last letter.

With iframe

[x] copy to clipboard on "copy quote & go to page" button doesn't work
[x] In full-screen show only "copy quote", not "copy quote & go to page", and don't redirect to the original page
[x] More space to the right of the Memex ad
[x] More space to the right of the quote and between the button
[x] Remove the title and url in fullscreen, only show quote
[x] Mac/win detection does not work > shows me ctrl+F
[x] Removing the whitespace on the bottom

Without Iframe

[x] copy to clipboard on "copy quote & go to page" button doesn't work

ShishKabab commented 6 years ago

The whitespace on the bottom unfortunately is a trade-off. It's either that, or having two scroll bars, which is in my opinion more confusing.

blackforestboi commented 6 years ago

The whitespace on the bottom unfortunately is a trade-off. It's either that, or having two scroll bars, which is in my opinion more confusing.

Ok.

Some other things I noticed:

[x] the lines above and below "or" are not aligned
[x] the link does not scroll to the annotation in the demo
[x] related to the last one, that is why not the first one should be highlighted, but rather one that is somewhere lower, so people see the scrolling ability.
[x] The link in the popup should be copied to clipboard, not a (hidden behind the text) link. > to augment the real workflow.
[x] When trying to triple click to mark the whole paragraph, the popup scoops in between and highlights itself :)

blackforestboi commented 6 years ago

Got a few other feedbacks on the direct linking:

[x] the button "copy quote and go to page" is not clear to people. It should just say "Go to page" and below a text, similar to the CMD+F one, that says "Quote will also be copied to clipboard to search with CMD+F"
[x] when clicking on link in iframe, go to new page if possible > dont show the bar anymore.

ShishKabab commented 6 years ago

Ah, OK. Thought we'd leave the scrolling for later ;)

Right now, I've highlighted the 3rd paragraph. Is that clear enough?

About the link being copied, remember the Slack discussion? Due to the asynchronous nature, it's not possible, or would require considerable effort + inconsistent UX. For now, we show the link (text should be URL), and maybe we can also say 'right-click to copy URL'. The rationale behind this was that in the demo people will want to click the link and see the result anyway, rather than share a useless demo link.

Hmm, the triple click thing is unfortunate. Maybe the popup should have a delay of about 100-200ms?
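
A minimal sketch of that delay, assuming the tooltip is triggered from a selection or mouseup handler: every new event restarts a short countdown, so the tooltip only appears after the last click of a triple click. The 150ms default and all names are assumptions.

```typescript
// Hypothetical sketch: show the tooltip only once the selection has been stable for a short delay.
function createDelayedTooltip(show: () => void, delayMs: number = 150) {
  let pending: ReturnType<typeof setTimeout> | undefined;
  return {
    // Call on every mouseup / selection change; restarts the countdown.
    selectionChanged(): void {
      if (pending !== undefined) clearTimeout(pending);
      pending = setTimeout(() => {
        pending = undefined;
        show();
      }, delayMs);
    },
    // Call when the selection is cleared, so a stale tooltip never appears.
    cancel(): void {
      if (pending !== undefined) clearTimeout(pending);
      pending = undefined;
    },
  };
}
```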

ShishKabab commented 6 years ago

Unfortunately, opening links in the iframe in the parent window doesn't seem possible. Normally you'd need to modify the iframe content, which we don't have access to...

blackforestboi commented 6 years ago

Found this, the second answer. Would that work? https://stackoverflow.com/questions/22808065/how-to-make-all-links-in-an-iframe-open-in-new-tab

ShishKabab commented 6 years ago

Nope, since all the answers including the second one involve modifying the iframe content :(

blackforestboi commented 6 years ago

About the link being copied, remember the Slack discussion? Due to the asynchronous nature, it's not possible, or would require considerable effort + inconsistent UX

Yeah, I meant something different though. I meant that after creating the link, we show the loading bar until the link is confirmed, then the link in the text field (instead of the text "This is your link"). Also, so that we don't have to have 2 different versions for the extension and the demo, I think it makes sense to keep it consistent. However, I agree that for the demo, people really just want to see how it works. In the extension, though, the use case is to have the link in the clipboard as soon as possible, not to visit it. When you click on the link it would then show a confirmation inside the modal: "Copied to clipboard"

blackforestboi commented 6 years ago

Hmm, the triple click thing is unfortunate. Maybe the popup should have a delay of about 100-200ms?

yeah lets try that :) maybe also having the popup appear centered below the cursor with a little margin to the bottom.

blackforestboi commented 6 years ago

Found another issue with the demo page: If you try to create a second direct link after you did one, the page hangs and then crashes.

ShishKabab commented 6 years ago

Found another issue with the demo page: If you try to create a second direct link after you did one, the page hangs and then crashes.

Unfortunately, I can't reproduce this neither in the master branch, nor the current staging version. Could you please describe the exact steps you're taking? After the link is created, do you click to copy the URL? Do you click the popup away?

blackforestboi commented 6 years ago

I tried it on http://staging.memex.link/demo/

Here is a video. Today it actually got worse: it already freezes on the first try to make a direct link. http://recordit.co/4AVdoqzPC1

What I did the last time (if I remember correctly): 1) create a direct link by highlighting and clicking "create link..." 2) highlighting second paragraph/text and click direct link 3) hangs itself

Do you click the popup away?

I would expect the popup to disappear anyway if I click anything outside of it. Also, I'd expect that once I have created a direct link, the piece stays highlighted, and when I click on the highlight, the popup lets me directly copy the link to the clipboard (as it is already created). Much like on hypothesis: if you click on a highlight, it opens the annotation in the sidebar, and you do not need to re-create the same direct link a second time.

blackforestboi commented 6 years ago

@ShishKabab @digi0ps

Here are some updates on the UI/UX of the tooltip:

the grey box with the link should be a regular text field that people could also fully select by dragging dropping, this way we can display the whole link and still keep a reasonable size.
The x should close the popup once
the wrench icon should lead us to the settings panel, where we need to add a new setting to be able to disable direct linking popups permanently
the popup also needs an additional setting to disable/enable memex links. In that context I thought about how we can rebrand "direct links". I find the name not very descriptive. What could we call it? Options:
Deep link (already taken)
precise link (too long, not catchy)
Direct link (also too long, not catchy)
Quote Link (better in description, but excludes stuff that is not a quote)
Hypercitations (Too long, but I like the citation aspect as it encapsulates the act itself > citing > and it puts together hyperlink )
Deep Citations (too long again, but better, and very distinct)
CiteLink
Memex.Link > as it would be some sort of branding

ShishKabab commented 6 years ago

Just saw that a comment I'd written didn't get sent correctly :(

The tooltip looks functional, but something feels off to me. Don't have the in-depth graphic knowledge and language to express this feeling concretely, but there's maybe something about the spacing and differences in line width that makes it feel kind of wrong / cobbled together.

About the name: Memex.Link could be very cool. Want me to do a post on Facebook once the new demo version is working?

blackforestboi commented 6 years ago

Hey @ShishKabab @digi0ps

Awesome work with the direct linking stuff. It's taking form and I can't wait to see it in prod :P I have been testing around on the demo and the direct links, and here is a checklist of things that still need some work:

[ ] Demo page is not responsive on desktop and mobile:
[ ] The first highlight "It is the physicist..." Is this green block that does not look like the other highlights, also the tooltip on the right is not responsive:
[ ] The highlight green is too strong, making it more subtle would be great. so more transparent and with no need to change the font color
[x] when the direct link is created, the highlight disappears. It should stay until the tooltip disappears (thanks @ShishKabab for noting that) ;)
[x] In the top bar when receiving the direct link it should show "Link copied to clipboard" when clicking on the button.
[x] Instead of having a text based "see live demo" rather making it 2 buttons that are on one horizontal line "See Live Demo" and "Download Memex"
[x] The URL in the full-screen interlay is missing:
[ ] the brain icon in the tooltip is not vertically aligned:
[x] Adding more space above the autoscrolling position. Now it is really at the top, maybe having like 3-5cm above:
[x] When clicking on the mobile version of the interlay on "go to page" it loads the same interlay page again. It should direct me to the live version of the page. Also it does not copy the quote to the clipboard. Lastly it is not responsive yet
[x] A zip with all SVG icons for @digi0ps: Archive 2.zip

ShishKabab commented 6 years ago

The first highlight "It is the physicist..." Is this green block that does not look like the other highlights

That was to cover the case where the user selects some text inside that region, remember?

digi0ps commented 6 years ago

Have made the demo page responsive for mobile, forgot to do the "Someone wants to show you" page. Will make everything responsive.
The page looks good above a width of 1050px, and the mobile version kicks in below 500px. At tablet sizes (between 500px and 1000px) it gets congested. Should I keep the mobile design or compress the desktop design?
"when the direct link is created, the highlight disappears. It should stay until the tooltip disappears": I don't think this can be prevented, since copying creates a new selection! Is it possible to keep it highlighted @ShishKabab ?
Thanks for the icons!

blackforestboi commented 6 years ago

That was to cover the case where the user selects some text inside that region, remember?

No, help me out :)

I don't think this can be prevented, since copying creates a new selection! Is it possible to keep it highlighted @ShishKabab ?

Yeah we should have the highlights stick.

Ran into some issues with the anchoring:

When visiting this article I can create a memex.link but it does not reanchor when opening this link. It should highlight this paragraph: Engineers anticipated this convergence. As early as 1967, one of the key architects of the system for exchanging small packets of data that gave birth to the Internet, Paul Baran, predicted the rise of a centralized “computer utility” that would offer computing much the same way that power companies provide electricity. Today, that model is largely embodied by the information empires of Amazon, Google, and other cloud-computing companies. Like Baran anticipated, they offer us convenience at the expense of privacy.
The tooltip is styled differently on different pages: Highscalability.com:
newyorker.com (looks fine all the way):

https://hacks.mozilla.org/

digi0ps commented 6 years ago

@ShishKabab fixed the highlight after copy issue. I will look into why the tooltip is experiencing spacing issues tomorrow. Probably it's some global css override.

blackforestboi commented 6 years ago

Quick info to @ShishKabab:

The scrolling position in Memex should also have the same spacing to the top as the demo.

blackforestboi commented 6 years ago

@ShishKabab Found a bug: When going to this paragraph in the nytimes article, it only shows you half of the paragraph highlighted. Make sure you have a version with the direct linking ui implemented to test. Here a prepacked extension: extension.zip

Now, try to highlight the whole paragraph, create a link and open it. Somehow it doesn't highlight all of the paragraph; it always gets cut off.

EDIT: happens also on this site

ShishKabab commented 6 years ago

Looked more into the bug. Your first bug is reproducible; I didn't try the second case. The selector is generated correctly, with the right quote prefix and suffix, but it doesn't anchor correctly. We need someone to look into the problem using this bug as a reference.

To hack on the problem, one has to check out the feature/direct-linking-ui branch of this GitHub repo, with the main code of interest living in src/direct-linking/content_script/annotations.js. There's this bug to solve, and also the case of quotes being anchored in <script> tags, so if someone could modify the anchoring function to return all locations, which we can then filter ourselves, that'd be awesome :)

blackforestboi commented 6 years ago

@Treora, @tilgovi, @bigbluehat do you have any ideas on how to solve this?

blackforestboi commented 6 years ago

@ShishKabab On another note, I discovered a really annoying workflow bug: http://recordit.co/2ecMnz1ptx

When I select a piece of text, I can no longer use the context menu by right-clicking. It then selects the word I right-clicked on instead of applying the context menu to the whole previous selection. When I remove the tooltip with the small x, I lose my selection. (All of this can be seen in the video.)

EDIT: even more annoyingly it does not improve when I disable the tooltip via the popup.

tilgovi commented 6 years ago

I'm having trouble following the thread, but happy to look into a text quote anchoring bug.

However, for the case of "quotes being anchored in <script> tags", I do not consider that a bug. While semantically, for a human, this may not make much sense, it avoids any determination about what is visible / invisible or part of / not part of "the text". Such determinations can be complicated and may perform poorly. As a low-level library, dom-anchor-text-quote will not make any such determination. If you wish to anchor only to "the text" of a document, you can find or write tools to extract the body content and then use the dom-anchor-* libraries on only the root elements you want to treat as having text content.
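
A sketch of what such caller-side filtering could look like on a raw HTML string: collect every occurrence of the quote, then drop the ones that fall inside a `<script>` element. This is only an illustration with hypothetical names, not part of the dom-anchor-* libraries; inside the extension it would more likely be done by restricting anchoring to chosen root elements, as suggested above.

```typescript
// Hypothetical sketch: find all occurrences of a quote, then filter out those inside <script> elements.
function findAllOccurrences(haystack: string, needle: string): number[] {
  const positions: number[] = [];
  if (needle.length === 0) return positions;
  let index = haystack.indexOf(needle);
  while (index !== -1) {
    positions.push(index);
    index = haystack.indexOf(needle, index + 1);
  }
  return positions;
}

// Return [start, end) offsets of every <script>...</script> block.
function scriptRanges(html: string): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  const pattern = /<script\b[^>]*>[\s\S]*?<\/script\s*>/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    ranges.push([match.index, match.index + match[0].length]);
  }
  return ranges;
}

function occurrencesOutsideScripts(html: string, quote: string): number[] {
  const ranges = scriptRanges(html);
  return findAllOccurrences(html, quote).filter(
    (pos) => !ranges.some(([start, end]) => pos >= start && pos < end),
  );
}
```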

blackforestboi commented 6 years ago

Thanks @tilgovi, indeed it was only for the anchoring bug. I was not precise enough. Thanks for the quick reply.

tilgovi commented 6 years ago

You were very clear, I was just trying to be clear, too :).

If there's an anchoring bug I can help with let me know how to reproduce it, @ShishKabab.

blackforestboi commented 6 years ago

@tilgovi

To reproduce: 1) Install this extension in Chrome: extension 2.zip 2) Highlight a paragraph here: https://www.nytimes.com/2011/10/19/nyregion/new-york-planting-a-million-treestoo-many-some-say.html 3) "Create Link" 4) Open the link in a new tab 5) See that the highlight is incomplete: sometimes half the paragraph, sometimes just a few words.

Does that help to reproduce it? For more technical assistance, yes @ShishKabab is the best person to ask :)

tilgovi commented 6 years ago

Well, I can maybe get to that eventually, but it would be far more helpful if @ShishKabab can diagnose a scenario that causes this and make a small repro test for the dom-anchor-text-quote or dom-anchor-text-position test suites! I'll wait a bit to hear back.

ShishKabab commented 6 years ago

Hi @tilgovi, thanks for getting back to us! Today I'll see if I can get a PR together with a test reproducing the anchoring issues.
