WorldBrain / Memex

Browser extension to curate, annotate, and discuss the most valuable content and ideas on the web. As individuals, teams and communities.
https://worldbrain.io

Feature: Direct Linking / Link to quotes #236

Closed blackforestboi closed 6 years ago

blackforestboi commented 6 years ago

As discussed with @blahah and Charlie, we will pull forward the direct linking feature. It will enable users to link to specific quotes in an article, instead of just the URL. It will also provide us with an ideal viral loop to acquire users.

From my understanding today, there are 3 parts to this, plus an optional fourth:

  1. The selection and anchoring of the highlights in an article
  2. The proxy server routing the traffic and injecting the code to highlight the specific text
  3. The interruption for new users to expose them to Memex and its features
  4. [Optional] the ability to search for those highlights > rank the words of the quote higher in search.

1. Selection and anchoring

Similar to how hypothes.is does it, we would have a modal pop up whenever a user selects a piece of text. It would show "highlight" & "link to quote".

Also we could put these options in the context menu.

Technical details

@tilgovi, @bigbluehat & @Treora rebuilt the annotator library that powers the Hypothes.is annotation software. The code will enable us to create an anchor to a piece of text. It has yet to be published as an npm package, but AFAIK it is functional. @Treora also built this nifty demo to showcase its capabilities.
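
To make the anchoring idea concrete, here is a minimal sketch of a text quote anchor working on a plain string. It is not the annotator library's API: `describeTextQuote`, `anchorTextQuote` and the `contextLength` parameter are hypothetical names for illustration, and a real implementation works on DOM ranges rather than strings.

```javascript
// Hypothetical helpers for illustration, not the annotator library's API.
// A selector stores the quote itself plus a little surrounding context.
function describeTextQuote(text, start, end, contextLength = 32) {
  return {
    exact: text.slice(start, end),
    prefix: text.slice(Math.max(0, start - contextLength), start),
    suffix: text.slice(end, end + contextLength),
  };
}

// Returns the { start, end } offsets of the first occurrence of the quote
// whose surrounding text matches the stored prefix and suffix, or null.
function anchorTextQuote(text, selector) {
  const { exact, prefix = "", suffix = "" } = selector;
  if (exact === "") return null;
  let from = 0;
  while (true) {
    const start = text.indexOf(exact, from);
    if (start === -1) return null;
    const end = start + exact.length;
    const prefixOk = text.slice(0, start).endsWith(prefix);
    const suffixOk = text.slice(end).startsWith(suffix);
    if (prefixOk && suffixOk) return { start, end };
    from = start + 1;
  }
}
```

The prefix and suffix are what let the anchor pick the right occurrence when the same words appear more than once in an article.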

2. Proxy server

The way I understood it, the proxy server's role is to inject the JS code into the website a user is about to visit, so that the highlights can be displayed. Our proxy would host a DB with all the quotes that have been made, possibly already as a dat node that we can replicate among several instances.
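
A minimal sketch of the injection step, assuming the proxy already holds the fetched page as an HTML string. `injectHighlightScript` and the script URL are hypothetical; nothing here is taken from an existing server.

```javascript
// Hypothetical helper: insert a script tag just before the last </body>,
// or append it when the page has no closing body tag.
function injectHighlightScript(html, scriptUrl) {
  const src = scriptUrl.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  const tag = `<script src="${src}" defer></script>`;
  const matches = [...html.matchAll(/<\/body\s*>/gi)];
  const last = matches[matches.length - 1];
  const at = last ? last.index : html.length;
  return html.slice(0, at) + tag + html.slice(at);
}
```

The injected script would then be responsible for looking up the quote and highlighting it in the page.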

Important here is that the link created shows the social preview data of the link destination, not the proxy. To get what I mean with that, try creating a post on social media with this link: https://hyp.is/d6o9PM67EeeGWFs6SrbDHg/www.nytimes.com/2011/10/19/nyregion/new-york-planting-a-million-treestoo-many-some-say.html

Hypothes.is will show preview images of hypothes.is and the resulting quote, but nothing about the nytimes article.

Earlier in the product life cycle of hypothes.is, this routing to the information about the original article was still in place. Now it shows the red preview of hypothes.is. I find this undesirable, as the preview on social media gives important information about the article. We will already inject some "ad like" thing (see point 3) with the proxy. No need to annoy users even more or hinder their UX.
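
One way to keep the destination's preview is to copy its Open Graph tags onto the link page. The sketch below assumes the destination's HTML is available as a string and that the page uses `og:` meta tags; `extractOpenGraph` and `renderPreviewTags` are hypothetical names, and the regex-based parsing is a simplification of what a real HTML parser would do.

```javascript
// Hypothetical helper: collect og:* properties from <meta> tags.
function extractOpenGraph(html) {
  const tags = {};
  for (const [metaTag] of html.matchAll(/<meta\b[^>]*>/gi)) {
    const property = /\bproperty\s*=\s*["'](og:[^"']+)["']/i.exec(metaTag);
    const content = /\bcontent\s*=\s*"([^"]*)"|\bcontent\s*=\s*'([^']*)'/i.exec(metaTag);
    if (property && content) tags[property[1]] = content[1] ?? content[2];
  }
  return tags;
}

// Hypothetical helper: render the collected tags for the link page's <head>.
// Values come straight from attribute values, so only double quotes need
// escaping before they are written back into double-quoted attributes.
function renderPreviewTags(tags) {
  const quote = (s) => s.replace(/"/g, "&quot;");
  return Object.entries(tags)
    .map(([property, content]) => `<meta property="${quote(property)}" content="${quote(content)}">`)
    .join("\n");
}
```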

Technical details:

The proxy at Hypothes.is was built with this library, which might offer some inspiration: https://github.com/hypothesis/via

3. Interruption for new users to expose them to direct link features

Whenever a user clicks a direct link, there is a loading screen (hypothes.is also has one). It is an ideal place to expose users to the features they could have by installing Memex. We can use it to notify non-Memex users that they just received a direct link, and that they can create such links themselves and get other features by downloading the Memex extension. Existing Memex users should not see this in-between screen and should instead go directly to the highlight.

The design of this screen is still TBD; most importantly, we need the functionality to work.

4. Optional for later: Search highlights, show them in results list. @poltak

It would be useful to be able to search for the words of the highlights a person makes and rank those words higher. If we don't get this into the first iteration, we should definitely store the raw quotes somewhere, so we can re-index them later and also display them alongside the pages in the results overview.
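
A rough sketch of what "ranking the words higher" could mean, assuming a simple term-count score. `scorePage`, the `highlightBoost` weight and the ASCII-only tokenizer are all assumptions made for illustration, not how the Memex search index works.

```javascript
// Hypothetical tokenizer: lowercase ASCII words and numbers only.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Count query-term occurrences in the page text; occurrences inside the
// user's highlights are counted again with an extra weight.
function scorePage(query, page, highlightBoost = 3) {
  const terms = new Set(tokenize(query));
  const count = (text) => tokenize(text).filter((t) => terms.has(t)).length;
  const bodyHits = count(page.text);
  const highlightHits = page.highlights.reduce((sum, h) => sum + count(h), 0);
  return bodyHits + highlightBoost * highlightHits;
}
```

With this scoring, two pages with identical text rank differently as soon as one of them has a highlight containing the query term.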

Known challenges:

  1. Hooking the highlights to the right URL, even if it changes. For example, the URL could contain a query string that changed, so the URL is no longer the same even though the article is.
  2. We need to be persistent somehow. At the very early stage this would mean we need multiple servers that make sure the direct links are always reachable. Long-term, we need to think about a way for those links to work independently of WorldBrain providing the proxy, so that they can be resolved in any case. Maybe dat offers some things there as well @blahah?
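
For the first challenge, one possible approach is to normalize URLs before using them as keys. This is only a sketch: `normalizeUrl` is a hypothetical helper, and the list of ignored parameters is an assumption that would need tuning per site, since some sites use the query string to identify the article itself.

```javascript
// Assumed list of parameters that do not change which article is shown.
const IGNORED_PARAMS = [/^utm_/, /^fbclid$/, /^gclid$/];

// Hypothetical helper: drop the fragment and ignored parameters, then sort
// the remaining parameters so equivalent URLs compare equal.
function normalizeUrl(rawUrl) {
  const url = new URL(rawUrl);
  url.hash = "";
  for (const key of [...url.searchParams.keys()]) {
    if (IGNORED_PARAMS.some((pattern) => pattern.test(key))) {
      url.searchParams.delete(key);
    }
  }
  url.searchParams.sort();
  return url.toString();
}
```
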
ShishKabab commented 6 years ago

I've been doing some research on the Direct Linking feature and the code that already exists. So far we have the server created by @blahah, which can be found at https://github.com/WorldBrain/direct-linking-server, and a direct linking client cloned and modified from the Apache incubator-annotator repo.

The current direct linking server consists of a Node.js/Express server that seems to be meant to be directly exposed to the internet. As a PoC, it stores and retrieves annotations from a LevelDB database without validation. Upon access of a direct link, it's supposed to fetch the original page, inject some JS code to highlight the necessary text (the package for injection is present, but not implemented yet), and return this to the user.

This approach seems to be the most practical for now, since the alternatives I can think of have issues:

  1. Embed external content in an iframe: not possible, since security policies would prevent us from modifying the target's HTML
  2. Retrieve the page client-side via XHR: would run into trouble with CORS policies

However, this approach still has some issues to be resolved:

  1. The content is served from our domain, meaning that linked resources like images, JS and CSS need to be fixed
  2. Scripts may want to communicate with their server, which might mean we need to replace URLs in scripts
  3. Back-ends of pages would have to be configured to allow requests from other sites, which will break pages that are 'too dynamic'
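
For the first issue, a `<base>` element is one cheap way to make relative links in the markup resolve against the original site again. A minimal sketch, assuming the page is available as an HTML string (`addBaseHref` is a hypothetical helper); it does nothing for issues 2 and 3, since requests made by scripts are still subject to the same-origin policy.

```javascript
// Hypothetical helper: insert <base href="..."> right after the opening
// <head> tag so relative URLs in the markup resolve against the original site.
function addBaseHref(html, originalUrl) {
  if (/<base\b/i.test(html)) return html; // the page already declares a base
  const href = originalUrl.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  const baseTag = `<base href="${href}">`;
  const head = /<head\b[^>]*>/i.exec(html);
  if (!head) return baseTag + html;
  const at = head.index + head[0].length;
  return html.slice(0, at) + baseTag + html.slice(at);
}
```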

One solution would be to serve a stripped, read-friendly version of the page, linking to the original website. However, this could introduce other problems:

  1. The content owners would probably not be happy with us storing their content on our servers (see Google News problems in Spain)
  2. If content goes viral, content owners want ad revenue, which requires users to land directly on their pages

The current direct linking server could relatively easily be adapted to run on AWS EB + S3 + Lambda + API Gateway, so that it can be easily deployed and dynamically scaled to handle peak traffic. Current things to do there would be:

  1. Write script to be injected
     1a. Write highlighting code to be shared between front- and back-end (#301)
     1b. UI to encourage user towards Memex
  2. Refactor to use pluggable back-end
  3. Implement S3 backend for retrieving annotations
  4. Write Lambda function to validate and upload annotation
  5. Write deployment procedures
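
As a sketch of what a pluggable back-end could look like: the server code talks to any object with async `put` and `get` methods, so a LevelDB or S3 implementation can be swapped in later. All names here (`MemoryAnnotationStore`, `createLinkService`) and the sequential ids are assumptions for illustration, not the existing server's code.

```javascript
// Hypothetical in-memory store; an S3 or LevelDB store would expose the
// same two async methods.
class MemoryAnnotationStore {
  constructor() {
    this.annotations = new Map();
  }
  async put(id, annotation) {
    this.annotations.set(id, annotation);
  }
  async get(id) {
    return this.annotations.get(id) ?? null;
  }
}

// The service only depends on the store interface, not on a concrete store.
// Sequential ids keep the sketch short; a real service would need ids that
// are unique across instances.
function createLinkService(store, baseUrl) {
  let nextId = 1;
  return {
    async createLink(annotation) {
      const id = String(nextId++);
      await store.put(id, annotation);
      return `${baseUrl}/${id}`;
    },
    async resolveLink(id) {
      return store.get(id);
    },
  };
}
```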

Before going further into the technical details however, let's discuss here how to solve these bigger design challenges :)

ShishKabab commented 6 years ago

After a number of discussions, we've come to the conclusion that building on the design of hypothes.is for direct linking is not feasible / wise at the time being. Reasons for this are:

For these reasons, we've chosen the following design:

In future versions, we'd like to:

On a technical level, this means:

Potential design issues:

Design todo items:

Coordination todo items:

More detailed breakdown of tasks and technical design issues and choices to follow later.

ShishKabab commented 6 years ago

Dump of tasks:

blackforestboi commented 6 years ago

The DL server 1) fetches the page, 2) validates whether the annotation is valid / actually present in the page, 3) stores the page, 4) extracts metadata like title, description, logo from the page and stores it

Seems like this is meant to prevent abuse. Do we maybe want to do this as a further improvement, but not have it in the MVP? I don't assume that people will spam us in the beginning. And if they do spam us, the validation itself costs us far more computing power, since we fetch the page and check whether the annotation is valid for every link, right?
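
For when the validation step does get picked up, a minimal sketch of the "actually present in the page" check, assuming the fetched page is an HTML string. `htmlToText` and `quoteAppearsInPage` are hypothetical helpers; the tag stripping is deliberately crude and does not decode HTML entities.

```javascript
// Hypothetical helper: drop script/style content and all tags, then
// collapse whitespace so quotes can be compared as plain text.
function htmlToText(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// True when the whitespace-normalized quote occurs in the visible text.
function quoteAppearsInPage(html, quote) {
  const normalizedQuote = quote.replace(/\s+/g, " ").trim();
  return normalizedQuote !== "" && htmlToText(html).includes(normalizedQuote);
}
```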

ShishKabab commented 6 years ago

Yeah, let's keep this for later, good idea :)

blackforestboi commented 6 years ago

Seems to be going well :)

Some notes on things I discovered while testing that need to be done before release:

Demo

With iframe

Without iframe

ShishKabab commented 6 years ago

The whitespace on the bottom unfortunately is a trade-off. It's either that, or having two scroll bars, which is in my opinion more confusing.

blackforestboi commented 6 years ago

The whitespace on the bottom unfortunately is a trade-off. It's either that, or having two scroll bars, which is in my opinion more confusing.

Ok.

Some other things I noticed:

blackforestboi commented 6 years ago

Got some more feedback on the direct linking:

ShishKabab commented 6 years ago

Ah, OK. Thought we'd leave the scrolling for later ;)

Right now, I've highlighted the 3rd paragraph. Is that clear enough?

About the link being copied, remember the Slack discussion? Due to the asynchronous nature, it's not possible, or would require considerable effort + inconsistent UX. For now, we show the link (the text should be the URL); maybe we can also say 'press right click to copy URL'. The rationale behind this was that in the demo people will want to click the link and see the result anyway, rather than sharing a useless demo link.

Hmm, the triple click thing is unfortunate. Maybe the popup should have a delay of about 100-200ms?
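
A minimal sketch of the delay idea: every new selection event cancels the pending popup, so only the final selection of a double or triple click shows it. `createDelayedPopup` and the injectable `timers` parameter are hypothetical and only there to keep the sketch testable.

```javascript
// Hypothetical helper: show the popup only after the selection has been
// stable for delayMs; earlier pending popups are cancelled.
function createDelayedPopup(
  showPopup,
  delayMs = 150,
  timers = {
    set: (fn, ms) => setTimeout(fn, ms),
    clear: (id) => clearTimeout(id),
  }
) {
  let pending = null;
  return {
    onSelectionChange(selection) {
      if (pending !== null) timers.clear(pending);
      pending = timers.set(() => {
        pending = null;
        showPopup(selection);
      }, delayMs);
    },
  };
}
```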

ShishKabab commented 6 years ago

Unfortunately, opening links in the iframe in the parent window doesn't seem possible. Normally you'd need to modify the iframe content, which we don't have access to.

blackforestboi commented 6 years ago

Found this, the second answer. Would that work? https://stackoverflow.com/questions/22808065/how-to-make-all-links-in-an-iframe-open-in-new-tab

ShishKabab commented 6 years ago

Nope, since all the answers including the second one involve modifying the iframe content :(

blackforestboi commented 6 years ago

About the link being copied, remember the Slack discussion? Due to the asynchronous nature, it's not possible, or would require considerable effort + inconsistent UX

Yeah, I meant something different though. I meant that after creating the link, we show the loading bar until the link is confirmed, then show the link in the text field (instead of the text "This is your link"). Also, so we don't have to maintain 2 different versions for the extension and the demo, I think it makes sense to keep it consistent. However, I agree that for the demo, people really just want to see how it works. In the extension, the use case is to have the link in the clipboard as soon as possible, not to visit it. When you click on the link, it would then show a confirmation inside the modal: "Copied to clipboard".

blackforestboi commented 6 years ago

Hmm, the triple click thing is unfortunate. Maybe the popup should have a delay of about 100-200ms?

Yeah, let's try that :) Maybe also have the popup appear centered below the cursor, with a little margin to the bottom.

blackforestboi commented 6 years ago

Found another issue with the demo page: If you try to create a second direct link after you did one, the page hangs and then crashes.

ShishKabab commented 6 years ago

Found another issue with the demo page: If you try to create a second direct link after you did one, the page hangs and then crashes.

Unfortunately, I can't reproduce this in either the master branch or the current staging version. Could you please describe the exact steps you're taking? After the link is created, do you click to copy the URL? Do you click the popup away?

blackforestboi commented 6 years ago

I tried it on http://staging.memex.link/demo/

Here is a video. Today it actually got worse: it already freezes on the first try to make a direct link. http://recordit.co/4AVdoqzPC1

What I did the last time (if I remember correctly): 1) create a direct link by highlighting and clicking "create link..." 2) highlight a second paragraph/text and click direct link 3) the page hangs

Do you click the popup away?

I would expect the popup to disappear anyway if I click anything outside of it. Also, I'd expect that once I have created a direct link, the piece stays highlighted, and when I click on the highlight, the popup lets me directly copy the link to the clipboard (as it is already created). It's much like on hypothesis: if you click on a highlight, it opens the annotation in the sidebar, so you would not need to re-create the same direct link a second time.
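
A small sketch of the "don't re-create the same link" behaviour, assuming links are remembered per page URL and quote selector. `createLinkCache` and `requestNewLink` are hypothetical names; `requestNewLink` stands in for whatever call creates the link on the server.

```javascript
// Hypothetical helper: return the existing link for a highlight if one was
// already created, otherwise request a new one and remember it.
function createLinkCache(requestNewLink) {
  const links = new Map();
  const keyOf = (url, selector) =>
    JSON.stringify([url, selector.prefix, selector.exact, selector.suffix]);
  return function linkFor(url, selector) {
    const key = keyOf(url, selector);
    if (!links.has(key)) links.set(key, requestNewLink(url, selector));
    return links.get(key);
  };
}
```

If `requestNewLink` returns a promise, the promise itself is cached, so clicking the highlight again while the link is still being created does not trigger a second request.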

blackforestboi commented 6 years ago

@ShishKabab @digi0ps

Here are some updates on the UI/UX of the tooltip: tooltip

ShishKabab commented 6 years ago

Just saw that a comment I'd written didn't get sent correctly :(

The tooltip looks functional, but something feels off to me. Don't have the in-depth graphic knowledge and language to express this feeling concretely, but there's maybe something about the spacing and differences in line width that makes it feel kind of wrong / cobbled together.

About the name: Memex.Link could be very cool. Want me to do a post on Facebook once the new demo version is working?

blackforestboi commented 6 years ago

Hey @ShishKabab @digi0ps

Awesome work with the direct linking stuff. It's taking form and I can't wait to see it in prod :P I have been testing the demo and the direct links, and here is a checklist of things that still need some work:

ShishKabab commented 6 years ago

The first highlight "It is the physicist..." is this green block that does not look like the other highlights

That was to cover the case where the user selects some text inside that region, remember?

digi0ps commented 6 years ago

image

blackforestboi commented 6 years ago

That was to cover the case where the user selects some text inside that region, remember?

No, help me out :)

I don't think this can be prevented, since copying creates a new selection! Is it possible to keep it highlighted @ShishKabab ?

Yeah we should have the highlights stick.

Ran into some issues with the anchoring:

digi0ps commented 6 years ago

@ShishKabab fixed the highlight after copy issue. I will look into why the tooltip is experiencing spacing issues tomorrow. Probably it's some global css override.

blackforestboi commented 6 years ago

Quick info to @ShishKabab:

The scrolling position in Memex should also have the same spacing to the top as the demo.

blackforestboi commented 6 years ago

@ShishKabab Found a bug: When going to this paragraph in the nytimes article, it only shows you half of the paragraph highlighted. Make sure you have a version with the direct linking UI implemented to test. Here is a prepacked extension: extension.zip

Now, try to highlight the whole paragraph, create a link and open it. Somehow it doesn't highlight all of the paragraph; it always gets cut off.

EDIT: happens also on this site

ShishKabab commented 6 years ago

Looked more into the bug. Your first bug is reproducible; I didn't try the second case. The selector is generated correctly, with the right quote prefix and suffix, but it doesn't anchor correctly. We need someone to look into the problem using this bug as a reference.

To hack on the problem, one has to check out the feature/direct-linking-ui branch of this GitHub repo, with the main code of interest living in src/direct-linking/content_script/annotations.js. There's this bug to solve, and also the case of quotes being anchored in <script> tags, so if someone could modify the anchoring function to return all locations, which we can then filter ourselves, that'd be awesome :)
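
To illustrate the "return all locations, then filter" idea, here is a sketch that works on a raw HTML string instead of DOM ranges. The function names are hypothetical and this is not how the anchoring libraries are implemented; it only shows the shape of the filtering step.

```javascript
// Hypothetical helper: offsets of every occurrence of the quote.
function findAllOccurrences(text, quote) {
  const offsets = [];
  if (quote === "") return offsets;
  let at = text.indexOf(quote);
  while (at !== -1) {
    offsets.push(at);
    at = text.indexOf(quote, at + 1);
  }
  return offsets;
}

// Hypothetical helper: character ranges covered by <script> elements.
function scriptRanges(html) {
  return [...html.matchAll(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi)].map(
    (match) => ({ start: match.index, end: match.index + match[0].length })
  );
}

// Keep only the occurrences that do not start inside a <script> element.
function occurrencesOutsideScripts(html, quote) {
  const ranges = scriptRanges(html);
  return findAllOccurrences(html, quote).filter(
    (at) => !ranges.some((range) => at >= range.start && at < range.end)
  );
}
```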

blackforestboi commented 6 years ago

@Treora, @tilgovi, @bigbluehat do you have any ideas on how to solve this?

blackforestboi commented 6 years ago

@ShishKabab On another note, I discovered a really annoying workflow bug: http://recordit.co/2ecMnz1ptx

When selecting a piece of text, I also can't use the context menu when right clicking. It will then select the word I right clicked on, not apply the context menu to the whole previous selection. When I remove the tooltip with the small x, I then lose my selection (all as seen in the video).

EDIT: even more annoyingly it does not improve when I disable the tooltip via the popup.

tilgovi commented 6 years ago

I'm having trouble following the thread, but happy to look into a text quote anchoring bug.

However, for the case of "quotes being anchored in <script> tags", I do not consider that a bug. While semantically, for a human, this may not make much sense, it avoids any determination about what is visible / invisible or part of / not part of "the text". Such determinations can be complicated and may perform poorly. As a low-level library, dom-anchor-text-quote will not make any such determination. If you wish to anchor only to "the text" of a document, you can find or write tools to extract the body content and then use the dom-anchor-* libraries on only the root elements you want to treat as having text content.

blackforestboi commented 6 years ago

Thanks @tilgovi, indeed it was only for the anchoring bug. I was not precise enough. Thanks for the quick reply.

tilgovi commented 6 years ago

You were very clear, I was just trying to be clear, too :).

If there's an anchoring bug I can help with let me know how to reproduce it, @ShishKabab.

blackforestboi commented 6 years ago

@tilgovi

To reproduce:

  1. Install this extension in Chrome: extension 2.zip
  2. Highlight a paragraph here: https://www.nytimes.com/2011/10/19/nyregion/new-york-planting-a-million-treestoo-many-some-say.html
  3. Click "Create Link"
  4. Open the link in a new tab
  5. See that the highlight is incomplete: sometimes half the paragraph, sometimes just a few words.

Does that help to reproduce it? For more technical assistance, yes @ShishKabab is the best person to ask :)

tilgovi commented 6 years ago

Well, I can maybe get to that eventually, but it would be far more helpful if @ShishKabab can diagnose a scenario that causes this and make a small repro test for the dom-anchor-text-quote or dom-anchor-text-position test suites! I'll wait a bit to hear back.

ShishKabab commented 6 years ago

Hi @tilgovi thanks for getting back to us! Today I'll see if I can get a PR together with a test reproducing the anchoring issues.