WebMemex / webmemex-extension

📇 Your digital memory extension, as a browser extension
https://webmemex.org

Making precise links when copying content #77

Closed Treora closed 6 years ago

Treora commented 7 years ago

Instead of just linking to a whole document, people should be able to link to a specific piece of text (or any content, ideally) in that document. The recent Web Annotation standard provides a way to create such precise references, and copy&paste and drag&drop would be good interactions to create such links with.

The most obvious situation in which such a link should be created is when one quotes a piece of text in their notes. That quote should automatically remember the document it was quoted from, and its location within that document. (Try asking Ted Nelson whether copy and paste ought to mean more than just replicating selected contents.)

It would be a nice step to develop this in a browser extension. This would integrate very nicely with the other features of the WebMemex, but observe that it can really be considered a separate feature, one that could be bundled as a separate browser extension, or that browsers (or document viewers more generally) could support natively at some point. Development can be done in a separate repo or organisation (with Apache Annotator as the most likely option).

I have in mind now to create a script that could run as a browser extension's content script on any web page, and could possibly also be added to any webpage, doing the following:

  1. Listen to events for copy and for drag actions.
  2. Create an oa:ResourceSelection object pointing to the selected content that is to be copied/dragged.
  3. Modify the DataTransfer object to add this selector as metadata to the html representation of the content.
  4. Possibly also change the html so as to format the selection as a quote, and/or cite its source human-readably, and/or make it a link.

Let's go through each step/aspect in more detail.

Step 1

Should be easy, I suppose. I have no experience with this myself, but I assume the APIs in modern browsers have enough support.
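A minimal sketch of what the listening part could look like, assuming the standard `copy` and `dragstart` DOM events. The names `fillTransfer` and `makeHandler` are hypothetical, not taken from any existing library, and the handler only relies on a `setData` method so that it can also be exercised outside a browser.

```javascript
// Hypothetical helper: writes both representations into a DataTransfer-like object.
function fillTransfer(dataTransfer, text, html) {
  dataTransfer.setData('text/plain', text);
  dataTransfer.setData('text/html', html);
}

// Hypothetical handler factory. `getSelected` returns { text, html } for the
// current selection; how the html gets its metadata is left open here.
function makeHandler(getSelected) {
  return function onTransfer(event) {
    // Copy events expose `clipboardData`, drag events expose `dataTransfer`.
    const transfer = event.clipboardData || event.dataTransfer;
    if (!transfer) return;
    const { text, html } = getSelected();
    fillTransfer(transfer, text, html);
    // For copy, the default action has to be cancelled, otherwise the browser
    // overwrites the data set above. For dragstart, cancelling would abort the drag.
    if (event.type === 'copy') event.preventDefault();
  };
}

// Only register when running inside a page (e.g. as a content script).
if (typeof document !== 'undefined') {
  const handler = makeHandler(() => {
    const selection = window.getSelection();
    const container = document.createElement('div');
    if (selection.rangeCount > 0) {
      container.appendChild(selection.getRangeAt(0).cloneContents());
    }
    return { text: selection.toString(), html: container.innerHTML };
  });
  document.addEventListener('copy', handler);
  document.addEventListener('dragstart', handler);
}
```

The handler is the place where steps 2 to 4 would hook in: whatever `getSelected` returns as `html` is what ends up on the clipboard or in the drag data.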

Step 2

Creating the ResourceSelection is probably the largest task, but the hardest work is hopefully already done by existing libraries (e.g. @tilgovi's dom-anchor-* libraries).

There are two aspects to it: one is to refer to the document, and one is to refer to the selection within that document. For the selection, I suppose a combination of applicable selector types may be helpful; this could start with e.g. a QuoteSelector and improve later on.
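To illustrate the kind of selector meant here, the following sketch builds a `TextQuoteSelector` (the name the Web Annotation Data Model uses for a quote-based selector) from a plain string and an offset range. It works on text only and is independent of the dom-anchor-* libraries; `describeTextQuote` and the `contextLength` default are assumptions made for this example.

```javascript
// Builds a TextQuoteSelector for the substring [start, end) of `text`.
// `contextLength` (how much surrounding text to keep) is an arbitrary choice.
function describeTextQuote(text, start, end, contextLength = 32) {
  return {
    type: 'TextQuoteSelector',
    exact: text.slice(start, end),
    prefix: text.slice(Math.max(0, start - contextLength), start),
    suffix: text.slice(end, end + contextLength),
  };
}

// Example: select the word "annotation" in a short document text.
const pageText = 'this is an annotation that has some context';
const quoteStart = pageText.indexOf('annotation');
const quoteSelector = describeTextQuote(
  pageText,
  quoteStart,
  quoteStart + 'annotation'.length,
);
console.log(JSON.stringify(quoteSelector, null, 2));
```

The prefix and suffix are what allow the quote to be found again when the same words occur more than once, or when the document has changed slightly.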

Then about referring to the document (the 'state'). To avoid link rot and point to the exact version that was quoted, we want the reference to point not only to a URL where the document may once have been served from, but also to a more robust identifier of the document. In the WebMemex, this would go nicely together with the personal archive that one creates while browsing; the idea is that the browser creates a URL for every version of every page the user visits. The precise link could refer to this version, perhaps like in this example:

{
    "source": "http://example.org/page1",
    "state": {
      "type": "TimeState",
      "cached": "http://archive.example.org/copy1",
      "sourceDate": "2015-07-20T13:30:00Z"
    }
}
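As a sketch of how the two aspects could be combined, the function below assembles an object of the shape shown above and attaches a selector to it. The property names follow the Web Annotation Data Model as far as I know it; `makeSpecificResource` and its parameter names are hypothetical.

```javascript
// Assembles a resource description: the document (source + state) plus,
// optionally, the selection within it (selector).
function makeSpecificResource({ source, cached, sourceDate, selector }) {
  const resource = { type: 'SpecificResource', source };
  if (cached || sourceDate) {
    resource.state = { type: 'TimeState' };
    if (cached) resource.state.cached = cached;
    if (sourceDate) resource.state.sourceDate = sourceDate;
  }
  if (selector) resource.selector = selector;
  return resource;
}

const exampleResource = makeSpecificResource({
  source: 'http://example.org/page1',
  cached: 'http://archive.example.org/copy1',
  sourceDate: '2015-07-20T13:30:00Z',
  selector: { type: 'TextQuoteSelector', exact: 'annotation' },
});
console.log(JSON.stringify(exampleResource, null, 2));
```

When no archived version is known, the state is simply left out, and the reference falls back to the live URL.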

A question is still how the script should know a version-URL to refer to. We could consider reading <link> tags, if there is a standard for specifying an identifier for the current version in such a tag, but I doubt many pages provide this. When viewing a live web page in the WebMemex, we could make it get the URL of the archived version directly from another component within the extension, so we don't have to make this personal information available to the page itself. When viewing an archived page in the WebMemex things are of course much easier; the archived URL is known, and the original URL will probably be placed in a <link rel="canonical"> tag.

Step 3

The idea is to add the created oa:ResourceSelection object as metadata to the copied html (adding other mime-types may be restricted by platforms, and seems less useful). A lot is said about this in this W3 note. My thinking has been that it should be embedded in the quote, and not added in e.g. a separate piece of script, in order to make it more likely that the metadata is retained when the html is processed by other applications.

The way I have in mind now is to convert the ResourceSelection into a URI's fragment identifier (see this nifty converter). This makes it degrade gracefully (the URL still refers to the document as a whole if one does not understand the more precise reference), and it enables us to insert the whole reference into an attribute of the copied html. These examples nicely convey the idea. Perhaps some RDFa vocabulary can be appropriately used here, but I am not sure how exactly.
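A rough sketch of such a conversion, loosely following the `selector(key=value,...)` fragment syntax from the W3C note on selectors and states. This is a simplification under my own assumptions: it handles only flat selectors (no `refinedBy` nesting), the function names are hypothetical, and the exact escaping rules should be checked against the note.

```javascript
// Percent-encodes a value; parentheses are encoded too, since they delimit
// the selector(...) expression and encodeURIComponent leaves them alone.
function encodeValue(value) {
  return encodeURIComponent(value).replace(
    /[()]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase(),
  );
}

// Serialises a flat selector object as a fragment identifier.
function selectorToFragment(selector) {
  const parts = Object.entries(selector).map(
    ([key, value]) => `${key}=${encodeValue(String(value))}`,
  );
  return `selector(${parts.join(',')})`;
}

// Builds the precise URL, replacing any fragment the source already had.
function preciseUrl(source, selector) {
  return `${source.split('#')[0]}#${selectorToFragment(selector)}`;
}

const linkToQuote = preciseUrl('http://example.org/page1', {
  type: 'TextQuoteSelector',
  exact: 'annotation',
  prefix: 'this is an ',
});
console.log(linkToQuote);
```

The resulting string can be placed in an attribute such as `href` or `cite`; an application that does not understand the fragment still ends up at the right document.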

Step 4

It would be pretty and helpful if software makes it as easy as possible for people to cite their sources, also if they e.g. copy a paragraph of a website into their email composer.

It may however be neater to have the receiving side do such a formatting transformation, but most receiving sides will not do this anytime soon (except the WebMemex's note/page editor of course). Therefore, we could try adding this on the 'sending' side instead and see if people like it.

We should remember, however, that a user may just as often want to copy the content plainly, without having it formatted as a quote. It would be nice to have both options available. E.g. drag&drop could create a quote while copy&paste would copy the plain content, or (preferably) in a browser extension, an entry could be added to the context menu of a selection to let one choose "Quote this selection". Another way to enable the choice is by having the formatting only add a piece of html that is easily removed again; e.g. appending a citation to it: ...copied html content...<a href="...#selector..."><cite>Some title, author</cite></a>
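The two formatting options could look roughly like this; both functions are hypothetical sketches that take the copied html, the precise URL and a human-readable label as strings. Using the `cite` attribute of `<blockquote>` for the quote variant is my own assumption about where the link could live.

```javascript
// Escapes text for use in html content or a double-quoted attribute.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Option 1: leave the content as it is and append an easily removed citation.
function appendCitation(html, href, label) {
  return `${html}<a href="${escapeHtml(href)}"><cite>${escapeHtml(label)}</cite></a>`;
}

// Option 2: wrap the content as a quote that carries the precise link.
function formatAsQuote(html, href) {
  return `<blockquote cite="${escapeHtml(href)}">${html}</blockquote>`;
}

const copiedHtml = '<p>annotation</p>';
const quoteHref = 'http://example.org/page1#selector(type=TextQuoteSelector,exact=annotation)';
console.log(appendCitation(copiedHtml, quoteHref, 'Some title, author'));
console.log(formatAsQuote(copiedHtml, quoteHref));
```

Which of the two runs could then depend on the interaction, e.g. a plain copy for copy&paste and one of these for drag&drop or a "Quote this selection" menu entry.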


As a counterpart to all of this, we of course need a script that dereferences the ResourceSelections. One aspect would be to try to obtain the right version of the document, and another (probably a separate script) would locate the specified selection in that document. This is outside the scope of the current issue.

I think this is most of the design and reasoning. Perhaps @tilgovi and @bigbluehat would like to improve on this? Or we may just start building and see what questions come up. :)

anona-R commented 7 years ago

I would like to try this issue @Treora @oliversauter

Treora commented 7 years ago

@uchithaR: thanks for the offer; there's no need to ask for trying things, feel free to start coding anything. I'd warn that this is quite a task though, requiring familiarity with web annotation specs etcetera.

@tilgovi: we discussed this issue quite a bit. Have you by any chance gotten around to playing with this like you were planning to?

anona-R commented 7 years ago

i will give it a try 👍

tilgovi commented 7 years ago

I have not had a chance to do any work in this direction yet.

I have started setting up the repo boilerplate over at the ASF Annotator incubating project. My intention is still to start with a simple text highlighter and a data transfer polyfill supporting a constructor.

@uchithaR one thing you might consider is looking at the new async clipboard spec. It seems to have strong support from the browser vendors: https://github.com/garykac/clipboard/blob/master/clipboard.md

Current browser implementations limit unprivileged reading/writing to the clipboard to the lifecycle of event handlers. In other words, you can write to the clipboard when the user clicks a button or issues a copy command, and you can read from the clipboard during a paste handler, but not otherwise. Another limitation is that most browsers other than Firefox will not support anything other than a few mime types in the DataTransfer object. Finally, the DataTransfer object requires all the data to be inserted synchronously, which is rather limiting as well. Ideally, a DataTransfer object could be created that allows the receiver to request data that is generated on demand.

My roadmap looks like this:

I have a tiny, poorly documented experiment where I started to think about these ideas in a Plunk (http://plnkr.co/edit/FJibLnSRElKG3k5s5jH9). The key idea here is that you can use the capture option of addEventListener to hijack the DataTransfer of the clipboard events and insert the upgraded version.

That's my brain dump for now, but don't feel compelled to stay close to my ideas if you see better directions! And if any of you would like to join the Apache Annotator project, just come say hello on the mailing list. http://annotator.apache.org/

tilgovi commented 7 years ago

Well, I just dumped all those thoughts here for lack of a better place and because I want to try to communicate more, but feel free to ignore all that complexity and focus on finding a suitable way to represent the oa:ResourceSelection inside the HTML. It's not necessary to override any existing clipboard or DataTransfer features of the browsers in order to insert custom HTML into the clipboard during a copy event.

Nevertheless, my Plunk may prove helpful as example code, as it's currently handling the creation of Web Annotation selectors and clipboard events. Right now, it's inserting the selector into a custom mime type of the DataTransfer, but if you just insert it inside the HTML representation then you don't need any of the other complexity.

tilgovi commented 7 years ago

cc @bigbluehat you should be following this, maybe :)

BigBlueHat commented 7 years ago

@tilgovi already here. 😸

Treora commented 6 years ago

Closing this issue. I would still love this feature in browsers, but will not focus on this in the near future, and if so then most likely as a separate project and separate browser extension (possibly as part of Precise Links).