How to store and present rich content?

jdickey commented 8 years ago

After Commit a3dd6df, which produced Gem release 0.4.0, we get to the first "really interesting" (partial) use case, described in the "User-Eye View" section of Issue #1, as

designate any single contiguous subset of the content of [an] article as the target of a Contribution proposal

Before we can implement this, we really ought to nail down our rich-content support. "Rich content", in our usage, refers to content that may be entered using CommonMark (Markdown) markup (or as raw HTML, which Markdown accepts inline), and then stored as HTML; specifically, as valid XML conforming to HTML5 usage.

Why store as HTML? While our initial delivery mechanism is a Web app (presenting content as HTML), that "ordinarily" wouldn't be a strong enough argument. What makes it compelling, however, is that our delivery app will provide a visual interface where users select content within an Article by dragging their pointing device over Article content rendered as HTML. The selected (HTML) content may match text that occurs in multiple places within a block of content (think of the English word "the" in a long article body), and thus some disambiguation is needed. This might be achieved by specifying the initial offset into the content of the selection, or by specifying the content before or after the selection.

This still might not lock us into HTML, until we consider the next question: how do we identify start- and endpoints of ranges (e.g., contribution boundaries) within content? HTML offers the anchor tag with an id attribute (formerly a name attribute), which can be used to mark boundaries of a range. The CommonMark spec as of Version 0.24 (12 January 2016) does not appear to define any equivalent in Markdown, and none of the Markdown-to-HTML converters we've investigated supports these named anchors, or any apparent equivalent. It would therefore appear that we are locked into HTML for content against which Contributions may be proposed (currently limited to Article body content).

Other potential rich content, including Member profiles and Contribution proposed content, may be persisted using as-entered Markdown as no non-visible markers need be supported.

Why not simply enter and store as raw, unformatted text? Aesthetic considerations aside, we still have the same issues with identifying ranges transparently within the content. HTML appears to provide the least obviously bad solution here.

Thoughts?

mitpaladin commented 8 years ago

Does the anchor tag work across paragraphs of text?

jdickey commented 8 years ago

The point of using identified (formerly named) anchors is that they can be used to define points in an HTML stream. (Anchor tags are required to be closed; however, they can contain empty content). So, for instance, we could have content like

<p>
  ...blah, blah, blah...<a id="selection-1-begin"></a>This is the beginning of interesting content. More interesting content...
</p>
<p>
  Even more interesting content...
  The end of the content we're interested in here.<a id="selection-1-end"></a> More content...
</p>

Each shown anchor tag is properly closed, as the standard requires, but has empty content, as the standard allows. Each identifier (such as selection-1-begin) must be unique within the current document, but need not be unique globally within all documents. This also lets us deal with overlapping ranges, should that ever become necessary, as each begin- or endpoint is defined as a single point within the content.
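
A minimal Ruby sketch of embedding such markers, assuming the endpoints are character offsets into the HTML markup. The method name `mark_selection` and its parameters are hypothetical, chosen only for illustration; this is not an existing API in the Gem.

```ruby
# Hypothetical helper (not an existing API): returns a copy of +body+ with
# empty marker anchors embedded at the given offsets into the HTML markup.
def mark_selection(body, begin_offset, end_offset, counter)
  begin_tag = %(<a id="selection-#{counter}-begin"></a>)
  end_tag = %(<a id="selection-#{counter}-end"></a>)
  marked = body.dup # work on a copy; the original body is left untouched
  # Insert the end marker first so that the begin offset remains valid.
  marked.insert(end_offset, end_tag)
  marked.insert(begin_offset, begin_tag)
  marked
end

body = '<p>abc def ghi</p>'
marked = mark_selection(body, 7, 10, 1)
puts marked
```

Because each marker is a single point, two ranges with different counters can overlap without interfering with each other.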

Does that answer your question?

mitpaladin commented 8 years ago

cool :)

jdickey commented 8 years ago

Content Entry, Storage, and Selection: A Primer

The content originally in this comment has been moved to a Wiki page, where specific points within it may be referenced more easily.

jdickey commented 8 years ago

One thing we were probably forgetting in the preceding manifesto was validation within the Selection Service.

It should verify that the supplied endpoints

specify a valid range within the content, that
does not overlap an existing, unresolved Contribution proposal

If either of these validations fails,

a suitable error indication should be added to the :errors attribute on the selection-service object;
the reassembled-body attribute is set to the original, as-called body content; and
the selected-markup attribute is set to an empty string.

Important note: There has heretofore been an implicit assumption that a zero-length selection is invalid; these error semantics reify that assumption.
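
The error semantics above might look roughly like this in Ruby. This is a sketch under assumptions, not the actual Selection Service: the class name `SelectionSketch`, the keyword arguments, and the use of an exclusive `Range` of markup offsets for the endpoints are all hypothetical. Only the range check is shown; marker insertion and the overlap check are omitted.

```ruby
# Hypothetical sketch of the error semantics; names are illustrative only.
class SelectionSketch
  attr_reader :errors, :reassembled_body, :selected_markup

  # +endpoints+ is assumed to be an exclusive Range (begin...end) of offsets.
  def call(body:, endpoints:)
    @errors = []
    @reassembled_body = body # marker insertion omitted in this sketch
    if valid_range?(body, endpoints)
      @selected_markup = body[endpoints]
    else
      @errors << 'endpoints do not specify a valid range within the content'
      @selected_markup = ''
    end
    self
  end

  private

  # A zero-length selection (begin == end) is treated as invalid.
  def valid_range?(body, endpoints)
    endpoints.begin >= 0 &&
      endpoints.begin < endpoints.end &&
      endpoints.end <= body.length
  end
end

result = SelectionSketch.new.call(body: '<p>abc def ghi</p>', endpoints: 5...5)
puts result.errors.inspect
```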

jdickey commented 8 years ago

We've thought long and hard about how to extract a valid (and therefore parseable by Ox) HTML document fragment from an arbitrary selection, which may begin and end in any text node or element in the body text. Though this is what was attempted in previous prototypes, our new way of marking fragment begin- and endpoints likely renders this unnecessary. All that is required is for the spec for the Selection Identification Service and for the Selection Service to be followed, and markers added.

Likewise, verifying that the selected range "does not overlap an existing, unresolved Contribution proposal" (as specified immediately prior) is out of scope; that is the responsibility of the Contribution-implementing classes ("client classes") which make use of this class. The only error condition tested for here shall be whether the specified endpoints constitute a valid range within the content; i.e., that the begin- and end-selection points are within the actual content.

This simplifies things greatly, obviously; for those protesting "hang on; aren't you updating the Article passed in", no, we're not; we're making a copy of the supplied body with newly-embedded markers within a container that's made available in an attribute on the implementing class; no change is made directly to the Article or its contents by this class. It's up to the client classes to do non-range-related validation and to use the generated body-content-with-markers and/or error notifications as appropriate.

Stress kills, as does haste. Speed, in and of itself, is nonlethal — just beware of sudden stops.

jdickey commented 8 years ago

In terms of validation, we particularly have no idea how to feasibly check whether the endpoints are within the text of an HTML tag (such as <section> or </table>) without either banning the < and > characters from legitimate content or iteratively walking the (parsed) DOM tree, building the parsed content up to that point, and seeing if we just passed an endpoint. We've tried that before, and no, thank you. We'll trust the caller until and unless proven otherwise.

jdickey commented 8 years ago

Item 2 in the spec for the Selection Service reads

queries the data-contribution-counter attribute on the content container (see Markdown-to-HTML Conversion Service, above), and increments the retrieved value;

This is very presentation-oriented. Instead, we should inject a :contribution_counter parameter to #call that is used to populate an attr_reader that is then incremented after validation succeeds.
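
A rough Ruby sketch of that change, assuming keyword arguments to `#call` and an exclusive `Range` of offsets for the endpoints. The class name `CounterSketch` and everything other than `:contribution_counter` and `#call` are hypothetical, for illustration only.

```ruby
# Hypothetical sketch: the counter is injected by the caller rather than
# being read from a data- attribute on the rendered content container.
class CounterSketch
  attr_reader :contribution_counter, :errors

  def call(body:, endpoints:, contribution_counter:)
    @contribution_counter = contribution_counter
    @errors = []
    if valid_range?(body, endpoints)
      @contribution_counter += 1 # incremented only after validation succeeds
    else
      @errors << 'endpoints do not specify a valid range within the content'
    end
    self
  end

  private

  def valid_range?(body, endpoints)
    endpoints.begin >= 0 &&
      endpoints.begin < endpoints.end &&
      endpoints.end <= body.length
  end
end

service = CounterSketch.new.call(body: '<p>abc def ghi</p>', endpoints: 7...10,
                                 contribution_counter: 3)
puts service.contribution_counter
```

The caller decides where the counter comes from (a persisted value, say), which keeps this object free of any knowledge of how the body is presented.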

The fact that nobody caught and commented on this in four days is fscking disturbing. Et tu, "collaborators"? Why have I been working 12- to 16-hour days, six or seven days a week, for four bloody years, if I'm the only one paying attention here?!?

mitpaladin commented 8 years ago

:P you're right, sorry, missed it

to get around the difficulties of walking the DOM tree to check for endpoints, would it be possible to have a separate validation step instead, one that checks the selection as a text string? Then just check if there are any selection%begin selection%end in the string. I can see how tree iteration would be required within the DOM while building the selected content, but how about if we check AFTER it's built and BEFORE showing users the contribution interface?
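
A rough sketch of such a string check in Ruby, assuming markers take the `selection-N-begin` / `selection-N-end` anchor form shown earlier in this thread. The constant and method names are made up for illustration.

```ruby
# Hypothetical check: does a piece of selected markup already contain
# selection marker anchors? Assumes the marker format shown earlier.
MARKER_PATTERN = /<a id="selection-\d+-(?:begin|end)"><\/a>/

def contains_markers?(markup)
  markup.match?(MARKER_PATTERN)
end

puts contains_markers?('foo <a id="selection-1-begin"></a>bar')
puts contains_markers?('<p>plain content</p>')
```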

mitpaladin commented 8 years ago

This would have an equivalent check upon contribution submission, where we make sure the user has not manually entered something in the format .

jdickey commented 8 years ago

Maybe after 0.5 (if there is an "after 0.5"), we can worry about people hand-editing dangerous content. We're supposed to have been a startup, which has historically been defined by "what gets cash in the door soonest and in a survivable manner?" Somewhere along the line we turned from a startup into a one-man research project working with imaginary collaborators, and that's not the same thing at all.

mitpaladin commented 8 years ago

agreed, not a must have for 0.5 but does the approach sound right?

jdickey commented 8 years ago

…and I just saw your 'sorry, missed it' comment; the first refresh had just the "This would have an equivalent…" comment. Yeah, that sounds reasonable; we probably should open a reminder issue assigned to The Glorious Future for that.

jdickey commented 8 years ago

The Selection Service form object has now been completed (as of Commit 07db7f6 if not ae0fa8a), and it does most of the "heavy lifting" for the use case.

Two important caveats need to be taken into consideration, however:

Building on the definition of the Selection Identification Service, the endpoints specified to the form object (via SelectionService#call) must be relative to the start of the Article body HTML markup, not its text or Markdown content. This will bite you in Sensitive Places if you get it wrong;
We've explicitly punted responsibility for verifying that the selected markup does not contain any proposed-Contribution markers up to the invoking use cases (e.g., the future ProposeEditContribution) for at least two reasons. First, obviously, we really do think that's legitimately out of the current scope. Second, not so obviously, we can foresee lots of conditionals being evaluated based on what type of new contribution is being proposed and what type(s) of existing proposal(s) occur within the segment. (If the user wishes to propose a Challenge to content overlapping that included in an existing Query proposal, might that not be allowed?) In the absence of an obvious, obviously correct one-size-fits-all strategy, we punt.
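
To illustrate the first caveat, here is a small Ruby example of how the offset of the same word differs between the HTML markup and its text. The tag stripping is a naive regex used only for this illustration, not how the Gem derives text.

```ruby
body = '<p>Hello <em>world</em></p>'
# Naive tag stripping, for illustration only; not a real HTML-to-text conversion.
text = body.gsub(/<[^>]+>/, '')

text_offset = text.index('world')   # offset within the plain text
markup_offset = body.index('world') # offset within the HTML markup

puts "offset in text:   #{text_offset}"
puts "offset in markup: #{markup_offset}"
puts body[markup_offset, 5]
```

Here the word sits at offset 6 in the text but at offset 13 in the markup; passing the text offset to a markup-relative API would land in the middle of `Hello <em>`.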

On to reassembling the body. Dr Frankenstein, please answer your page.

jdickey commented 8 years ago

Speaking of "too presentation-oriented", it shouldn't be this service's job to wrap the reassembled body with a container div; that's been deleted from the spec for the Selection Service.

TheProlog / prolog-use_cases

How to store and present rich content? #16
