Consistent valid-HTML policy for Publish New Article and Propose Edit Contribution?

jdickey commented 8 years ago

The Wiki spec for the Propose Edit Contribution use case states that, should replacing the selected (existing) content with the Proposed Content result in HTML that is not valid and well-formed, it must be rejected.

A cursory examination of the current Publish New Article use case code indicates that no validation of the body content is performed at all. This makes a certain amount of sense: when that use case was developed, we had strong expectations of being able to use validating Markdown-to-HTML authoring of content, or at least a WYSIWYG front-end widget that would ensure valid HTML (generally by submitting it to the W3C online validator) before the use-case code got to it. Those assumptions are no longer operative; until we get a proper front-end app built (definitely post-0.5), we're not going to have that. We've established that the Markdown-to-HTML-to-Markdown conversion initially envisioned is too unreliable for use without investing significant R&D resources that are simply not available.

Therefore, some provision for validating (and ideally repairing) content, as XML-compliant HTML5, needs to be developed and integrated into both use cases. 😩

mitpaladin commented 8 years ago

Rather than re-inventing the wheel, can we take the replacement text the user is proposing, swap it into a copy of the article, and have Markdown editor tell us whether the entire article is valid or not?

jdickey commented 8 years ago

Ummm…would you care to rephrase that? What "Markdown editor"? What I'd been planning on doing with a pure HTML workflow, in back-end code since that's all we have now, is to swap the selected content with the proposed content and see if that validates. However, that's not (and shouldn't be) part of Issue #50 and PR #51, which is what I was working on when I opened this issue as a reminder. But that's still back-end code, completely decoupled from the UI.
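For illustration only, a minimal sketch of that swap-and-validate step. It is written in Python rather than the project's own code, and the names `is_well_formed` and `propose_edit`, along with the character-offset selection, are hypothetical. It checks XML well-formedness only, using the standard-library parser, so it does not establish HTML5 validity, and named HTML entities such as `&nbsp;` would be rejected.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML once wrapped in a single root.

    Assumption: article bodies are XML-compliant HTML fragments, so a
    throwaway <div> wrapper gives the parser the single root it requires.
    """
    try:
        ET.fromstring(f"<div>{fragment}</div>")
    except ET.ParseError:
        return False
    return True


def propose_edit(body: str, start: int, end: int, proposed: str) -> str:
    """Replace body[start:end] with the proposed content on a copy of the body.

    Raises ValueError if the resulting body is not well-formed, mirroring
    the "must be rejected" rule from the Propose Edit Contribution spec.
    """
    candidate = body[:start] + proposed + body[end:]
    if not is_well_formed(candidate):
        raise ValueError("proposed edit would leave the article body malformed")
    return candidate


if __name__ == "__main__":
    body = "<p>This is <em>old</em> text.</p>"
    start = body.index("<em>")
    end = body.index("</em>") + len("</em>")
    print(propose_edit(body, start, end, "<strong>new</strong>"))
```

Validating the whole candidate body, rather than the proposed fragment alone, catches the case where the replacement is well-formed in isolation but the selection boundaries cut through existing tags.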

PublishNewArticle does no validation of content at all, unlike ProposeEditContribution. This is because, when that use case was written, the presumption was that our eventual front-end app would include one of two "safety features". Either there would be some form of WYSIWYG editor which, either for Markdown or HTML, would ensure that invalid markup was cleaned up or rejected prior to sending it to the back-end use-case code. Alternatively, code could be written, ideally but not necessarily in the front-end app, that would take user input (presumed to be a mixture of Markdown and HTML), convert it to straight HTML using something like Pandoc or markdown-js, and validate it.

It Would Have Been Very Nice If that proven-valid HTML markup could have been converted back to, and persisted as, Markdown; this would have given us several advantages when editing or presenting it. If you'll recall, I spent a week or two with Pandoc trying to develop reliable workflows and, partly due to churn in the underlying libraries, abandoned the effort for 0.5; that took us back to "HTML and only HTML for content". But that still didn't solve our validation problem, because the existing back-end code (PublishNewArticle and/or ProposeEditContribution) didn't "know" it really had to do that. That's what the entire second paragraph in the opening comment is about.

It's vitally important, especially considering that it's killed multiple earlier attempts at our app, to remember that the front-end and back-end apps are separate apps. Force-fitting the two has been a major, reliable cause of project failure, especially as PHP thinking spills over into software development.

All this issue is for is to remind us that we haven't solved that problem yet.

Clearer now?

mitpaladin commented 8 years ago

Right. Hmm. I would then argue that for the purposes of 0.5 we log this as a bug and move on, to be solved through Pandoc once the libraries are in a stable, reliable state. That seems the most promising long-term solution?

jdickey commented 8 years ago

I agree with moving on and deferring that until after 0.5; we can tell people "if it blows up, it's because you pasted in malformed HTML, which we don't yet deal nicely with". I'd argue that Pandoc ought to be a fallback… we ought to be handling the editing/validation/transformation in the front-end code, so that all the back end sees is guaranteed-valid content, whether HTML or Markdown.

TheProlog / prolog-use_cases

Consistent valid-HTML policy for Publish New Article and Propose Edit Contribution? #52