Closed jdickey closed 8 years ago
Following the basic style of the existing use cases' API, we'd expect those parameters to be broken down between #initialize
and #call
, where
#initialize
takes parameters for :article_repository
, :contribution_repository
, and :authoriser
; and#call
takes parameters for the :article_identifier
, :endpoints
, :proposed_content
, and :justification
.We still think this is creaky, and are very open to suggestions for improvement.
You're right on the race conditions for locking content blocks for in-progress (user entering info) and pending (not-yet-decided by Author) contributions.
I think resolving this (post-0.5) will require the validation check for begin/end tags within the content selection, where:
note that this would imply each Contribution-ID assigned may or may not end up as a fully input Contribution (ex. Contribution 1 on paragraph is initiated but canceled, Contribution 2 on paragraph 2 is initiated before Contribution 1 cancellation and is submitted)
what do you think?
I think this is the place to check for overlap. We know that the identified-anchor tag pair is 32 characters long plus the length of the numeric ID (so, e.g., <a id=\"contribution-176-begin></a>
is 35 characters long, and its matching contribution-176-end
tag pair is 33 characters long), so if we inspect the (original) article body from n characters before the range to m characters after (where n is 32 plus the length of the number as a string and m is the corresponding offset after the end), that's accomplished. That won't fix our race condition, but it'll change where in the new code it occurs; possibly akin to reshuffling the deck chairs on board Titanic.
I put creating the Contribution repository entity at the end for a reason; it doesn't get done if any of the preceding steps report an error. Keep things as atomic and single-path as possible.
tl;dr we need a thread-safe locking mechanism, and nothing we've proposed so far is going to accomplish that. We have, however, agreed on the need and (hopefully) the means of validating that no overlapping active contributions are involved.
(Though to check the status of any discovered contribution identified-anchor tag pairs, that will require queries against the Contribution repository — which, just to remind everyone again, is not and is not tied to a particular database implementation at this level.
Contribution IDs are unique within Articles and have no significance other than to identify entries in the Contribution repository. Conflating and confusing multiple concerns and levels is the root cause for our four years of repeated, consistent failure.
D-Day minus 29 days 1 hour as I type this.
Ok, so agreed on the Contribution IDs Just wanted to highlight that each ID does NOT imply a valid Contribution
Not a valid Contribution, no; simply the creation of a proposed Contribution.
yep
Note that the ordering of steps in the initial comment has been adjusted.
The time-consuming opportunities for races are at literally every step:
Basically, what we've done is spec ourselves into a corner where we have to throw top-quality hosting at this in production (even concurrent-user demonstrations) that are highly unlikely to be satisfied with a single Heroku dyno or DO droplet. :disappointed:
how about having the Contribution Repository check and increment the Article modified timestamp at each query? lock the Article if the request is coming from an up-to-date copy of the Article and if the Article is not already locked, error otherwise
Because there's no incoming data to validate the last-modified timestamp against, and that won't be modified immediately after a new edit-contribution-proposal use case is started. We're stuck in SQL thinking, and we need to think outside that box. If the first thing we do after finding out who we are by asking the authoriser (or even right after we get the Article entity) is to tell the data store underlying the Article repository to lock the relevant record, that still doesn't happen instantaneously — especially on a single dyno. And that's not even considering the Contribution Repository, which is the source of our unique-per-Article Contribution ID.
We can pull off a demo with this, and either wave off bad data or write a cleanup tool somehow (later), but we still have a capital-P Problem. The least bad solution that occurs to me right off is to do the following:
Something else that needs to be kept in mind thinking through all this is that there is currently no support in the Article
entity for locking; that would require an update to Prolog::Core::Article
and the release of a new prolog_core
Gem.
Would the aforementioned yak shave deliver sufficient value to justify the additional time and effort?
hmm, I say we get the use cases working without worrying about conflicts for now, we will probably come up with a better way to do thread-safe locking after 0.5
Necessity is the mother of invention, and desperation its crazy aunt that never stays locked up in the basement. Agreed.
Added content to the opening comment:
And One More Thing…
Remember that edit-contribution proposals by the author of an Article are always immediately accepted; we need to think up the least hackish yet sufficiently rapid-to-implement way to accomplish this.
Repeated here as a reminder (and also to trigger a Slack bot notification).
for 0.5 lets just have the Author contributions use the same flow as anyone else's contributions later on we can add a background process that automatically Accept on behalf of the Author
No need to background it; when we persist the new Contribution in the last step, we persist its status as accepted
for the author or proposed
for anyone else; for the Author (only) we persist an Article with the updated body to the Article repository which we already have access to. I don't even want to think about background anything yet; that's well out of use-case scope.
The other thing that's interesting about the form object for ProposeEditContribution
is that the conversion of (Markdown) proposed content to HTML is transparent to the caller of the use case (or, indeed, to the use-case class itself). This is done through Virtus' feature for overriding setters. Handy.
Validate Selection should be a separate use case from Propose Edit Contribution (hereinafter PEC or the Use Case):
This removes a bit of the validation logic heretofore assumed to be bound up in PEC; specifically, the need to validate the selected range (though ensuring that the current user is authorised is simple belt-and-suspenders security). Additionally, rather than injecting an Article Repository as a parameter, the Use Case is passed an Article entity to be associated with the Proposal.
The Use Case still needs
Error conditions that may arise include:
is bad
from This is bad content
should select the words to either side of the proposed deletion as well, and specify the replacement content This content
.) That's it, as far as we know now. Isn't it?
Right, no locking of content for pending contributions and no checking for overlap with other pending contributions in 0.5.
That wasn't what I meant, really; I meant that that's done separately, that's all. That's what I've been stumbling over for a while; by the time you get to the actual persist-the-proposal step, you've already validated the proposing Member and the selected range of content, so this UC doesn't need to do that again.
you're right, at this particular step those are the only two errors I can think of
We summarise the use case at the start of propose_edit_contribution.rb
as follows (somewhat reformatted):
"Propose Edit Contribution" use case.
Basic scenario:
A Member is viewing an Article on an Article Page. After selecting a range of body content and verifying that the selected range does not overlap Locked Content and that the Member is permitted to Propose Contributions to the Article, (see the "Validate Selection" use case), this use case simply persists the Contribution Proposal into the Contribution Repository based on the supplied inputs, and reports success.
On success:
- If the current Member is the Article's Author, then a new Accepted Contribution will be persisted to the Contribution Repository;
- If the current Member is not the Article's Author, then a new Proposed Contribution will be persisted to the Contribution Repository;
- a UI :success notification will be emitted, with additional data including the name of the currnet Member and the title of the affected Article;
Failure modes:
- The Member may no longer be logged in, generally due to a timeout or other external condition. The use case reports a
'no authorised member'
error and aborts the use case;- The proposed replacement content may be empty, blank, or omitted. This will cause a
'no content'
error to be reported, and the use case will abort.
A couple of other interesting things happened with this latest commit:
Prolog::Services::ContentSelection::FormObject
and Prolog::UseCases::ProposeEditContribution::FormObject
both use an IntegerRange
class for coercing an :endpoints
attribute; this has been factored out as the Prolog::Support::FormObject::IntegerRange
class.IntegerRange
class has been moved to a nested module, Prolog::Support::FormObject
which did not previously exist. As it's presently used by both a use case and a supporting service, should either of those be broken out into a separate Gem per our "Getting Back On Track" section in the Wiki article Architectural Rabbit-Hole Avoidance Attempt No. 769), I expect that the Prolog::Support::FormObject
module will itself need to be broken out into yet another Gem. Remember when modularisation was supposed to simplify our lives? It does, but that's not always apparent when you're sitting at the keyboard.We think we're back on a more reasonable path now. That clock ticking is going to generate USGS earthquake alerts before too much longer…
Ah, but we say we're going to report errors to a UI interface object that should be injected into the use case (presumably in its #initialize
method); that hasn't happened yet. Coming right up. :flushed:
And Prolog::UseCases::ProposeEditContribution::FormObject
now has seven attributes, four of which are actual field values entered by the user. SandiMeter: smashed.
Here's a corner case for consideration:
last
in the element <li>This is the last item in the list.</li>
;very last
. This gets passed to the Markdown-to-HTML converter, which renders it as <p>very last</p>
;<li>This is the <p>very last</p> item in the list.</li>
.This (almost certainly) isn't what the Member intended, and there's no real way for him to correct it except to replicate the entire (updated) content of the containing list as his Proposed Contribution.
This is an interesting situation.
For that matter, any replacement text that's entered as one or more simple strings, without Markdown list indications or other decoration, will be rendered as one or more paragraphs replacing the selected content. The first several potential "fixes" we considered for this are neither simple nor particularly reliable. For example, replacing a selection within a single element with the text of a proposed "paragraph" when that original element is not a complete paragraph might cover the majority of cases in the wild.
What happens if the Member wants to break the paragraph, and thus enters content that gets parsed as two paragraphs?
Mr Smith.
Note that Ms Jones and Mr Abdui-Rahman *are not* included in this scenario.
3.
The Member "obviously" (to a human reading this) intends for a paragraph break to be inserted after Mr Smith
, and for the original content of the (then-single) paragraph to continue after in this scenario.
How does the code figure that out, without introducing even more ambiguity and opportunity for error?
Maybe a solution would include rules such as:
This still leaves plenty of continent-sized holes and unresolved corner cases, but it might be an acceptable workaround — that could actually be implemented.
Say we have a part of the content that is:
<p>one paragraph</p>
<p>another paragraph</p>
if the user selection is from the beginning to the end of another
, output from the Markdown-to-HTML converter
be:
<p>one paragraph</p>
<p>another</p>
Is that right?
No, you're looking at things precisely backwards.
Markdown is used to author rich content (article bodies, user profiles, etc), because it's easier for most people to work with than HTML, yet it's (relatively) easily convertible to HTML.
Let's say that an Author creates a new Article, and in that Article's body text, includes the content
This is one paragraph.
This is another paragraph.
Other content goes on...
This Markdown is (or should be) converted to HTML as part of the publish-article use case, and persisted as HTML. That's important to keep in mind; everything that deals with persisted content, including Contribution selections, deals with HTML.
So the content above would be persisted to storage as
<p>This is one paragraph.</p><p>This is another paragraph.</p><p>Other content goes on...</p>
all in a single "line" of HTML without newlines or intervening white space. (But see TheProlog/prolog-services-markdown_to_html
Issue 1).
Someone later comes along and selects the word "another" in the second paragraph above as the target of an Edit Contribution, and enters the replacement content as "one more", with no additional newlines or other Markdown (or HTML) formatting. This gets sent to the Markdown-to-HTML converter, which returns the "equivalent" HTML as
<p>one more</p>
The propose-edit-contribution use case decides that everything is just fine, adds the begin- and end-proposed-contribution anchor tags to the content, which then becomes
<p>This is one paragraph.</p><p>This is <a id="contribution-begin-247"></a><p>one more</p><a id="contribution-end-247"></a> paragraph.</p><p>Other content goes on...</p>
and we now have invalid HTML, as a paragraph cannot be embedded within another paragraph. Even were this not a problem, this clearly (to us) isn't what the Member had in mind, which would look more like
<p>This is one paragraph.</p><p>This is <a id="contribution-begin-247"></a>one more<a id="contribution-end-247"></a> paragraph.</p><p>Other content goes on...</p>
Is the problem clearer now?
This would all be much simpler if we could restrict individual Contribution Proposals to one block-level element, such as a paragraph or list item, within the content; not being willing to accept that restriction is the reason why we didn't ship something 2-3 years ago, in the meldd1
-to-meldd-beta
timeframe.
If we're going to say "0.5 implements Rules 1-3 from [my] earlier list", then I'm going to insist on Rule 6 as well. Otherwise, I guarantee that the first or second time someone opens up our app, they'll break the content of an Article in a way that can't be repaired within the UI, and they'll say "this is broken; we're not paying for it".
agreed, better disable contributions with endpoints in any other elements in addition to list items that we won't be supporting in 0.5
That's implementable. Does that cover the bases well enough for 0.5?
For the record, you do realise, of course, that if we'd had this restriction in place way back when, we could have shipped meldd1
or meldd-beta
as 0.5, have been paid for the last couple of years, relocated out of Singapore, and be working on about a 3.0 release now?
Rule 6 was "Any selection of content where one endpoint is in a paragraph or other similar block element, and another is in a list item, will be rejected."
Rules 1-3 enable cross paragraph Contributions where end points are not in list items or similar elements. Isn't that right?
Getting contributions working across paragraphs is the barrier to break down.
You're right, Rule 6 shouldn't have been phrased so specifically, but rather as
Any selection of content where one endpoint is in one block element such as a paragraph or list item within an ordered/unordered list, and the other endpoint is in a different block element than the first, will be rejected.
It's always been and it likely always will be. The most viable cross-block strategy I've been able to think of yet is to introduce the concept of a "group contribution", where each partial or complete block within the selection is treated as a separate Contribution from a content POV (including separate Contribution IDs), but is presented as a single Contribution to be accepted or rejected by the Author (with yet its own unique Contribution ID). See how that works?
The hard part of doing it that way is to ensure that the resulting markup (with the Contribution Proposal's replacement content applied in place of all of the individual contributions within the Group Contribution) remains valid markup. But we can deal with that, given a bit of time.
we need to honour the guarantee that all persisted content is or is renderable as valid, well-formed XHTML5.
right, text selection has to be broken up into the different blocks before going through parser and being persisted
and since Rule 6 original or rephrased is or will be taken care of by the Validate Selection use case, which is a prerequisite to Propose Edit Contribution, it's a don't-care here. We can assume that the selection that is passed into us honours that rule because, if it didn't, it would never have gotten this far. Right?
another of the things I need to come up with is a "roadmap" document as other projects often have, showing which use cases we expect to implement and in which order. For example, Validate Selection didn't exist as a separate use case concept until yesterday, but it's fairly obviously going to need to be the next thing worked on after this.
Rule 6 original. As group contribution will enable Rules 2 and 3
Fine; that's a don't-care-for-now. Anything else we need to thrash out that we haven't already?
The use-case definition has been extracted to a Wiki page simply because it gets too confusing to keep adding multiple versions of it to this issue.
We think we have the complete description as of Meldd 0.5 there presently. There are still numerous paths through the logic, which is troublesome.
No Member logged in is the first failure mode documented for the class. The Validate Selection use case would need to perform that check as well, which tends to argue for it being redundant here.
Instead of having an "Authoriser instance" as an input, should we inject just the Member name? Or leave the input list as is, but simply forego the check-for-guest-user failure check? Let's go with the latter for now.
So how should we reify an Edit Contribution? We've two obvious concerns:
proposed
, accepted
, or rejected
, where an entity with the first status can be replaced by one of the latter two, but the latter two can't (you can't have a rejected
Contribution be accepted
after the fact, and you can't mark it as proposed
, either). Hey, that's domain logic, which really shouldn't be in the entity-level objects anyway, right?Different purposes, different classes.
A Contribution::Edit::Accepted
instance indicates that a Proposed Contribution (details of which are encapsulated within) has been Accepted. Likewise, a Contribution::Edit::Rejected
instance indicates that a Proposed Contribution has been Rejected. Each of these two classes has a #responded_at
timestamp and a #response
content string, as well as indicator methods such as #responded?
, #accepted?
, and #rejected?
that return appropriate (immutable) values.
A Contribution::Edit::Proposed
instance contains the details of the Proposal itself: Article identification, proposing Member, endpoints, Proposed Content, Justification content, and timestamp that it was instantiated at (or that was previously persisted along with the rest of the field data). It has a #responded?
method that always returns false
, and does not respond to either #accepted?
or #rejected?
. As the Single Source of Truth about the details of the initiating Proposal, it is encapsulated by (and forwarded to) by the two response classes. This is likely a cleaner solution than the traditional superclass-grouping-common-features approach.
In the Beginning, was the Proposal.
Following the above comment, the existing Prolog::Entities::EditContribution
class will be re-tasked as Prolog::Entities::Contribution::Proposed
.
Another thought pops into mind: why should the endpoints of the selection be persisted in the Contribution Proposal entity? The persisted Article content is to have content (identified-anchor tag pairs) which should be the Single Source of Truth for the endpoints' location once the Article is persisted. If that is the case, then the endpoints as offsets become redundant.
Endpoints as offsets wouldn't offer any value after future contributions change offsets right? But perhaps worth retaining for now in Contribution Proposal for error checking reference?
The things to notice about the growing form object and entity creation/persistence added in the last two commits (3f9e4ae and ea7fd8d) are:
Prolog::Core
is pushed out to an implementation detail. The final implementation will (almost certainly) make use of Prolog::Core::Article
(and Prolog::Core::User
), but those details are incidental; this code uses duck typing that "just happens" to be compatible with what has been defined there (probably in bass-ackwards fashion, as it turns out, but oh well).Prolog::Entities::ArticleIdent
in a way that's probably not worth changing; certainly not right now.ContributionRepository#create
method that creates a Contribution entity. As will all field values passed to that method, however, it's no concern of the code in this use case whether or how any specific field is actually used by that #create
method and the entity which it creates. We don't need Rails' "strong parameters"; we use Virtus. Because the repository really does know more about what should be in the entity than the controller does. :stuck_out_tongue_winking_eye:
Overview
Class
ProposeEditContribution
is the next in our continuing series of use cases, which allows for the submission of a request that the author of an Article approve a change to its content.Its inputs are (in no particular order or separation here):
That's a total of _s_even inputs, which just feels like a lot. They're all needed, as far as we can tell, because this "use case" is a high-level "supervisor" for several actions:
Problems
Are there problems with this? Of course! The obvious one is a race condition; in high-use environments for popular articles, there's a virtual certainty that two proposals can attempt to select the same (or overlapping) content and be assigned the same contribution ID. We could add a locked-by/unlocked state to the Article and immediately persist a locked-by-proposer entity to the Article Repository on entry, and an unlocked one on completion, but that simply adds two persistence calls to the workflow, which are prime opportunities for race conditions themselves. At some point well before 1.0, we're going to have to rethink this.
And One More Thing…
Remember that edit-contribution proposals by the author of an Article are always immediately accepted; we need to think up the least hackish yet sufficiently rapid-to-implement way to accomplish this.