Closed jdickey closed 8 years ago
As noted in this comment on Issue #1, we've reworked PrologCore (now Prolog::Core) to use Wisper, which significantly changes the second item in the "order of attack" list above.
Also, note that "summarise content" implies more than simply a list of articles.
Just as a reminder, the original PDF of the mocked-up landing page showed several items. Listed in roughly counter-clockwise order from top left, these are
So, eliminating the "don't care here" items, and reordering into a sensible implementation/mockup order, we're left with
One could easily argue that each of these six "ought" to be a separate use case/query, with the current item serving to aggregate the results of the individual queries.
I propose developing this app summary as originally envisioned, including each of these features in mockup/as-reported-from-queries form, and later (pre- or post-0.5 as time and priorities permit) decomposing the individual services.
Continuing from the enumeration-of-services comment above, we'd need to
Note that the Prolog::Core::Article entity as of Gem release 0.2.0 does not have a "keywords" field; that is a defect addressed by TheProlog/prolog_core#9.
The prolog_core Gem, including the Prolog::Core::Article class, has now been bumped to v0.3.0, supporting article keywords; we're back in business.
Well, I wasted most of a day convincing myself that dynamic test-data generation has a couple of major disadvantages compared to fixtures, and then generating an Article list to use as a fixture.
Separately, we've reacquainted ourselves with one of the main shortcomings, or at least complications, of Wisper for many of our use cases: as the method names suggest, it's designed for unidirectional, "broadcast" or "publish" communication. Faking RPCs with it is possible, but requires setting up two separate publisher/subscriber chains, which is tedious because the only way to make the response subscription work appears to be using (temporary, scoped) global listeners. Oh well.
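To make the "two publisher/subscriber chains" point concrete, here is a toy sketch of the pattern. This is deliberately not Wisper itself; Bus is a minimal stand-in for its broadcast mechanics, and every name below is illustrative, not part of any real API. The request travels on one chain, and the reply comes back through a temporary global subscription.

```ruby
# Toy stand-in for Wisper-style broadcasting, to illustrate (not reproduce)
# the shape of a faked RPC: one chain carries the request, and a second,
# temporary "global" subscription carries the response back.
module Bus
  def self.listeners
    @listeners ||= []
  end

  def self.broadcast(event, *args)
    listeners.each { |l| l.public_send(event, *args) if l.respond_to?(event) }
  end

  # Temporary global listener: subscribed only for the duration of the block.
  def self.subscribe(listener)
    listeners << listener
    yield
  ensure
    listeners.delete(listener)
  end
end

# "Authentication port": answers a :current_user request by broadcasting back.
class AuthListener
  def current_user
    Bus.broadcast(:current_user_is, 'Guest User')
  end
end

# The "use case" end: asks for the current user and captures the reply.
class CurrentUserQuery
  attr_reader :user

  def current_user_is(name)
    @user = name
  end

  def call
    # Request chain (broadcast :current_user) plus a temporary global
    # subscription of self to catch the :current_user_is response.
    Bus.subscribe(self) { Bus.broadcast(:current_user) }
    user
  end
end

Bus.subscribe(AuthListener.new) do
  puts CurrentUserQuery.new.call # prints "Guest User"
end
```

Note how the responder never holds a reference to the asker; the reply only arrives because the asker temporarily subscribed itself globally, which is exactly the fragility complained about above.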
Allow me to quote from the commit message for the just-pushed Commit d87e8ad:
Houston, we have data! (from a use case, even!)
... This is not likely to be the use case's final form (the name, "summarise content", should tip the reader off that there is likely more "content" than simply a list of articles), but it "works" as a demonstration of the Wisper message traffic between the use case, an "authorisation port" that "knows" who the "current user" is, and the "Article persistence port" that can supply a list of all Article entities which that user may view.

IMPORTANT NOTE: There is no protocol yet defined for the article-persistence port to actually do the filtering; in particular, there is no linkage between the persistence provider and any sort of authorisation that would do (or provide information necessary for) that filtering.

We're again bumping up against the limitations of Wisper as a remote data provider in a query/response fashion; with subscriptions limited to either individual object pairs or to an object listening for "global" broadcast messages (however scoped/filtered), this feels very fragile. Don't get us wrong; we love Wisper in its designed-for use case, which is basically the Observer pattern of unidirectional notification. It wobbles considerably more the farther outside that role it's asked to go.
How much more difficult will it be to make everything work correctly when the Wisper message conversation gets more complex and nuanced, say, when the Article persistence provider (or a proxy for it) actually starts caring about filtering returned Articles based on the current user's permissions? Would we be better off abandoning Wisper now in favour of some initially-more-complex system, or should we wait until we get stuck in a rabbit hole that takes heavy equipment to burrow out of? Clocks are ticking…
Look carefully at these specs for SummariseContent#call to see what I'm talking about here.
…and there's at least one way that "limitation" can force us to write better code.
As the comment above implies, we'd been expecting to do things like "filter articles available for viewing based on the current user" in the subscriber object; that is, our SummariseContent or other use case would tell the persistence provider who the current user is and then ask for all Articles which that user can view.
What if we turned that around a bit? What if we had at least two different persistence-provider implementations: one for Joe Public and another for an authenticated User? This at least potentially simplifies the logic in each, while (in theory) guaranteeing that an unauthenticated user can never see private/draft/etc. posts. This doesn't solve any of the creakiness inherent in Wisper bidirectional conversations, but it does get rid of the need for any "linkage between the persistence provider and any sort of authorisation that would do (or provide information necessary for) that filtering" (of articles visible to a given user). Any user sees only public Articles plus Articles which that user has authored; no further interaction with the authorisation machinery is necessary.
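A minimal sketch of that split, with one provider per kind of user. The Article shape and all class names here are illustrative assumptions, not the project's actual entities; the point is that neither provider needs a user-related conditional fed to it by an authorisation service at query time.

```ruby
# Sketch of the "two persistence providers" idea: a guest provider that only
# ever yields published articles, and a member provider that adds the member's
# own. Names and the Article shape are illustrative only.
Article = Struct.new(:title, :author_name, :published) do
  def published?
    published
  end
end

ARTICLES = [
  Article.new('Public Piece', 'Alice', true),
  Article.new('Draft Piece', 'Alice', false),
  Article.new('Other Draft', 'Bob', false)
].freeze

class GuestArticleProvider
  def initialize(articles)
    @articles = articles
  end

  # Guests see published articles only; no user in sight.
  def all_articles
    @articles.select(&:published?)
  end
end

class MemberArticleProvider
  def initialize(articles, member_name)
    @articles = articles
    @member_name = member_name
  end

  # Published articles plus the member's own; the "current user" was baked in
  # at construction time by whatever code wired this provider up.
  def all_articles
    @articles.select { |a| a.published? || a.author_name == @member_name }
  end
end

GuestArticleProvider.new(ARTICLES).all_articles.map(&:title)
# => ["Public Piece"]
MemberArticleProvider.new(ARTICLES, 'Alice').all_articles.map(&:title)
# => ["Public Piece", "Draft Piece"]
```

The design choice being illustrated: the decision "which provider?" happens once, at wiring time, instead of on every query.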
We've also, at this very early stage, reached a point where a significant code stench needs to be addressed: that of the objects we've been calling "listeners" in the test code, where the two existing listener objects demonstrate one end of a conversation. Should those be documented/enumerated separately from the test code they presently occur in and, if so, how? Writing separate, full-blown specs for them seems an exercise in navel-gazing, but we're really not comfortable that this behaviour/conversation is not separately enumerated and documented.
Peanut Gallery, feel free to chime in with suggestions.
Having two persistence providers for Guest vs. authenticated user definitely makes sense. Not sure I see the need to break out the paired listener objects. Wouldn't it just be a matter of noting which use case specs contain such objects?
Breaking them out is overkill. I was just thinking about how to keep the listeners, publishers, and messages involved straight.
I'm still uneasy with the ceremony involved in setting up a Wisper conversation where each side uses a different listener-setup API: one hooked to the publishing object, the other listening for global messages because it doesn't have access to its listener. That wouldn't happen in a proper message-queue system.
So here's how the message traffic currently works.
Before calling SummariseContent#call, the test code wires up an authentication listener whose #current_user method responds to a broadcast message of that name, replying by broadcasting a :current_user_is message with the user name. That conversation happens as the first major step in SummariseContent#call, when it calls its #query_current_user method. Note that #query_current_user sets up a temporary global listener (hereinafter TGL) to receive the response to :current_user.
The second major step in SummariseContent#call is a call to its #load_article_list method, which also sets up a TGL (as self) that listens for the message broadcast from the persistence listener in response to :all_articles_permitted_to (which will cause the #all_articles method to be called).
So, in a nutshell, the #call method receives the current user name from an authentication listener, and broadcasts it as part of a request to a persistence listener. SummariseContent assumes that the persistence responder knows which articles are permitted to an individual user, making the authorisation responder a dependency of both the use case and the code setting up the responder. We'd really like to avoid that, and the best/only way to do so seems to be to eliminate that level of conditional logic from the persistence responder.
What we want is for the code wiring up the use case to its various message responders to itself make the current-user query of the authentication listener, and to select a persistence responder appropriate for that user without user-related conditionals. So that would have the message traffic working something like this:

1. The authentication listener responds to the current-user query by broadcasting a :current_user_is message, as now;
2. the wiring code uses that reply to select an appropriate persistence responder before calling SummariseContent#call;
3. SummariseContent#call broadcasts a :query_all_articles message after setting itself up as a TGL;
4. SummariseContent#all_articles is called to respond to a message of that name, which supplies a list of Articles (as now).

The use case no longer knows or cares who the current user is; it simply broadcasts a request for all articles in the system. The code setting up the responder to that broadcast is the only code outside the authorisation listener that knows who the current user is. This moves one existing bit of complexity out of the SummariseContent use-case object, at the cost of slightly increasing complexity in the setup/wiring code. That increase is compensated for by the fact that the setup code is now the single recipient of truth with regard to the current user.
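The proposed division of responsibility can be sketched in plain Ruby, leaving the Wisper plumbing aside; every class and method name below is an illustrative stand-in, not the project's real API. The wiring code alone knows who the user is, and the use case only ever asks its responder for all articles.

```ruby
# Sketch of the proposed wiring. The use case neither knows nor cares who the
# current user is; the setup code picks the persistence responder once.
class SummariseContentSketch
  def initialize(persistence)
    @persistence = persistence
  end

  # Equivalent, in spirit, to broadcasting :query_all_articles and collecting
  # the reply; here reduced to a plain method call for clarity.
  def call
    @persistence.all_articles
  end
end

class GuestPersistence
  def all_articles
    ['public article']
  end
end

class MemberPersistence
  def all_articles
    ['public article', 'my draft']
  end
end

# Wiring/setup code: the only place that consults the "current user" (faked
# here as a plain argument) and selects a responder accordingly.
def build_use_case(current_user)
  persistence = current_user ? MemberPersistence.new : GuestPersistence.new
  SummariseContentSketch.new(persistence)
end

build_use_case(nil).call      # => ["public article"]
build_use_case('Alice').call  # => ["public article", "my draft"]
```

Notice that the only conditional on the user lives in build_use_case; neither persistence class nor the use case contains one.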
Known concerns:

1. :query_all_articles vs :all_articles and :current_user vs :current_user_is are inconsistent. They're (obviously) being broadcast and received by different objects; would using identical names (e.g., :all_articles, :current_user) be simplifying or confusing? Do we even care now?
2. The timestamp data the article lists need is not in prolog_core release 0.3.0 and would require an 0.4.0 release. This is extremely straightforward but a detail to remember nonetheless.

Either way on Known Concern No. 1 above works for me. I think it's six of one and half a dozen of the other, but I've no problem keeping the names I've been using up to now. The listeners will obviously be reused and expanded upon further as we progress with this thing; I'd like to have a rewrite of SummariseContent and an 0.4.0 of prolog_core pushed by 24 hours from now; adding in the not-yet-implemented parts of the summary as enumerated in this comment (https://github.com/TheProlog/prolog-use_cases/issues/2#issuecomment-179631035) should take a small handful of hours after that.
Oh, yeah; by the time we've got 0.4.0 of prolog_core, I'll have to regenerate the fixture data to include timestamps; then we'll be able to use them as is for a while (hopefully, for the remainder of the project).
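For what it's worth, regenerating timestamped fixture data can be a few lines of stdlib Ruby. This is an illustrative sketch only, not the project's actual fixture-rebuild task; the field names and the one-day spacing are assumptions made for the example.

```ruby
require 'yaml'
require 'time'

# Illustrative sketch of regenerating article fixture data with created_at /
# updated_at timestamps. Field names are assumptions based on the discussion
# above, not the project's actual schema.
def build_fixture_articles(count, base_time: Time.parse('2016-01-01T00:00:00+08:00'))
  Array.new(count) do |index|
    created = base_time + (index * 86_400) # space creation times a day apart
    {
      'title' => "Article #{index + 1}",
      'author_name' => "Author #{(index % 5) + 1}", # deliberately small author pool
      'created_at' => created.iso8601,
      # Every other article is "never updated": updated_at == created_at.
      'updated_at' => (index.even? ? created : created + 3_600).iso8601
    }
  end
end

puts build_fixture_articles(3).to_yaml
```

Seeding some articles whose two timestamps are identical matters later, when "most recent" and "most recently-updated" lists need to behave differently.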
Now that we finally (as of Commit c474597) have Article fixture data with creation and update timestamps and a relatively limited number of authors, we can use the fixture data for things that are actually interesting.
Consider this pry session:
[1] pry(main)> require 'prolog/core'
=> true
[2] pry(main)> articles = YAML.load_file('test/fixtures/articles.yaml');
[3] pry(main)> articles.first.class
=> Prolog::Core::Article
[4] pry(main)> author_names = Set.new(articles.map { |article| article.author_name })
=> #<Set: {"Rozella Oberbrunner",
"Katherine Heathcote",
"Juliana Ledner",
"Alexandro Rowe",
"Cathy Torphy",
"Nichole Quitzon",
"Tanya Ryan",
"Abigail Emmerich",
"Bobby Hackett",
"Lisette Hirthe",
"Favian Nader",
"Lesley Thiel",
"Dulce Mitchell",
"Alexandrea Nitzsche",
"Jasmin Parisian",
"Mina Pouros",
"Jamaal Shields",
"Kitty Runolfsson"}>
[5] pry(main)> author_names.count
=> 18
[6] pry(main)> articles.count
=> 50
[7] pry(main)> mapping = {};
[8] pry(main)> author_names.map { |name| mapping[name.to_sym] = 0 };
[9] pry(main)> articles.each { |article| mapping[article.author_name.to_sym] += 1 };
[10] pry(main)> mapping
=> {:"Rozella Oberbrunner"=>7,
:"Katherine Heathcote"=>5,
:"Juliana Ledner"=>2,
:"Alexandro Rowe"=>2,
:"Cathy Torphy"=>3,
:"Nichole Quitzon"=>4,
:"Tanya Ryan"=>2,
:"Abigail Emmerich"=>2,
:"Bobby Hackett"=>5,
:"Lisette Hirthe"=>5,
:"Favian Nader"=>1,
:"Lesley Thiel"=>4,
:"Dulce Mitchell"=>2,
:"Alexandrea Nitzsche"=>2,
:"Jasmin Parisian"=>1,
:"Mina Pouros"=>1,
:"Jamaal Shields"=>1,
:"Kitty Runolfsson"=>1}
[11] pry(main)> mapping.sort { |a1, a2| a1.last <=> a2.last }.reverse
=> [[:"Rozella Oberbrunner", 7],
[:"Bobby Hackett", 5],
[:"Katherine Heathcote", 5],
[:"Lisette Hirthe", 5],
[:"Nichole Quitzon", 4],
[:"Lesley Thiel", 4],
[:"Cathy Torphy", 3],
[:"Dulce Mitchell", 2],
[:"Juliana Ledner", 2],
[:"Abigail Emmerich", 2],
[:"Tanya Ryan", 2],
[:"Alexandro Rowe", 2],
[:"Alexandrea Nitzsche", 2],
[:"Kitty Runolfsson", 1],
[:"Jasmin Parisian", 1],
[:"Mina Pouros", 1],
[:"Jamaal Shields", 1],
[:"Favian Nader", 1]]
[12] pry(main)>
We can then do things like
[12] pry(main)> articles.select { |article| article.author_name == 'Rozella Oberbrunner' }.
[12] pry(main)* sort { |a1, a2| a1.updated_at <=> a2.updated_at }.
[12] pry(main)* reverse.each { |article| puts "#{article.title} -- Last updated at #{article.updated_at}" };
American Diaries 2: Son of American Diaries -- Last updated at 2016-02-04T13:55:51+08:00
Red Monster -- Last updated at 2016-01-24T16:49:53+08:00
The Red Rose of Northern Ireland -- Last updated at 2016-01-15T13:42:23+08:00
The Ultra Witch Who Fell to Earth -- Last updated at 2016-01-15T12:09:58+08:00
Season of the Dreams -- Last updated at 2016-01-09T17:29:29+08:00
Christmas on Yazmin Mount -- Last updated at 2016-01-09T13:59:26+08:00
Journey of the Action Friday -- Last updated at 2015-12-20T11:41:04+08:00
[13] pry(main)>
And since this is fixture data, the results should work exactly the same in your pry session as they do in the one transcribed above. Completing the SummariseContent class from here should be quite straightforward.
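As an aside, the per-author tally built step by step in the pry session above can be written more compactly with Enumerable#tally (Ruby 2.7+). AuthorStub below is a minimal stand-in for Prolog::Core::Article, of which only author_name is used here.

```ruby
# Compact version of the per-author tally from the pry session above.
# AuthorStub is a stand-in for Prolog::Core::Article; only author_name matters.
AuthorStub = Struct.new(:author_name)

articles = [
  AuthorStub.new('Rozella Oberbrunner'),
  AuthorStub.new('Bobby Hackett'),
  AuthorStub.new('Rozella Oberbrunner')
]

# Enumerable#tally replaces the manual "seed a Hash with zeroes, then
# increment" dance from steps [7] to [9] of the session.
counts = articles.map(&:author_name).tally
# => {"Rozella Oberbrunner"=>2, "Bobby Hackett"=>1}

# Step [11]'s sort-and-reverse becomes a single descending sort:
counts.sort_by { |_name, count| -count }
# => [["Rozella Oberbrunner", 2], ["Bobby Hackett", 1]]
```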
Right. What's the difference between "most recent articles" and "most recently-updated articles"? We'd argue that the difference is that the latter is guaranteed not to include articles that have not been updated since originally published. That might not be immediately obvious to someone taking a quick read through this comment from 4 February, which attempts to lay everything out.
We just felt the need to restate that for the record, as we'd just done a quick implementation of a "most recent articles" list that was really a "most recently-updated articles" list. Almost identical logic, so it's not like we threw anything away… :grinning:
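The distinction just drawn can be made concrete in a few lines; the Post struct below is an illustrative stand-in for Prolog::Core::Article, and "never updated" shows up as created_at equalling updated_at, exactly as the fixture data models it.

```ruby
require 'time'

# "Most recent" sorts on created_at alone; "most recently-updated" must first
# exclude articles never touched since publication (created_at == updated_at).
Post = Struct.new(:title, :created_at, :updated_at)

posts = [
  Post.new('Untouched', Time.parse('2016-01-01'), Time.parse('2016-01-01')),
  Post.new('Revised',   Time.parse('2015-12-01'), Time.parse('2016-02-01')),
  Post.new('Newest',    Time.parse('2016-01-20'), Time.parse('2016-01-20'))
]

most_recent = posts.sort_by(&:created_at).reverse.map(&:title)
# => ["Newest", "Untouched", "Revised"]

most_recently_updated = posts.reject { |p| p.updated_at == p.created_at }
                             .sort_by(&:updated_at).reverse.map(&:title)
# => ["Revised"]
```

Note that 'Revised' tops one list and is last in the other, which is why conflating the two is an easy mistake to make.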
Considering the list from the comment linked immediately above:

- Most recently-updated articles: sort Articles by the updated_at timestamp, without filtering out those Articles without updates (whose created_at and updated_at timestamps are therefore identical);
- Most recent articles: sort Articles by created_at timestamp, in reverse order (most recent first). Trivial; up next as we write this;
- Keywords by frequency: a SummariseContent#keywords_by_frequency method will read the keywords assigned to each article, maintaining a list of keywords with counts (independent of which Articles those keywords are associated with), which will then be sorted and returned.

The remaining tasks need data beyond what is provided by the prolog_core Gem as of release 0.4. We don't think that scoring/ranking should be a part of prolog_core, but a separate component/Gem that depends on prolog_core. We're generally willing to be persuaded otherwise, however.

- Articles ranked by their CoRank or equivalent. Further definition of this is required, and may necessitate deferring this data item until after Meldd 0.5 launches;
- Articles by decreasing CoRank score. While this is clear in concept, CoRank itself may not be fully designed and implemented in time for Meldd 0.5 and thus is also a candidate for deferral;
- Articles in descending order of increase in their CoRank over a fixed period of time. As with the two earlier features, this is unlikely to make it into Meldd 0.5.

Questions? Objections? Discussion and PRs welcome. :grinning:
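The keywords-by-frequency item above is straightforward to sketch. The method and struct names here are assumptions for illustration, not the shipped API; the real method would read keywords through the persistence port rather than from a passed-in array.

```ruby
# Sketch of keywords-by-frequency: count each keyword across all articles,
# independent of which articles carry it, then sort by descending count.
KeywordedArticle = Struct.new(:keywords)

def keywords_by_frequency(articles)
  counts = Hash.new(0)
  articles.each do |article|
    article.keywords.each { |keyword| counts[keyword] += 1 }
  end
  # Break count ties alphabetically, for a stable, readable order.
  counts.sort_by { |keyword, count| [-count, keyword] }
end

articles = [
  KeywordedArticle.new(%w[ruby wisper]),
  KeywordedArticle.new(%w[ruby fixtures]),
  KeywordedArticle.new(%w[ruby wisper])
]

keywords_by_frequency(articles)
# => [["ruby", 3], ["wisper", 2], ["fixtures", 1]]
```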
:P I would argue we don't need tag-cloud sorting for 0.5 either; just provide "most recently-updated articles" as the default, and "most recent articles" to show how different sorting could be done.
As of Commit 81c5baa, we now have all the pieces we're going to have in the current whack at SummariseContent, but there's an inconsistency: we return a list of articles from #call, and there are separate methods for keywords by frequency, most recent articles, and most recently-updated articles. Either we should rename #call to #list_articles, or rework #call to return a Hash of all four lists. We'd argue for the latter, especially considering that what's currently done in #call is a prerequisite for any of the other lists.
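A sketch of the "Hash of all four lists" option. The struct shape, helper-method names, and key names below are illustrative assumptions; the real SummariseContent would fetch its articles through the persistence port rather than take them as a constructor argument.

```ruby
require 'time'

# Sketch of reworking #call to return a Hash of all four lists, fetching the
# article list once and reusing it for each derived list.
Entry = Struct.new(:title, :keywords, :created_at, :updated_at)

class SummaryUseCaseSketch
  def initialize(articles)
    @articles = articles
  end

  def call
    {
      articles: @articles,
      keywords_by_frequency: keywords_by_frequency,
      most_recent_articles: @articles.sort_by(&:created_at).reverse,
      most_recently_updated_articles: recently_updated
    }
  end

  private

  def keywords_by_frequency
    @articles.flat_map(&:keywords).tally.sort_by { |_k, count| -count }
  end

  # Exclude articles never updated since creation, per the distinction above.
  def recently_updated
    @articles.reject { |a| a.updated_at == a.created_at }
             .sort_by(&:updated_at).reverse
  end
end

articles = [
  Entry.new('A', %w[ruby], Time.parse('2016-01-01'), Time.parse('2016-01-05')),
  Entry.new('B', %w[ruby wisper], Time.parse('2016-01-02'), Time.parse('2016-01-02'))
]
summary = SummaryUseCaseSketch.new(articles).call
summary.keys
# => [:articles, :keywords_by_frequency, :most_recent_articles, :most_recently_updated_articles]
```

The single-entry-point shape also means callers can't accidentally request a derived list without the underlying article fetch having happened.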
Coming up shortly; the big time sink today was automating the article-fixture-data rebuild process.
As presented in Feature Strategy and Planning (Issue #1), our first use case, from the user perspective, is
Most of that is presentational detail, except for the phrase "existing articles".
What if we don't have any existing articles yet?
We can rephrase that for our present purposes as
This is facilitated by our earlier decision to inject a repository as the persistence "port" of the existing entities and, by extension, of most other data-encumbered artefacts we'll deal with. There is a class which defines the interface used to perform commands on, or queries of, that repository; however, implementing classes need not (and likely should not) subclass PrologCore::Ports::Repository, but rather use duck typing. (We may want to revisit that decision, but not now.)

So, the order of attack seems to be:
- A source of Article data to use for the summary presentation for test purposes. (This implies some source of User data for authors as well, but that's a don't-care at this level.)

We've never yet considered or implemented a failure-result case for an "index Articles" action; unless someone can present a compelling case why that should change, we won't do so here, either.
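To illustrate the duck-typing decision above: a repository "implementation" need only answer the port's messages, with no shared superclass. The message names (#all, #add) below are illustrative assumptions, not necessarily those defined by PrologCore::Ports::Repository.

```ruby
# Duck-typed repository sketch: no subclassing of PrologCore::Ports::Repository,
# just an object answering the same messages the port defines.
class InMemoryArticleRepository
  def initialize(articles = [])
    @articles = articles
  end

  # Query: everything we hold (the "index Articles" case discussed above).
  # Returns a copy so callers can't mutate the store behind our back.
  def all
    @articles.dup
  end

  # Command: add an article to the store; returns self to allow chaining.
  def add(article)
    @articles << article
    self
  end
end

repo = InMemoryArticleRepository.new
repo.add(:first_article).add(:second_article)
repo.all
# => [:first_article, :second_article]
```

An in-memory duck like this is also exactly what the test suite can inject as the persistence port, which is one reason duck typing beats a mandatory superclass here.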
It Would Be Very Nice If™ we, separately, spun up a lightweight app to exercise this use-case API.