TheProlog / prolog-use_cases

Use-case layer for Meldd/Prolog application.

Generate and return app-summary information #2

Closed jdickey closed 8 years ago

jdickey commented 8 years ago

As presented in Feature Strategy and Planning (Issue #1), our first use case, from the user perspective, is

  1. View the landing page, with existing articles, styling, and navigation links;

Most of that is presentational detail, except for the phrase "existing articles".

What if we don't have any existing articles yet?

We can rephrase that for our present purposes as

  1. Aggregate and return a collection of all data to be presented to the user in a "landing page" or other similar application-summary representation. The only such content yet identified is a list of Article entities. Since no actual user-generated Articles yet exist, use dummy data in a way that the implementation need not know or care whether it's working with test or live data.

This is facilitated by our earlier decision to inject a repository as the persistence "port" for the existing entities and, by extension, most other data-encumbered artefacts we'll deal with. There is a class, PrologCore::Ports::Repository, which defines the interface used to perform commands on or queries of that repository; however, implementing classes need not (and likely should not) subclass it, relying on duck typing instead. (We may want to revisit that decision, but not now.)
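For illustration only, a duck-typed implementation might look like the sketch below; the class and method names are assumptions, not the Gem's actual interface:

# Minimal sketch of a duck-typed repository: it honours the command/query
# interface described by PrologCore::Ports::Repository without subclassing it.
# Class and method names here are illustrative assumptions.
class InMemoryArticleRepository
  def initialize(articles = [])
    @articles = articles
  end

  # Command: store a new Article entity.
  def add(article)
    @articles << article
    self
  end

  # Query: return every stored Article.
  def all
    @articles.dup
  end
end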

So, the order of attack seems to be:

  1. Build the skeleton of the use-case API (a minimal sketch follows this list);
  2. Build a repository of Article data to use for the summary presentation for test purposes. (This implies some source of User data for authors as well, but that's a don't-care at this level);
  3. Wire up the successful-result case of the use-case logic within the skeleton.
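
As a strawman for item 1, the skeleton might look roughly like the following; the module nesting, constructor injection, and result shape are assumptions, not settled API:

# Rough skeleton of the use-case API, assuming constructor injection of the
# repository "port" discussed above; names are illustrative.
module Prolog
  module UseCases
    class SummariseContent
      def initialize(repository:)
        @repository = repository
      end

      # Successful-result case only; no failure case is planned (see below).
      def call
        { articles: @repository.all }
      end
    end
  end
end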

We've never yet considered or implemented a failure-result case for an "index Articles" action; unless someone can present a compelling case why that should change, we won't do so here, either.

It Would Be Very Nice If™ we, separately, spun up a lightweight app to exercise this use-case API.

jdickey commented 8 years ago

As noted in this comment on Issue #1, we've reworked PrologCore (now Prolog::Core) to use Wisper, which significantly changes the second item in the "order of attack" list above.

Also, note that "summarise content" implies more than simply a list of articles.

jdickey commented 8 years ago

Just as a reminder, the original PDF of the mocked-up landing page showed several items. Listed in roughly counter-clockwise order from top left, these are

  1. Authentication credential entry — don't care here;
  2. "Hot topics" tag cloud;
  3. "Top articles" (by view? by ranking?);
  4. Most recent articles;
  5. Top authors (same question as with 'top articles'; why not "most prolific" as well?);
  6. "Rising Stars" (presumably authors with fastest-increasing CoRank over time);
  7. A select box, illustrated as selecting "Last updated article", which (presumably) selects the group of articles summarised in the grid below it;
  8. An "Apply for Membership" button — don't care here; and
  9. A top-level menu ("Home", "Membership", etc) — don't care here.

So, eliminating the "don't care here" items, and reordering into a sensible implementation/mockup order, we're left with

  1. Most recent articles;
  2. Most-recently-updated articles;
  3. Top articles;
  4. Top authors;
  5. "Rising Stars";
  6. "Hot topics" tag cloud.

One could easily argue that each of these six "ought" to be a separate use case/query, with the current item serving to aggregate the results of the individual queries.

I propose developing this app summary as originally envisioned, including each of these features in mockup/as-reported-from-queries form, and later (pre- or post-0.5 as time and priorities permit) decomposing the individual services.

jdickey commented 8 years ago

Continuing from the enumeration-of-services comment above, we'd need to (a fixture-generation sketch follows the list):

  1. Build a list of dummy authors to use for dummy articles;
  2. Build a list of dummy articles, using randomly-selected authors (from the above), creation timestamps (within a given window), time-updated timestamps (within a later window, with a percentage left without updates), topic keywords, and reputation scores (TBD);
  3. Build lists ordering the dummy articles by
    1. most recently created;
    2. most recently updated, where articles without updates default to the creation timestamp; and
    3. reputation score;
  4. Analyse the list of articles, using each article's author, creation timestamp, and reputation score to generate a list of authors ordered by their articles' reputation score over time to produce the "Rising Stars" list; and
  5. Finally, enumerate the keywords associated with each article to produce the "hot topics" keyword hash.

Note that the Prolog::Core::Article entity as of Gem release 0.2.0 does not have a "keywords" field; that is a defect addressed by TheProlog/prolog_core#9.
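
As a rough illustration of steps 1 and 2 in the list above, fixture generation might look like the following sketch; the faker Gem usage, the Struct stand-in for the Article entity, and the field names are all assumptions rather than the project's actual code:

require 'faker'
require 'yaml'

# Stand-in for Prolog::Core::Article, for illustration only.
Article = Struct.new(:title, :author_name, :created_at, :updated_at, :keywords)

author_names = Array.new(18) { Faker::Name.name }

articles = Array.new(50) do
  created = Time.now - (rand(60) * 86_400)   # created within the last ~60 days
  # Leave roughly a quarter of the articles without a later update.
  updated = rand < 0.75 ? created + (rand(30) * 86_400) : created
  Article.new(Faker::Book.title, author_names.sample, created, updated,
              Array.new(rand(2..5)) { Faker::Lorem.word })
end

File.write('test/fixtures/articles.yaml', YAML.dump(articles))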

jdickey commented 8 years ago

The prolog_core Gem, including the Prolog::Core::Article class, has now been bumped to v0.3.0, supporting article keywords; we're back in business.

jdickey commented 8 years ago

Well, I wasted most of a day convincing myself that dynamic test-data generation has a couple of major disadvantages compared to fixtures, and then generating an Article list to use as one.

Separately, we've reacquainted ourselves with one of the main shortcomings, or at least complications, of Wisper for many of our use cases: as the method names suggest, it's designed for unidirectional, "broadcast" or "publish" communication. Faking RPCs with it is possible, but requires setting up two separate publisher/subscriber chains, which is tedious because the only way to make the response subscription work appears to be using (temporary, scoped) global listeners. Oh well.

jdickey commented 8 years ago

Allow me to quote from the commit message for the just-pushed Commit d87e8ad:

Houston, we have data! (from a use case, even!)

... This is not likely to be the use case's final form (the name, "summarise content", should tip the reader off that there is likely more "content" than simply a list of articles), but it "works" as demonstrating the Wisper message traffic between the use case and an "authorisation port" that "knows" what the "current user" is, and the "Article persistence port", that can supply a list of all Article entities which that user may view. IMPORTANT NOTE: There is no protocol yet defined for the article-persistence port to actually do the filtering; in particular, there is no linkage between the persistence provider and any sort of authorisation that would do (or provide information necessary for) that filtering.

We're again bumping up against the limitations of Wisper as a remote data provider in a query/response fashion; with subscriptions limited to either individual object pairs or to an object listening for "global" broadcast messages (however scoped/filtered), this feels very fragile. Don't get us wrong; we love Wisper in its designed-for use case, which is basically the Observer pattern of unidirectional notification. It wobbles considerably more the farther outside that role it's asked to go.

How much more difficult will it be to make everything work correctly when the Wisper message conversation gets more complex and nuanced, say, when the Article persistence provider (or a proxy for it) actually starts caring about filtering returned Articles based on the current user's permissions? Would we be better off abandoning Wisper now in favour of some initially-more-complex system, or should we wait until we get stuck in a rabbit-hole that takes heavy equipment to burrow out of? Clocks are ticking…

Look carefully at these specs for SummariseContent#call to see what I'm talking about here.

jdickey commented 8 years ago

…and there's at least one way that "limitation" can force us to write better code.

As the comment above implies, we'd been expecting to do things like "filter articles available for viewing based on the current user" in the subscriber object; that is, our SummariseContent or other use case would tell the persistence provider what the current user is and then ask for all Articles which that user can view.

What if we turned that around a bit? What if we had at least two different persistence-provider implementations: one for Joe Public and another for an authenticated User? This at least potentially simplifies the logic in each, while (in theory) guaranteeing that an unauthenticated user can never see private/draft/etc posts. It doesn't solve any of the creakiness inherent in Wisper bidirectional conversations, but it does eliminate the need for any "linkage between the persistence provider and any sort of authorisation that would do (or provide information necessary for) that filtering" (of articles visible to a given user). Any user sees only public Articles plus Articles authored by that user; no further interaction with the authorisation machinery is necessary.
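A hypothetical sketch of that split (the class names and the public?/author_name accessors are illustrative assumptions):

# Per-role persistence providers: each one is conditional-free with respect
# to the user, because the choice of class already encodes the user's role.
class GuestArticlePersistence
  def initialize(articles)
    @articles = articles
  end

  # A guest only ever sees public Articles; no authorisation query needed.
  def all_articles
    @articles.select(&:public?)
  end
end

class MemberArticlePersistence
  def initialize(articles, user_name)
    @articles  = articles
    @user_name = user_name
  end

  # A member sees public Articles plus their own, whatever their state.
  def all_articles
    @articles.select { |a| a.public? || a.author_name == @user_name }
  end
end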

We've also, at this very early stage, reached a point where a significant code stench needs to be addressed: the objects we've been calling "listeners" in the test code, where the two existing listener objects demonstrate one end of a conversation. Should those be documented/enumerated separately from the test code they presently occur in and, if so, how? Writing separate, full-blown specs for them seems an exercise in navel-gazing, but we're really not comfortable leaving this behaviour/conversation without separate enumeration and documentation.

Peanut Gallery, feel free to chime in with suggestions.

mitpaladin commented 8 years ago

Having two persistence providers for Guest vs. authenticated user definitely makes sense. Not sure I see the need to break out the paired listener objects. Wouldn't it just be a matter of noting which use case specs contain such objects?

jdickey commented 8 years ago

Breaking them out is overkill. I was just thinking about how to keep the listeners, publishers, and messages involved straight.

I'm still uneasy with the ceremony involved in setting up a Wisper conversation where each side uses a different listener-setup API: one hooked to the publishing object, the other listening for global messages because it doesn't have access to its listener. That wouldn't happen in a proper message-queue system.

jdickey commented 8 years ago

So here's how the message traffic currently works.

Before calling SummariseContent#call, the test code wires up an authentication listener whose #current_user method responds to a broadcast message of that name, and replies by broadcasting a :current_user_is message with the user name. That conversation happens as the first major step in SummariseContent#call, when it calls its #query_current_user method. Note that #query_current_user sets up a temporary global listener (hereinafter TGL) to receive the response to :current_user.

The second major step in SummariseContent#call is a call to its #load_article_list method which also sets up a TGL (as self) which listens for the message broadcast from the persistence listener in response to :all_articles_permitted_to (which will cause the #all_articles method to be called).

So, in a nutshell, the #call method receives the current user name from an authentication listener, and broadcasts it as part of a request to a persistence listener. SummariseContent assumes that the persistence responder knows which articles are permitted to an individual user, which makes the authorisation responder a dependency of both the use case and the code setting up the responder. We'd really like to avoid that, and the best/only way to do so seems to be to eliminate that level of conditional logic from the persistence responder.
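In stripped-down sketch form, that :current_user round trip looks something like the following; this is not the project's actual code, though the message names match the description above, and the class names are illustrative:

require 'wisper'

class AuthenticationListener
  include Wisper::Publisher

  # Hears the :current_user query broadcast...
  def current_user
    broadcast(:current_user_is, 'J Random User')  # ...and replies in kind.
  end
end

class SummariseContentSketch
  include Wisper::Publisher

  def query_current_user(auth_listener)
    @user_name = nil
    subscribe(auth_listener)      # chain 1: the listener hears our query
    Wisper.subscribe(self) do     # chain 2: TGL, so we can hear the reply
      broadcast(:current_user)
    end
    @user_name
  end

  # Invoked, via the TGL subscription, by the :current_user_is reply.
  def current_user_is(name)
    @user_name = name
  end
end

SummariseContentSketch.new.query_current_user(AuthenticationListener.new)
# => "J Random User"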

What we want is for the code wiring up the use case to its various message responders to itself make the current-user query of the authentication listener, and to select a persistence responder appropriate for that user, so that user-related conditionals stay out of the use case and the responders. That would have the message traffic working something like this:

  1. The code setting up the use case (here, the test code), after wiring up an authorisation listener, asks it what the current user is.
  2. The authorisation listener replies by broadcasting a :current_user_is message, as now.
  3. The setup code then instantiates and connects a persistence responder appropriate for the user (think of Guest vs Member) and (eventually) calls SummariseContent#call.
  4. SummariseContent#call broadcasts a :query_all_articles message after setting itself up as a TGL.
  5. SummariseContent#all_articles is called to respond to a message of that name, which supplies a list of Articles (as now).

The use case no longer knows or cares what the current user is; it simply broadcasts a request for all articles in the system. The code setting up the responder to that broadcast is the only code outside the authorisation listener that knows who the current user is. This moves one existing bit of complexity out of the SummariseContent use-case object, at the cost of slightly increasing complexity in the setup/wiring code. That increase is compensated for by the fact that the setup code is now the single source of truth with regard to the current user.
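Wired up, and reusing the sketch classes from earlier comments (again, the names, the nil-means-Guest convention, and the current_user_name_from helper are illustrative assumptions, not real code), the setup would read something like:

require 'yaml'
require 'prolog/core'

# Hypothetical wiring for the proposed message flow; current_user_name_from
# is an illustrative stand-in for the setup code's own :current_user query.
auth      = AuthenticationListener.new
user_name = current_user_name_from(auth)            # steps 1 and 2

articles  = YAML.load_file('test/fixtures/articles.yaml')

# Step 3: only this setup code knows who the current user is; the one
# user-related conditional lives here, not in the use case or responders.
responder =
  if user_name.nil?                                 # no authenticated user
    GuestArticlePersistence.new(articles)
  else
    MemberArticlePersistence.new(articles, user_name)
  end

use_case = SummariseContent.new
use_case.subscribe(responder)    # the responder hears :query_all_articles
use_case.call                    # steps 4 and 5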

Known concerns

  1. The message names :query_all_articles vs :all_articles and :current_user vs :current_user_is are inconsistent. They're (obviously) being broadcast and received by different objects; would using identical names (e.g., :all_articles, :current_user) be simplifying or confusing? Do we even care now?
  2. The persistence provider, or perhaps an intermediate proxy, would be responsible for ordering as well as filtering articles. Filtering is obvious enough, but the intended options for ordering (based on original-publication or most-recent-update timestamps) are not yet supported by prolog_core release 0.3.0 and would require an 0.4.0 release. This is extremely straightforward but a detail to remember nonetheless.
  3. Anything else?
jdickey commented 8 years ago

Either way on Known Concern No. 1 above works for me. I think it's six of one and half a dozen of the other, but I've no problem keeping the names I've been using up to now. The listeners will obviously be reused and expanded upon as we progress with this thing; I'd like to have a rewrite of SummariseContent and a 0.4.0 of prolog_core pushed within the next 24 hours; adding in the not-yet-implemented parts of the summary as enumerated in this comment (https://github.com/TheProlog/prolog-use_cases/issues/2#issuecomment-179631035) should take a small handful of hours after that.

Oh, yeah; by the time we've got 0.4.0 of prolog_core, I'll have to regenerate the fixture data to include timestamps; then we'll be able to use them as is for a while (hopefully, the remainder of the project).

jdickey commented 8 years ago

Now that we finally (as of Commit c474597) have Article fixture data with creation and update timestamps and a relatively limited number of authors, we can use the fixture data for things that are actually interesting.

Consider this pry session:

[1] pry(main)> require 'prolog/core'
=> true
[2] pry(main)> articles = YAML.load_file('test/fixtures/articles.yaml');
[3] pry(main)> articles.first.class
=> Prolog::Core::Article
[4] pry(main)> author_names = Set.new(articles.map { |article| article.author_name })
=> #<Set: {"Rozella Oberbrunner",
 "Katherine Heathcote",
 "Juliana Ledner",
 "Alexandro Rowe",
 "Cathy Torphy",
 "Nichole Quitzon",
 "Tanya Ryan",
 "Abigail Emmerich",
 "Bobby Hackett",
 "Lisette Hirthe",
 "Favian Nader",
 "Lesley Thiel",
 "Dulce Mitchell",
 "Alexandrea Nitzsche",
 "Jasmin Parisian",
 "Mina Pouros",
 "Jamaal Shields",
 "Kitty Runolfsson"}>
[5] pry(main)> author_names.count
=> 18
[6] pry(main)> articles.count
=> 50
[7] pry(main)> mapping = {};
[8] pry(main)> author_names.map { |name| mapping[name.to_sym] = 0 };
[9] pry(main)> articles.each { |article| mapping[article.author_name.to_sym] += 1 };
[10] pry(main)> mapping
=> {:"Rozella Oberbrunner"=>7,
 :"Katherine Heathcote"=>5,
 :"Juliana Ledner"=>2,
 :"Alexandro Rowe"=>2,
 :"Cathy Torphy"=>3,
 :"Nichole Quitzon"=>4,
 :"Tanya Ryan"=>2,
 :"Abigail Emmerich"=>2,
 :"Bobby Hackett"=>5,
 :"Lisette Hirthe"=>5,
 :"Favian Nader"=>1,
 :"Lesley Thiel"=>4,
 :"Dulce Mitchell"=>2,
 :"Alexandrea Nitzsche"=>2,
 :"Jasmin Parisian"=>1,
 :"Mina Pouros"=>1,
 :"Jamaal Shields"=>1,
 :"Kitty Runolfsson"=>1}
[11] pry(main)> mapping.sort { |a1, a2| a1.last <=> a2.last }.reverse
=> [[:"Rozella Oberbrunner", 7],
 [:"Bobby Hackett", 5],
 [:"Katherine Heathcote", 5],
 [:"Lisette Hirthe", 5],
 [:"Nichole Quitzon", 4],
 [:"Lesley Thiel", 4],
 [:"Cathy Torphy", 3],
 [:"Dulce Mitchell", 2],
 [:"Juliana Ledner", 2],
 [:"Abigail Emmerich", 2],
 [:"Tanya Ryan", 2],
 [:"Alexandro Rowe", 2],
 [:"Alexandrea Nitzsche", 2],
 [:"Kitty Runolfsson", 1],
 [:"Jasmin Parisian", 1],
 [:"Mina Pouros", 1],
 [:"Jamaal Shields", 1],
 [:"Favian Nader", 1]]
[12] pry(main)> 

We can then do things like

[12] pry(main)> articles.select { |article| article.author_name == 'Rozella Oberbrunner' }.
[12] pry(main)*     sort { |a1, a2| a1.updated_at <=> a2.updated_at }.    
[12] pry(main)*     reverse.each { |article| puts "#{article.title} -- Last updated at #{article.updated_at}" };
American Diaries 2: Son of American Diaries -- Last updated at 2016-02-04T13:55:51+08:00
Red Monster -- Last updated at 2016-01-24T16:49:53+08:00
The Red Rose of Northern Ireland -- Last updated at 2016-01-15T13:42:23+08:00
The Ultra Witch Who Fell to Earth -- Last updated at 2016-01-15T12:09:58+08:00
Season of the Dreams -- Last updated at 2016-01-09T17:29:29+08:00
Christmas on Yazmin Mount -- Last updated at 2016-01-09T13:59:26+08:00
Journey of the Action Friday -- Last updated at 2015-12-20T11:41:04+08:00
[13] pry(main)> 

And since this is fixture data, the results should work exactly the same in your pry session as they do in the one transcribed above. Completing the SummariseContent class from here should be quite straightforward.

jdickey commented 8 years ago

Right. What's the difference between "most recent articles" and "most recently-updated articles"? We'd argue that the difference is that the latter is guaranteed not to include articles that have not been updated since originally published. That might not be immediately obvious to someone taking a quick read through this comment from 4 February which attempts to lay everything out.

We just felt the need to restate that for the record, as we'd just done a quick implementation of a "most recent articles" list that was really a "most recently-updated articles" list. Almost identical logic, so it's not as though we threw anything away… :grinning:

jdickey commented 8 years ago

Considering the list from the comment linked immediately above: these tasks need data beyond what the prolog_core Gem provides as of release 0.4. We don't think that scoring/ranking should be a part of prolog_core, but rather a separate component/Gem that depends on prolog_core. We're generally willing to be persuaded otherwise, however.

Questions? Objections? Discussion and PRs welcome. :grinning:

mitpaladin commented 8 years ago

:P I would argue we don't need tag-cloud sorting for 0.5 either; just provide "Most recently-updated articles" as the default and "Most recent articles" to show how different sorting could be done.

jdickey commented 8 years ago

As of Commit 81c5baa, we now have all the pieces we're going to have in the current whack at SummariseContent, but there's an inconsistency: we return a list of articles from #call, and there are separate methods for keywords by frequency, most recent articles, and most recently-updated articles. Either we should rename #call to #list_articles, or rework #call to return a Hash of all four lists. We'd argue for the latter, especially considering that what's currently done in #call is a prerequisite for any of the other lists.
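In sketch form, the Hash-returning #call would be along these lines; the private-method names follow this thread's discussion, but their exact signatures are assumptions:

# Sketch of the reworked #call, returning all four lists in one Hash.
def call
  load_article_list   # prerequisite for all four lists, as noted above
  {
    articles:                       @articles,
    keywords_by_frequency:          keywords_by_frequency,
    most_recent_articles:           most_recent_articles,
    most_recently_updated_articles: most_recently_updated_articles
  }
end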

Coming up shortly; the big time sink today was automating the article-fixture-data rebuild process.