Coordinate backend work

danielballan commented 7 years ago

To reiterate and elaborate on a plan I presented during last night's call:

To date, a lot of the developer effort on the version tracking project has landed in disparate repositories, building from scratch. I would like to pull together the work done so far into a minimally-functional web app in this repository. I intend to write only the minimum required new code to glue things together; this will be mainly a git-wrangling activity, merging disparate efforts.

My priority here is to enable volunteer contributors with a couple of hours to spare to toss in meaningful contributions, as @b5 did for the archivers.space app. My current plan is to build around a minimal Tornado app to be used by the analysts, which can make asynchronous calls to other services. Python seems like a good language for the community, in terms of maximizing accessibility to potential contributors. And I think Tornado is an appropriately plain and "un-opinionated" framework for this phase of the work. But I don't feel strongly about these technical choices and, at this early stage, I'm going in with the assumption that all code I write may be deleted and replaced by something better in the medium-term future.

The first goal of a unified app should be to achieve feature parity with the version-tracking team's CSV-based workflow that is currently producing useful results. I know a lot of UI work has already been done in this direction; I hope that can be folded in as well as soon as the backend is functional.

Meanwhile, @b5 has been thinking carefully about how different projects/services should interface with one another. This work should track that effort, both informing it and attempting to comply with it.

danielballan commented 7 years ago

Note: one important thing to capture is @allanpichardo's work and the request structure implemented here: https://github.com/lightandluck/pagefreezer-cli/blob/34c8f2825138c1f92a435d3b5d78b601d229cafa/interface/app.js

allanpichardo commented 7 years ago

@danielballan If you need any help, I'd be glad to. I just don't have enough time to coordinate it. However, if you can figure out an architecture that would work to put all these pieces together, let me know.

Are we going to go with PageFreezer the whole way? Or is there another module that works off of git?

danielballan commented 7 years ago

Great, thanks for chiming in. I'll keep posting my thought process on here, and I welcome any input you have time to give.

My sense from recent discussions is that PageFreezer is the place to start, but we should aim to keep the diff-computation aspect pluggable.

allanpichardo commented 7 years ago

@danielballan I agree. It would be great if someone could work on a module that computes diffs through git, returns them as a JSON response in the same format as PageFreezer's, and can also save them to disk as a text file. That way, the module can be easily substituted in, and it can also be run in a background process to monitor pages.
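As a rough illustration of what "pluggable" could mean here, below is a minimal Python sketch. Everything in it is an assumption for discussion: the `DIFFERS` registry, the `compute_diff` function, and the shape of the returned JSON are hypothetical and do not reflect PageFreezer's actual response format. The standard library's `difflib` stands in for a git-based differ.

```python
import difflib
import json
from pathlib import Path
from typing import Callable, Dict, Optional

# Hypothetical plug-in signature: (old_text, new_text) -> JSON-serializable dict.
Differ = Callable[[str, str], dict]


def unified_text_differ(old: str, new: str) -> dict:
    """Line-based diff from the standard library (a stand-in for a git-based differ)."""
    lines = list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="old", tofile="new", lineterm=""))
    # The keys below are made up for this sketch, not PageFreezer's format.
    return {"differ": "unified_text", "changed": bool(lines), "diff": lines}


# Registry of interchangeable differs; another backend would be added here.
DIFFERS: Dict[str, Differ] = {"unified_text": unified_text_differ}


def compute_diff(name: str, old: str, new: str,
                 save_to: Optional[str] = None) -> str:
    """Run the named differ, optionally save the diff as text, return JSON."""
    result = DIFFERS[name](old, new)
    if save_to is not None:
        Path(save_to).write_text("\n".join(result["diff"]) + "\n", encoding="utf-8")
    return json.dumps(result)
```

Swapping in a different backend would then only mean registering another function under a new key, while callers keep using `compute_diff`.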

titaniumbones commented 7 years ago

@allanpichardo I think the differ module (currently in its own repo) does just what you suggest: https://github.com/edgi-govdata-archiving/differ

I agree PageFreezer is the place to start. Right now PF's service doesn't return multiple diff "streams" (full HTML pages + HTML chunks + text-only changes), which was identified as desirable in NYC. Analysts have also asked for multiple diff views, including a text-only view; I'm not sure whether we want to require the diff service to return that text-only information or whether we can do that in post-processing within the UI. The former seems preferable to me if it's possible.
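For comparison, here is a minimal sketch of what the post-processing option might look like: strip the markup from two versions of a page and diff only the visible text. This is an assumption-laden illustration using only the Python standard library; the helper names `visible_text` and `text_only_diff` are hypothetical and are not part of PageFreezer or the differ repo.

```python
import difflib
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def visible_text(html):
    """Return the page's visible text as a list of non-empty chunks."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def text_only_diff(old_html, new_html):
    """Return only added ('+ ') and removed ('- ') text chunks."""
    delta = difflib.ndiff(visible_text(old_html), visible_text(new_html))
    return [line for line in delta if line[:2] in ("- ", "+ ")]
```

With this approach, a change that only touches markup (say, wrapping the same sentence in a different tag) produces an empty diff, which is the behavior a text-only view would presumably want.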

danielballan commented 7 years ago

This issue has been superseded by ongoing conversation in #29. Closing.

edgi-govdata-archiving / web-monitoring-ui

Coordinate backend work #28