associatedpress / geomancer

Open source tool to help journalists easily mash up data based on shared geography.
MIT License

Delayed file processing #5

Closed. evz closed this issue 9 years ago.

evz commented 10 years ago

The process of geomancing is probably going to take longer than one could reasonably expect an HTTP Response to come back from a server. This means that we're going to have to build this in a way that allows for the actual data joining to take place in a separate process. That's relatively straightforward and I've already got some code stubbed out over in the datamade fork that should accomplish that pretty handily.

The question becomes how should we get the spreadsheet with the new attributes joined to it back to the user? We can create a page that just polls an endpoint that will respond with a download URL once the processing is done. Another approach would be to just collect an email address at some point during the workflow and, when the processing is done, email a download link to that address.

Does either of these options sound better than the other? Are there other options that might work here that I am missing?

fgregg commented 10 years ago

How much time are we talking about? Seconds? Minutes? Fifteen minutes?


evz commented 10 years ago

Probably depends on how much data we're joining but I'd say minutes.

zstumgoren commented 10 years ago

@evz Async joining makes sense for lengthy processing, but implies dedicated servers and file hosting (e.g. AWS). @tthibo Is AP planning to provide Geomancer as a long-term hosted solution (a la Census Reporter), or is it intended to be a turn-key solution that end users set up on their own hardware/cloud (a la Panda)? Note, this might be worth discussing in a separate ticket.

Whether it's self-hosted or hosted by AP, I'd follow the example of PANDA and provide async uploads with the option of an email alert containing a download link.

tthibo commented 10 years ago

We're following the PANDA model, not assuming distributed processing. But even with distributed processing, async uploads with email alerts seem the way to go. DocumentCloud displays a "loading" image for the uploading document during processing and also offers the option of an email alert. I like that combination because if the process goes quickly, you see immediately that the document is available. If it takes longer, you're free to go your way and wait for email.
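A rough sketch of that combination, assuming the processing job is exposed as a `concurrent.futures.Future`: wait briefly for the fast path, and fall back to an email alert otherwise. The function name `deliver_result`, the `send_email` callable, and the message text are all hypothetical; this is not how DocumentCloud or PANDA implement it.

```python
from concurrent.futures import Future, TimeoutError


def deliver_result(future: Future, email, send_email, wait_seconds=5.0):
    """Wait briefly for the job; if it is slow, email the user later.

    `send_email(to, message)` is a caller-supplied function (hypothetical).
    Returns "ready" if the job finished within `wait_seconds`, otherwise
    "email_pending". An exception raised by a job that finishes within
    the wait propagates to the caller.
    """
    try:
        future.result(timeout=wait_seconds)
        return "ready"
    except TimeoutError:
        def _notify(done):
            # Runs once the job completes, successfully or not.
            if done.exception() is None:
                send_email(email, "Your file is ready to download.")
            else:
                send_email(email, "Your file could not be processed.")

        future.add_done_callback(_notify)
        return "email_pending"
```

While `deliver_result` is waiting, the page can show the "loading" image; on `"email_pending"` it can tell the user that a link will arrive by email.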

zstumgoren commented 10 years ago

+1 sounds good.

derekeder commented 9 years ago

This ended up not being much of an issue: on the spreadsheets we've tested, geomancer returns in less than a minute every time. With the caching of geolookups we're doing, it's often even faster than that.
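For anyone wondering why caching helps so much, here is a minimal illustration of the idea using `functools.lru_cache`. It is only a sketch: `fetch_geography` is a hypothetical stand-in for a slow lookup, and this is not necessarily how geomancer caches its geolookups.

```python
from functools import lru_cache

# Counts how often the slow lookup actually runs (for demonstration only).
CALLS = {"count": 0}


def fetch_geography(geo_id):
    """Hypothetical stand-in for a slow remote lookup; returns fake data."""
    CALLS["count"] += 1
    return {"geo_id": geo_id}


@lru_cache(maxsize=1024)
def cached_lookup(geo_id):
    """Return the geography for geo_id, hitting the slow path only once.

    The same dict object is returned on repeat calls, so callers should
    treat it as read-only.
    """
    return fetch_geography(geo_id)
```

Spreadsheets tend to repeat the same geographies across many rows, so each repeat after the first is served from memory.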

Closing for now until/if this comes up again.