edgi-govdata-archiving / eot-nomination-tool

📚 Chrome extension to nominate government data that needs to be preserved
https://chrome.google.com/webstore/detail/nominationtool/abjpihafglmijnkkoppbookfkkanklok
GNU General Public License v3.0

What's next for the Chrome extension? #72

Closed titaniumbones closed 5 years ago

titaniumbones commented 7 years ago

With the End of Term project finishing up, we need to rethink the Chrome extension.

Am hoping you all can start the ball rolling on this!

I'm hoping @dallan @

sonalranjit commented 7 years ago

I am thinking of setting up a RESTful API with Flask and a Postgres database to submit the URLs from the Chrome extension, since, as we have painfully learned, the spreadsheet is a hassle. The Chrome extension can post to the DB, and in turn the Archivers app can also query it. Let me know how this sounds; I can start scoping out a plan that accommodates everyone.
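
Roughly, the extension side could look something like this (a minimal sketch only; the host, route, and payload fields are placeholders, not a settled API):

```js
// Hypothetical: the endpoint URL and payload shape are assumptions,
// not an agreed-on interface for the proposed Flask service.
function nominateUrl(url, notes) {
  return fetch('https://nominations.example.org/api/nominations', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      url: url,                              // the nominated page
      notes: notes,                          // optional nominator notes
      submittedAt: new Date().toISOString()
    })
  }).then(function (response) {
    if (!response.ok) {
      throw new Error('Nomination failed with HTTP ' + response.status);
    }
    return response.json();
  });
}
```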

danielballan commented 7 years ago

We have also discussed sending URLs directly from the extension to the app.

dcwalk commented 7 years ago

@danielballan what were/are the concerns around a direct integration? Are there Archivers issues we should pay attention to?

I recall this PR RE: integration: https://github.com/b5/pipeline/pull/74

danielballan commented 7 years ago

The way it works now: an admin clicks a link in the archivers.space app, the app queries our Google Spreadsheet for new rows, and it ingests them into our URL collection. Ideally the admin would check the spreadsheet before clicking the link, to be sure we haven't been bombarded with 100,000 junk submissions. Either way, we track which URLs come from which ingestion, so we can always roll back a particular ingestion that contained junk.
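
To sketch that rollback mechanism (collection and field names here are illustrative, not the app's actual schema):

```js
import { Mongo } from 'meteor/mongo';
import { Random } from 'meteor/random';

// Illustrative only; not the app's real schema.
const Urls = new Mongo.Collection('urls');

// Tag every row from one spreadsheet pull with a shared ingestion id...
function ingestRows(rows) {
  const ingestionId = Random.id();
  rows.forEach((row) => {
    Urls.insert({ url: row.url, ingestionId });
  });
  return ingestionId;
}

// ...so a batch of junk submissions can be removed in one shot.
function rollbackIngestion(ingestionId) {
  Urls.remove({ ingestionId });
}
```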

If we want to get away from Google Spreadsheets, I think it's worth considering removing the middleman altogether. Instead of building a separate Flask app to cache submissions, add a POST request target to the archivers.space app that accepts URLs in a kind of staging area. The ingestion process can remain the same, from the admin user point of view, but everything will happen inside the app: moving from a sort-of "quarantined" staging area into the main database.
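
As a rough sketch of what that POST target might look like on the Meteor side (the route, body parsing, and collection name are assumptions, and a real version would want some authentication):

```js
import { Meteor } from 'meteor/meteor';
import { WebApp } from 'meteor/webapp';
import { Mongo } from 'meteor/mongo';
import bodyParser from 'body-parser';

// The "quarantined" staging area; the existing ingestion step would
// promote rows from here into the main URL collection.
const StagedUrls = new Mongo.Collection('stagedUrls');

WebApp.connectHandlers.use('/api/nominations', bodyParser.json());
WebApp.connectHandlers.use(
  '/api/nominations',
  Meteor.bindEnvironment((req, res) => {
    if (req.method !== 'POST' || !req.body || !req.body.url) {
      res.writeHead(400);
      return res.end();
    }
    StagedUrls.insert({ url: req.body.url, receivedAt: new Date() });
    res.writeHead(201);
    res.end();
  })
);
```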

What do you think, @sonalranjit?

sonalranjit commented 7 years ago

@danielballan it makes sense to add a POST request target to the archivers app. Let me know how I can help with that; as far as I know, the app is built on Meteor.

danielballan commented 7 years ago

Great. Please do! Yes, it's built using Meteor, and it's in a repo that we are keeping private for now. I'll get you access.

dcwalk commented 7 years ago

@titaniumbones -- could you update us on any conversation with Internet Archive about how to proceed?

Comment from DataRefuge slack today:

A new nomination tool will be set up to which folks can keep contributing seeds and these will continue to come to us (Internet Archive). The gist is that some institutions will ramp down crawling so we have a "bookended" EOT collection. We will keep taking nominations and crawling, though we may direct them into a broader .gov/.mil effort.

titaniumbones commented 7 years ago

Sorry for the delay. Um, I had understood things slightly differently... I'll ask Jefferson again. My thoughts as of today (which is a Thursday! and therefore nearly an event day!):

titaniumbones commented 7 years ago

Ah, I see that comment was from Jefferson. So this suggests we can continue to provide seeds. In which case, we should likely figure out a better way of doing so than sending-spreadsheets-to-very-busy-people-every-week.

ambergman commented 7 years ago

Sorry I'm so slow to see this, @titaniumbones, @danielballan, and @sonalranjit. I don't think there's a particular need to keep a spreadsheet of seed URLs we send to IA, especially if it's a hassle. It sounds like, whether we use @sonalranjit's short-term implementation or the in-app staging-area implementation suggested by @danielballan, we will still have a DB where all of the seed URLs we've collected live permanently (and I think that makes sense, as it allows us to deduplicate at any point). That DB could remain our permanent record of what we've collected; we'd then just need a convenient way to send those URLs on to IA.
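
For example, deduplication against that permanent collection could be as simple as normalizing and comparing; the normalization rules below are illustrative only:

```js
// Illustrative URL normalization; real rules would need more care
// (query strings, http vs https, trailing index pages, etc.).
function normalize(url) {
  return url.trim().replace(/\/+$/, '');
}

// One pass over a Mongo-style Urls collection, keeping the earliest
// copy of each normalized URL and removing later duplicates.
function dedupeUrls(Urls) {
  const seen = new Set();
  Urls.find({}, { sort: { receivedAt: 1 } }).forEach((doc) => {
    const key = normalize(doc.url);
    if (seen.has(key)) {
      Urls.remove({ _id: doc._id });
    } else {
      seen.add(key);
    }
  });
}
```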

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.