edgi-govdata-archiving / archivers.space

🗄 Event data management app used at DataRescues
https://www.archivers.space/
GNU Affero General Public License v3.0

Update research to reflect changes to Nomination Chrome Extension #7

Open kmcculloch opened 7 years ago

kmcculloch commented 7 years ago

From @dcwalk on February 1, 2017 3:58

See edgi-govdata-archiving/presidential-harvest-nomination-tool#7.

[screenshot: 2017-01-31, 10:51:29 PM]

Copied from original issue: b5/pipeline#9

kmcculloch commented 7 years ago

From @danielballan on February 1, 2017 4:04

Thanks for the heads up. I was actually not aware of the work in https://github.com/edgi-govdata-archiving/presidential-harvest-nomination-tool/issues/38 when I opened https://github.com/b5/pipeline/pull/8. It looks like we are thinking along the same lines though. I'll go ahead and add FTP and Interactive Visualization as checkboxes -- any others?

kmcculloch commented 7 years ago

From @danielballan on February 1, 2017 4:04

(I don't think we can or should aim to make the options comprehensive, but listing known common cases feels like a good idea to me.)

kmcculloch commented 7 years ago

From @dcwalk on February 1, 2017 4:06

I don't think we need comprehensive, but our tools should be consistent in terms of how these issues are identified and grouped

kmcculloch commented 7 years ago

From @danielballan on February 1, 2017 4:08

Definitely agree. I'm happy to follow your lead here. :-)

kmcculloch commented 7 years ago

From @dcwalk on February 1, 2017 15:57

Hey, saw this was addressed in #8, just wanted to link to how it is in the nomination extension: https://github.com/edgi-govdata-archiving/presidential-harvest-nomination-tool/blob/9f5192b26f32f378773f12985e71b97f3040f320/popup.html#L606

[screenshot: 2017-02-01, 10:51:54 AM]

I like the categories you've added. I'm wondering if we want to keep the language in sync. @titaniumbones, maybe we make changes to both? (Sorry! So much tweaking, maybe not essential.)

kmcculloch commented 7 years ago

From @danielballan on February 1, 2017 16:10

Thanks for the screenshot. I agree we should sync up the language. Has there been any discussion about when/if the Chrome extension will push directly to the app?

kmcculloch commented 7 years ago

From @dcwalk on February 1, 2017 16:26

Uh, that is live, installed via chrome store. However!!! It seems like there was a change that wasn't caught. It'll be live and canonical from 4:30pm on tonight

kmcculloch commented 7 years ago

From @titaniumbones on February 1, 2017 16:53

um, I don't think the extension pushes to the app directly? Is that what you meant @dcwalk? I would love to have the extension go straight to the app, BUT there are authorization issues w/ a publicly-available extension (yes?), and also we only want to push the so-called "uncrawlables," not the "normal" seeds -- but we no longer use that term, so logic is fuzzier.

if we want to push to the tool, probably @danielballan or @b5 will have to do that part as I still don't understand how the app ingests seeds.

kmcculloch commented 7 years ago

From @dcwalk on February 1, 2017 17:01

Ach, no, I'm being confusing @titaniumbones. I just meant it would be great for the way we are categorizing "uncrawlables" to be consistent across both apps. Not that they be linked.

kmcculloch commented 7 years ago

From @titaniumbones on February 1, 2017 18:41

I have not heard any discussion. Would love to make the change, but only for "uncrawlables," right? And we've now removed "uncrawlable" as a category, so the logic is harder, or anyway more confusing, to implement.

On 02/01/2017 11:10 AM, Dan Allan wrote:

Thanks for the screenshot. I agree we should sync up the language. Has there been any discussion about when/if the Chrome extension will push directly to the app?


kmcculloch commented 7 years ago

From @danielballan on February 1, 2017 22:18

I guess we should try to adjust the extension and the app so that each field in the extension has a corresponding field in the app where its info can land.

As you say, we'd have to allow the extension to push data into the app unauthenticated. I don't see any problem with that. It seems about as reasonable as allowing it to push into a spreadsheet unauthenticated, as I assume it does currently. We can log the time (and maybe the IP?) so that, in the worst-case scenario, we have a hope of filtering out a flood of submissions sent in bad faith.
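As a rough illustration of the logging idea above, the sketch below flags IP addresses that submit more than a set number of records inside a sliding time window. The log shape (`ip`, plus `time` in milliseconds), the function name `suspiciousIps`, and the default thresholds are all assumptions made for this example; none of it reflects how the pipeline app actually stores submissions.

```javascript
// Hypothetical log entry shape: { ip: "203.0.113.7", time: 1485986280000 }
// Returns the set of IPs that sent more than `limit` submissions
// within any span of `windowMs` milliseconds.
function suspiciousIps(log, limit = 20, windowMs = 60000) {
  // Group submission times by IP.
  const byIp = new Map();
  for (const { ip, time } of log) {
    if (!byIp.has(ip)) byIp.set(ip, []);
    byIp.get(ip).push(time);
  }

  const flagged = new Set();
  for (const [ip, times] of byIp) {
    times.sort((a, b) => a - b);
    let start = 0;
    for (let end = 0; end < times.length; end++) {
      // Shrink the window until it spans at most windowMs.
      while (times[end] - times[start] > windowMs) start++;
      if (end - start + 1 > limit) {
        flagged.add(ip);
        break;
      }
    }
  }
  return flagged;
}
```

Submissions from flagged IPs could then be held back for a closer look rather than deleted outright.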

One question I have is, how much of this sort of detail should be submitted by the users of the Chrome extension, and how much should be added later upon review by a "Researcher" in the pipeline? Maybe we should collect all of (or most of) the same info, but mark it as "unreviewed" until someone in the pipeline app clicks a button to say they verified it. Chrome extension users can give us their best guess, as far as their scientific domain expertise or technical expertise allows, and it will get reviewed/seconded by another person.
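One way to picture the "unreviewed until verified" idea is a flag on each record that only flips when a reviewer confirms it. This is a minimal sketch under assumed names: `newSubmission`, `markReviewed`, and the `reviewed` / `reviewedBy` fields are hypothetical and not taken from the pipeline app.

```javascript
// Wrap whatever the extension sent in a record that starts out unreviewed.
function newSubmission(fields, submittedAt = new Date()) {
  return {
    ...fields,
    reviewed: false,
    reviewedBy: null,
    submittedAt: submittedAt.toISOString(),
  };
}

// A reviewer confirms the record, optionally correcting some fields.
// Returns a new object; the original submission is left untouched.
function markReviewed(submission, reviewer, corrections = {}) {
  return {
    ...submission,
    ...corrections,
    reviewed: true,
    reviewedBy: reviewer,
  };
}
```

Keeping the submitter's guess and the reviewer's correction in the same record shape means both the extension and the app can share one set of fields.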

kmcculloch commented 7 years ago

From @titaniumbones on February 1, 2017 22:29

@danielballan that sounds great. right now the extension just posts a jQuery $.get to a google form URL:

  const data = { ifq: '' };

  data[NAME_FIELD] = localStorage.name;
  data[EMAIL_FIELD] = localStorage.email;
  data[TITLE_FIELD] = title;
  data[EVENTNAME_FIELD] = localStorage.eventName;
  data[URL_FIELD] = currentURL;
  data[AGENCY_FIELD] = agency;
  data[AGENCY_ID] = agencyID;
  data[SUBAGENCY_ID] = subAgencyID;
  data[ORGANIZATION_ID] = organizationID;
  data[SUBORG_ID] = suborgID;
  data[SUBPRIMER_ID] = subprimerID;
  //data[CRAWLABLE_ID] = crawlableID;
  data[FTP_ID] = ftpID;
  data[VISUALIZATION_ID] = visualizationID;
  data[DIFFICULTY_ID] = difficultyID;
  data[DATABASE_ID] = difficultyID; // note: reuses difficultyID; possibly a copy-paste slip for a database-specific value
  data[COMMMENT_ID] = commentID;

  // Do GET call to post to Google Form and open new tab
  $.get( {
    url: GOOGLE_FORMS_URL,
    data,
    success: function( res ) {
      $( '#success' ).html( "Success!" );
      setTimeout( function() {
        window.location.reload();
      }, 1000 );
      // Uncomment this line to also add the URL through the official notification tool.
      // window.open(NOTIFICATION_TOOL_URL + currentURL);
    },
    error: function( err ) {
      $( '#error' ).html( err.statusText || "Error!" );
    }
  } );
}

We could easily add a second .get to duplicate the data, or use some case or if/else logic to redirect the request to the pipeline app.

We can also rename the fields whatever we want in the extension, obvs.
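To make the two options above concrete (a second request versus if/else routing), here is a small sketch. It is only an illustration: both URLs are placeholders, the `ftp` / `visualization` / `database` field names are assumed, and the transport is passed in as `send` so the logic can be tried without jQuery. In the extension, `send` could be something like `(url, data) => $.get({ url, data })`.

```javascript
// Placeholder endpoints; neither URL comes from this thread.
const GOOGLE_FORMS_URL = "https://example.org/google-form";
const PIPELINE_APP_URL = "https://example.org/pipeline-app";

// Decide where a submission should go.
//   mode "duplicate": send every record to both stores.
//   mode "route":     send hard-to-crawl records to the app, the rest to the form.
function chooseTargets(data, mode) {
  if (mode === "duplicate") {
    return [GOOGLE_FORMS_URL, PIPELINE_APP_URL];
  }
  // Assumed field names for the hard-to-crawl checkboxes.
  const needsApp = Boolean(data.ftp || data.visualization || data.database);
  return [needsApp ? PIPELINE_APP_URL : GOOGLE_FORMS_URL];
}

// Call `send` once per target and return whatever each call returned.
function submit(data, mode, send) {
  return chooseTargets(data, mode).map((url) => send(url, data));
}
```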

kmcculloch commented 7 years ago

From @titaniumbones on February 1, 2017 22:32

we'd need a way to throw "misfiled" URLs from one datastore to another. Not sure what the best practice would be. If eventually the app stores both, and permits queries from both EDGI & datarefuge, that might be ideal. Then we could generate reports to pass e.g. to Internet Archive.

kmcculloch commented 7 years ago

From @titaniumbones on February 1, 2017 22:35

OT: Looking ahead, when this is open sourced there might be non-environmental/climate types who want to do something similar; it would be interesting to be able to federate app instances. Also to keep track of where data lands, so that end users can establish the disposition of a given dataset via a single query.

kmcculloch commented 7 years ago

From @danielballan on February 2, 2017 19:53

> We could easily add a second .get to duplicate the data, or use some case or if/else logic to redirect the request to the pipeline app.

I suggest a two-step process:

  1. Add a second .get, and include in the request a "version" tag describing the Chrome extension. Some users will have the old version of the Chrome extension for a while, I assume, so we'll still need to migrate their submissions from the spreadsheet to the app. We'll be able to tell which rows to migrate by checking whether the "version" column is filled in. (I guess we could also do this by searching for duplicates, but I'd rather know it explicitly.)
  2. If/when we are sure that the app is robust and that it's the way we want to go, remove the .get to the spreadsheet.
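The version-tag step could look roughly like the following. This is a sketch only: the version string, the `version` field name, and the row shape are assumptions, and reading rows out of the real spreadsheet is left out entirely.

```javascript
// Hypothetical version string for the new extension build.
const EXTENSION_VERSION = "2.0.0";

// Copy the form payload and tag it, leaving the original object untouched.
function withVersion(data, version = EXTENSION_VERSION) {
  return { ...data, version };
}

// Rows with no "version" value came from old extension builds that only
// wrote to the spreadsheet, so they still need to be copied into the app.
function rowsToMigrate(spreadsheetRows) {
  return spreadsheetRows.filter((row) => !row.version);
}
```

Rows that do carry a version were already sent to the app by the second request, so skipping them avoids duplicates without a separate de-duplication pass.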

> we'd need a way to throw "misfiled" URLs from one datastore to another.

I'm not sure I know what you're getting at here. Misfiled as in will crawl vs will not crawl?

> Also to keep track of where data lands, so that end users can establish the disposition of a given dataset via a single query.

Maybe the pipeline app responds to the GET request with the UUID that it assigned?

I'm not familiar with the extension codebase, so please forgive me if any of the above is confused.
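For the UUID suggestion above, a minimal Node-side sketch might look like this. The in-memory `records` map, the `handleSubmission` and `lookup` names, and the response shape are all made up for illustration; the pipeline app's real storage and routes are not shown anywhere in this thread.

```javascript
const { randomUUID } = require("crypto");

// In-memory stand-in for the app's datastore, keyed by assigned UUID.
const records = new Map();

// Store a submission and return the body the app might send back,
// so the extension (or a user) can refer to the record later.
function handleSubmission(fields) {
  const uuid = randomUUID();
  records.set(uuid, { ...fields, uuid });
  return { status: "ok", uuid };
}

// Look a record up by the UUID returned at submission time.
function lookup(uuid) {
  return records.get(uuid) || null;
}
```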

kmcculloch commented 7 years ago

From @titaniumbones on February 2, 2017 21:14

> > we'd need a way to throw "misfiled" URLs from one datastore to another.

> I'm not sure I know what you're getting at here. Misfiled as in will crawl vs will not crawl?

Yes exactly. If "crawls" go to a spreadsheet (for transfer to IA) and "not crawls" go to the app (for bagging), we need to have a way to move the URLs from one storage medium to the other if the initial filer was in error about the appropriate category.
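Moving a misfiled record could be as simple as the sketch below, which treats each store as a map keyed by URL. That is a big simplification: the real stores are a Google spreadsheet and the app's database, and the `refile` name and `refiledAt` field are invented here.

```javascript
// Move one record between two stores keyed by URL.
// Returns true if the record was found and moved, false otherwise.
function refile(url, fromStore, toStore) {
  if (!fromStore.has(url)) return false;
  const record = fromStore.get(url);
  // Note when the move happened so the history is not lost.
  toStore.set(url, { ...record, refiledAt: new Date().toISOString() });
  fromStore.delete(url);
  return true;
}
```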

Can you paste into this issue a $.get({}) command that will send a record to the app, and we can start testing the chrome extension?

> > Also to keep track of where data lands, so that end users can establish the disposition of a given dataset via a single query.

> Maybe the pipeline app responds to the GET request with the UUID that it assigned?

Seems like a good idea, but what I meant was that it would be nice some time in the future to be able to respond to a user who asks, "where on the web can I find a dataset that was harvested via this app? 'cause I want to use that data and the government has hidden it."

kmcculloch commented 7 years ago

From @danielballan on February 2, 2017 21:15

Yep. I'm going to prioritize completing #13, but I'll tackle this next.

kmcculloch commented 7 years ago

From @titaniumbones on February 3, 2017 16:32

@danielballan if you get done with your many other obligations... I teach at 3 and am pretty much out of commission from then until tmrw morning, so... probably soon or not-this-time for this. Or file a new bug in https://github.com/edgi-govdata-archiving/eot-nomination-tool/issues w the code & some description, & tag maybe me, @atesgoral, and @sonalranjit

kmcculloch commented 7 years ago

From @danielballan on February 3, 2017 19:53

Just finished #13. I agree we shouldn't attempt to rush this out for NYC. Maybe we can hack on it together tomorrow if time allows, or more likely revisit it next week in time for future events.