edgi-govdata-archiving / archivers.space

🗄 Event data management app used at DataRescues
https://www.archivers.space/
GNU Affero General Public License v3.0

Missing Harvest URL/Location in Bag section #31

Open kmcculloch opened 7 years ago

kmcculloch commented 7 years ago

From @khdelphine on February 14, 2017 14:02

A number of URLs that appear ready to be bagged do not have the Zip URL under “Harvest URL/Location”. For instance: http://www.archivers.space/urls/0C87975E-C222-4BD2-8516-B4E623EB67CB

Copied from original issue: b5/pipeline#67

kmcculloch commented 7 years ago

From @b5 on February 15, 2017 18:17

kmcculloch commented 7 years ago

From @dcwalk on February 15, 2017 19:09

@b5, to commandeer this issue: does it make sense to track event application use here as well? I know we've had some questions come up about where data ends up (e.g. after an event), but I don't know if we are tracking this anywhere. Would doing it here be appropriate?

kmcculloch commented 7 years ago

From @librlaurie on February 15, 2017 21:16

Hey @dcwalk - not sure what that means, but the data will start being visible in CKAN in the next few days (big progress is finally happening: https://www.datarefuge.org/organization). So events will be able to actually push their data all the way through to CKAN as intended.

kmcculloch commented 7 years ago

From @dcwalk on February 16, 2017 4:58

That is awesome @librlaurie. I meant more project tracking internally so we know when stuff from the seeded spreadsheet gets pushed to the app!

kmcculloch commented 7 years ago

From @khdelphine on February 16, 2017 12:20

@dcwalk (and @b5, @librlaurie), yes, I think we totally need to keep track of content issues. Over the last couple of days we have been reporting them in the data-refuge workflow process or as issues like this one, but I agree that's not necessarily the best way to do it.

What do you think would work best?

As an aside, perhaps a useful enhancement would be to have a way to flag "problem URLs", so that they would be quarantined into a separate list and could be easily reviewed by an expert/admin. I will add it as a separate issue.

kmcculloch commented 7 years ago

This is related to https://github.com/edgi-govdata-archiving/archivers.space/issues/42

This gets at the larger issue of what the pipeline app is for. If it's just an expedient way of moving stuff along, this may not be such a priority. But if it's meant to serve as an ongoing reference of what has been archived and what hasn't, then we need to "close the loop" by reporting back on the data's final home once it has been ingested by CKAN.

khdelphine commented 7 years ago

@kmcculloch: yes, we now have a field in the Describer section to put in the CKAN location once it exists.

However, I want to point out that the original issue reported was different. It was caused by URLs that had been harvested before the app existed, and so when they were ingested into the app they did not show up with their harvest zip file location.
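
To illustrate the case described above, here is a minimal sketch of how affected records could be listed for review. It is not taken from the app's code: the record shape (`UrlRecord`, `harvested`, `harvest_location`) and the helper `missing_harvest_location` are hypothetical names chosen for the example, and the real schema may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UrlRecord:
    # Hypothetical record shape; the app's real schema is not shown in this thread.
    uuid: str
    harvested: bool
    harvest_location: Optional[str] = None


def missing_harvest_location(records: List[UrlRecord]) -> List[UrlRecord]:
    """Return records marked as harvested that have no harvest zip location."""
    return [
        r for r in records
        if r.harvested and not (r.harvest_location or "").strip()
    ]


# Made-up sample data, for illustration only.
records = [
    UrlRecord("A1", harvested=True, harvest_location="https://example.org/A1.zip"),
    UrlRecord("B2", harvested=True),   # e.g. harvested before the app existed
    UrlRecord("C3", harvested=False),  # not harvested yet, so nothing is missing
    UrlRecord("D4", harvested=True, harvest_location="   "),  # blank counts as missing
]

flagged = missing_harvest_location(records)
print([r.uuid for r in flagged])
```

A list like `flagged` could then be handed to whoever knows where the pre-app harvests were stored, so the zip locations can be filled in by hand.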