kmcculloch opened this issue 7 years ago
From @b5 on February 15, 2017 18:17
From @dcwalk on February 15, 2017 19:09
@b5, to commandeer this issue: does it make sense to track how events use the application here as well? Some questions have come up about where data stands (e.g., after an event), but I don't know whether we are tracking that anywhere. Would this be the appropriate place to do it?
From @librlaurie on February 15, 2017 21:16
Hey @dcwalk, not sure what that means, but the data will start appearing in CKAN in the next few days (big progress is finally happening: https://www.datarefuge.org/organization), so events will be able to push their data all the way through to CKAN as intended.
From @dcwalk on February 16, 2017 4:58
That is awesome, @librlaurie. I meant internal project tracking, so we know when items from the seeded spreadsheet get pushed to the app!
From @khdelphine on February 16, 2017 12:20
@dcwalk (and @b5, @librlaurie), yes, I think we definitely need to keep track of content issues. Over the last couple of days we have been reporting them in the data-refuge workflow process or as issues like this one, but I agree that's not necessarily the best way to do it.
What do you think would work best?
As an aside, a useful enhancement might be a way to flag "problem URLs" so that they are quarantined into a separate list that an expert/admin can easily review. I will add it as a separate issue.
This is related to https://github.com/edgi-govdata-archiving/archivers.space/issues/42
This gets at the larger question of what the pipeline app is for. If it's just an expedient way of moving things along, this may not be a priority. But if it's meant to serve as an ongoing reference for what has and hasn't been archived, then we need to "close the loop" by recording the data's final home once it has been ingested into CKAN.
@kmcculloch: yes, we now have a field in the Describer section to put in the CKAN location once it exists.
However, I want to point out that the originally reported issue was different. It was caused by URLs that had been harvested before the app existed, so when they were ingested into the app they did not show up with their harvest zip file location.
From @khdelphine on February 14, 2017 14:02
A number of URLs that appear ready to be bagged do not have the Zip URL under “Harvest URL/Location”. For instance: http://www.archivers.space/urls/0C87975E-C222-4BD2-8516-B4E623EB67CB
Copied from original issue: b5/pipeline#67