codeforsanjose / city-agenda-scraper


Upload reports/agendas to DocumentCloud and figure out tagging #23

Open xconnieex opened 3 years ago

xconnieex commented 3 years ago

The Stanford team has a DC account and has uploaded a ton of documents already: https://www.documentcloud.org/app?q=project%3Aagenda-watch-bay-area-le-202027. Unfortunately, we don't have write/upload access, so we can't add the Legistar documents, which the Stanford team doesn't cover.

While we get ready with our Legistar documents, it would be great to look into how to use the API to upload and tag documents by committee automatically.

pmkuny commented 3 years ago

Took a look at this - hope I'm not adding confusion into the mix. API Documentation: https://www.documentcloud.org/help/api

The API supports either a single-file upload or having the DC servers fetch a publicly available document. So as long as we have a link to a publicly available agenda, it seems that once we're authenticated to DC, we could make an API call telling DC to go fetch that document. One caveat: the document would need to be in a supported format to begin with (I'm guessing PDF, judging by the report samples in the repo root).
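To make the "have DC fetch a public document" idea concrete, here is a minimal sketch that only builds the request and does not send it. Several things are assumptions that should be checked against the API documentation linked above: the base URL `https://api.www.documentcloud.org/api`, the `file_url` and `title` field names, and the use of a Bearer access token. The helper name `build_url_upload_request` is hypothetical.

```python
import json
import urllib.request

# Assumed API base URL; confirm against the DocumentCloud API docs.
API_BASE = "https://api.www.documentcloud.org/api"


def build_url_upload_request(file_url, title, access_token, api_base=API_BASE):
    """Build (but do not send) a POST asking DocumentCloud to fetch a public file.

    The JSON field names `file_url` and `title` are assumptions taken from
    a reading of the API docs, not something confirmed in this thread.
    """
    payload = {"file_url": file_url, "title": title}
    return urllib.request.Request(
        url=f"{api_base}/documents/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumed auth scheme: a Bearer access token obtained beforehand.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

If this shape turns out to be right, passing the returned object to `urllib.request.urlopen` with a real token would submit it. Keeping the builder separate from the sending step means the scraper could log or dry-run what it would upload first.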

Tagging

I see ways in the documentation to set and/or overwrite the value of a key:

PUT /api/documents/<document_id>/data/<key>/ - Set values for the given key. This will override all values currently under key.

PATCH /api/documents/<document_id>/data/<key>/ - Add and/or remove values for the given key.

but not a way to add new keys. Might bear more investigation. Reference
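For tagging by committee, the two endpoints quoted above could be wrapped roughly like this. This is a sketch under assumptions: the base URL, the Bearer token header, and the JSON body shape (`values`, plus `remove` for PATCH) are not confirmed anywhere in this thread. The helper name `build_data_request` and the `committee` key are hypothetical. Whether a PUT to a key that does not exist yet creates it is exactly the open question here, so this would need to be tried against a test document.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed API base URL; confirm against the DocumentCloud API docs.
API_BASE = "https://api.www.documentcloud.org/api"


def build_data_request(document_id, key, values, access_token,
                       method="PUT", remove=None, api_base=API_BASE):
    """Build (but do not send) a PUT or PATCH against a document's data key.

    PUT is documented as overriding all values under the key; PATCH adds
    and/or removes values. The body field names are assumptions.
    """
    if method not in ("PUT", "PATCH"):
        raise ValueError("method must be 'PUT' or 'PATCH'")
    body = {"values": list(values)}
    if method == "PATCH" and remove:
        body["remove"] = list(remove)
    # Percent-encode the key so spaces or slashes cannot break the URL path.
    url = f"{api_base}/documents/{document_id}/data/{quote(key, safe='')}/"
    return urllib.request.Request(
        url=url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method=method,
    )
```

For example, `build_data_request(123, "committee", ["Rules Committee"], token)` would produce a PUT to `/api/documents/123/data/committee/`, where `123` stands in for a real document ID.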

xconnieex commented 3 years ago

Wow I was not aware it could fetch publicly available documents... that's very interesting. @krammy19 what do you think about this? While I don't think it can take the place of the scraper, maybe we can use this to directly upload to DC in the future.

I'm also thinking there must be a way to create new keys since the ones they have are specific to the documents. Perhaps it was done manually on the front end but I really hope not. Thanks for your help!

pmkuny commented 3 years ago

> I'm also thinking there must be a way to create new keys since the ones they have are specific to the documents. Perhaps it was done manually on the front end but I really hope not. Thanks for your help!

Happy to help! I'm guessing from the documentation there is a way to create those keys, as there's a specific _tag field they mention in the documentation around creating keys on the frontend. Unfortunately, I still don't see a way to do it via the API.