keithamoss closed this issue 8 years ago
Let's scrape the digital objects collection from SRO.
Rinse and repeat until we've got all 6,600 objects.
Dublin Core XML https://archive.sro.wa.gov.au/index.php/a-c-gregory-bejoording-to-sturt-river-through-wongan-hills-021;dc?sf_format=xml
Dublin Core Metadata Schema http://www.openarchives.org/OAI/2.0/oai_dc.xsd
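A rough sketch of how one of these Dublin Core XML records could be parsed in Python, assuming the response uses the standard `oai_dc` wrapper described by the schema above. The function name `parse_dublin_core` and the sample record are made up for illustration; the real SRO output may carry different or additional elements.

```python
import xml.etree.ElementTree as ET

# Standard namespace for the Dublin Core elements inside an oai_dc record.
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def parse_dublin_core(xml_text):
    """Return a dict mapping each Dublin Core element name to a list of its values.

    Hypothetical helper: collects every dc:* element found in the record.
    """
    root = ET.fromstring(xml_text)
    dc_prefix = "{" + DC_NAMESPACE + "}"
    record = {}
    for element in root.iter():
        if element.tag.startswith(dc_prefix):
            name = element.tag[len(dc_prefix):]
            value = (element.text or "").strip()
            if value:
                # Elements such as dc:subject can repeat, so keep a list.
                record.setdefault(name, []).append(value)
    return record


# Made-up sample record, NOT real SRO data.
SAMPLE = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample map title</dc:title>
  <dc:subject>Exploration</dc:subject>
  <dc:subject>Surveys</dc:subject>
  <dc:identifier>sample-identifier-001</dc:identifier>
</oai_dc:dc>"""

record = parse_dublin_core(SAMPLE)
print(record)
```

For a live page, the body fetched from the `;dc?sf_format=xml` URL could be passed straight to `parse_dublin_core`.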
I guess it's unlikely that they'll upgrade to the latest AtoM, which has an API?
In the short term, certainly unlikely, @samwilson.
They've suggested they could send us a database dump in the next couple of weeks though - so we may not have to scrape at all!
Oh that's cool! :)
Check out https://github.com/geogeeks-au/maps-for-lost-towns/blob/master/scrapers/SRO%20Digital%20Objects%20Scraper.ipynb for a partially completed scraper.
The scraper has finished running against SRO! We've now got all 6,745 maps in the database in the new sro_digital_objects_collection table.
Code is here: https://github.com/geogeeks-au/maps-for-lost-towns/blob/master/scrapers/SRO%20Digital%20Objects%20Scraper.ipynb
CSV dump is here: https://github.com/geogeeks-au/maps-for-lost-towns/blob/master/scrapers/sro_digital_objects_collection.csv
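If anyone wants to sanity-check the CSV dump against the 6,745 figure, something like the sketch below should do. It assumes the file has a header row; the column names in the sample are invented, and `count_rows` is just an illustrative helper, not part of the scraper.

```python
import csv
import io


def count_rows(csv_file):
    """Count data rows (excluding the header) in an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return sum(1 for _ in reader)


# Made-up two-row sample; the real dump's columns are not known here.
sample_csv = io.StringIO(
    "identifier,title\n"
    "sample-001,First sample map\n"
    "sample-002,Second sample map\n"
)
sample_count = count_rows(sample_csv)
print(sample_count)
```

Against a local copy of the dump, that would be `count_rows(open("sro_digital_objects_collection.csv", newline="", encoding="utf-8"))`, assuming the file is UTF-8.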
Is there an API? If not, can we get a data dump? If not, we'll scrape.
Will ask SRO.
Next: #11