geogeeks-au / maps-for-lost-towns

Georeferencing historic maps through crowdsourcing
4 stars 1 forks source link

Can we use Landgate's Historical Map Index to geolocate images close to their actual locations #17

Closed keithamoss closed 8 years ago

keithamoss commented 8 years ago

NTSS has a Historical Map Index of the Mali Catalogue plans that spatially associates each plan roughly to its actual location that they built for internal use.

BUT the only link they have to the plans is the MapId and their version of the filename (which is raw from the scanner) - so this is only useful if we have the same MapId.

MapIds are formatted like 100/10 or 0/20 as far as I could see.

RP at LG would be the person to talk to to formally request the index.

This is an alternative to what we proposed doing in #9.

keithamoss commented 8 years ago

Letter drafted by H. Will send next week.

keithamoss commented 8 years ago

The Mali Index doesn't contain all of SRO's maps, only certain map series.

e.g. Cancelled Public Plans are, but District Plans and Group Settlement Plans aren't.

Ergo, the Landgate Map Historical Index only has a subset too. There's roughly 1,000 in the index.

Let's bin by series name to get an idea of what we're facing...

SELECT series, COUNT(*) as series_total FROM sro_digital_objects_collection GROUP BY series ORDER BY series
mapsbyseries
keithamoss commented 8 years ago

Let's look at Townsite Plans.

SELECT fields->>'title', fields->>'filename', fields->>'date_s_' FROM sro_digital_objects_collection WHERE series = 'TOWNSITE PLANS' LIMIT 5;
keithamoss commented 8 years ago

Hi Future Keith,

So, this is how the Map Index seems to work.

Look at the filename -

e.g. "Cons_4567_Item_097-9.jpg"

Tangent Fuck. It looks like title does not equal the main item tile.

e.g. https://archive.sro.wa.gov.au/index.php/300-chain-plan-tally-no-503156-097-9

Title: 300 Chain Plan [Tally No. 503156] Item Title: Item 097/9 - 300 Chain Plan [Tally No. 503156]

Did we capture the actual item title from the scraper and get it to the database?

Anyway, the important bit is the Item 097/9 bit, along with 300 Chain Plan.

In the Map Index we have MapId of 97/300 and ID of 300 in the "Pastoral Chain 300" feature class.

keithamoss commented 8 years ago

OK, let's hackily fix that scraping fuck up by grabbing the item id from the reference field that we already scraped. The item title from SRO just seems to be item id + title anyway.

For the purposes of matching against Landgate's map index we're assuming that items only have valid ids if they contain a / character.

CREATE OR REPLACE FUNCTION last_position(text, text) RETURNS integer LANGUAGE SQL AS $$ Select length($1) - position(reverse($2) in reverse($1)) - length($2); $$;

ALTER TABLE sro_digital_objects_collection ADD COLUMN item_id character varying;

UPDATE sro_digital_objects_collection SET item_id = trim(substring(fields->>'reference_code' from last_position(fields->>'reference_code', ' ') + 2));

UPDATE sro_digital_objects_collection SET item_id = NULL WHERE id NOT IN (SELECT id FROM sro_digital_objects_collection WHERE item_id LIKE '%/%');

SELECT * FROM sro_digital_objects_collection WHERE item_id IS NOT NULL;
-- 1,232 items with recognisable item ids.
keithamoss commented 8 years ago

Setting item ids for 80 Chain Plans.

SELECT fields->>'title', trim(left(fields->>'title', position(' ' in fields->>'title'))) FROM lost_towns.sro_digital_objects_collection WHERE item_id IS NULL AND fields->>'title' ILIKE '%80 Chain Plan%';

UPDATE lost_towns.sro_digital_objects_collection SET item_id = trim(left(fields->>'title', position(' ' in fields->>'title'))) WHERE item_id IS NULL AND fields->>'title' ILIKE '%80 Chain Plan%';

UPDATE lost_towns.sro_digital_objects_collection SET item_id = NULL WHERE id NOT IN (SELECT id FROM lost_towns.sro_digital_objects_collection WHERE item_id LIKE '%/%');
keithamoss commented 8 years ago

1,132 maps located from the Landgate map indices!

keithamoss commented 8 years ago

Make that 1,231.

UPDATE sro_digital_objects_collection SET geom_mapindices = (SELECT ST_GEOMFROMEWKT(t2.geom_mapindices) FROM sro_digital_objects_collection_for_reimport AS t2 WHERE t2.id = id LIMIT 1) WHERE item_id IS NOT NULL AND id IN (SELECT id::int FROM sro_digital_objects_collection_for_reimport WHERE geom_mapindices IS NOT NULL);
ALTER TABLE sro_digital_objects_collection ADD COLUMN geom_mapindices Geometry(MultiPolygon, 4283);