Closed keithamoss closed 8 years ago
Letter drafted by H. Will send next week.
The Mali Index doesn't contain all of SRO's maps, only certain map series.
e.g. Cancelled Public Plans are, but District Plans and Group Settlement Plans aren't.
Ergo, the Landgate Map Historical Index only has a subset too. There's roughly 1,000 in the index.
Let's bin by series name to get an idea of what we're facing...
SELECT series, COUNT(*) as series_total FROM sro_digital_objects_collection GROUP BY series ORDER BY series
Let's look at Townsite Plans.
SELECT fields->>'title', fields->>'filename', fields->>'date_s_' FROM sro_digital_objects_collection WHERE series = 'TOWNSITE PLANS' LIMIT 5;
Hi Future Keith,
So, this is how the Map Index seems to work.
Look at the filename -
e.g. "Cons_4567_Item_097-9.jpg"
Tangent Fuck. It looks like title
does not equal the main item tile.
e.g. https://archive.sro.wa.gov.au/index.php/300-chain-plan-tally-no-503156-097-9
Title: 300 Chain Plan [Tally No. 503156] Item Title: Item 097/9 - 300 Chain Plan [Tally No. 503156]
Did we capture the actual item title from the scraper and get it to the database?
Anyway, the important bit is the Item 097/9
bit, along with 300 Chain Plan
.
In the Map Index we have MapId of 97/300
and ID of 300
in the "Pastoral Chain 300" feature class.
OK, let's hackily fix that scraping fuck up by grabbing the item id
from the reference
field that we already scraped. The item title from SRO just seems to be item id
+ title
anyway.
For the purposes of matching against Landgate's map index we're assuming that items only have valid ids if they contain a / character.
CREATE OR REPLACE FUNCTION last_position(text, text) RETURNS integer LANGUAGE SQL AS $$ Select length($1) - position(reverse($2) in reverse($1)) - length($2); $$;
ALTER TABLE sro_digital_objects_collection ADD COLUMN item_id character varying;
UPDATE sro_digital_objects_collection SET item_id = trim(substring(fields->>'reference_code' from last_position(fields->>'reference_code', ' ') + 2));
UPDATE sro_digital_objects_collection SET item_id = NULL WHERE id NOT IN (SELECT id FROM sro_digital_objects_collection WHERE item_id LIKE '%/%');
SELECT * FROM sro_digital_objects_collection WHERE item_id IS NOT NULL;
-- 1,232 items with recognisable item ids.
Setting item ids for 80 Chain Plans.
SELECT fields->>'title', trim(left(fields->>'title', position(' ' in fields->>'title'))) FROM lost_towns.sro_digital_objects_collection WHERE item_id IS NULL AND fields->>'title' ILIKE '%80 Chain Plan%';
UPDATE lost_towns.sro_digital_objects_collection SET item_id = trim(left(fields->>'title', position(' ' in fields->>'title'))) WHERE item_id IS NULL AND fields->>'title' ILIKE '%80 Chain Plan%';
UPDATE lost_towns.sro_digital_objects_collection SET item_id = NULL WHERE id NOT IN (SELECT id FROM lost_towns.sro_digital_objects_collection WHERE item_id LIKE '%/%');
1,132 maps located from the Landgate map indices!
Make that 1,231.
UPDATE sro_digital_objects_collection SET geom_mapindices = (SELECT ST_GEOMFROMEWKT(t2.geom_mapindices) FROM sro_digital_objects_collection_for_reimport AS t2 WHERE t2.id = id LIMIT 1) WHERE item_id IS NOT NULL AND id IN (SELECT id::int FROM sro_digital_objects_collection_for_reimport WHERE geom_mapindices IS NOT NULL);
ALTER TABLE sro_digital_objects_collection ADD COLUMN geom_mapindices Geometry(MultiPolygon, 4283);
NTSS has a Historical Map Index of the Mali Catalogue plans that spatially associates each plan roughly to its actual location that they built for internal use.
BUT the only link they have to the plans is the MapId and their version of the filename (which is raw from the scanner) - so this is only useful if we have the same MapId.
MapIds are formatted like
100/10
or0/20
as far as I could see.RP at LG would be the person to talk to to formally request the index.
This is an alternative to what we proposed doing in #9.