CSIRO-enviro-informatics / asgs-dataset

GNU General Public License v3.0

Bug in remotenessarea URIs? #26

jyucsiro closed this issue 4 years ago

jyucsiro commented 4 years ago

Running into some confusing URIs: the remoteness area (RA) URI prefix is being used for significant urban areas (SUAs).

Running a query like this on the triple store cache:

PREFIX asgs: <http://linked.data.gov.au/def/asgs#>
SELECT * WHERE {
    ?s a ?type .
    FILTER(STRSTARTS(STR(?s), "http://linked.data.gov.au/dataset/asgs2016/remotenessarea/"))
} LIMIT 1000

Excerpt of the result of the query:

Item  Feature URI                                                     Type
1     http://linked.data.gov.au/dataset/asgs2016/remotenessarea/10    geo:Feature
2     http://linked.data.gov.au/dataset/asgs2016/remotenessarea/10    asgs:RemotenessArea
3     http://linked.data.gov.au/dataset/asgs2016/remotenessarea/10    asgs:Feature
4     http://linked.data.gov.au/dataset/asgs2016/remotenessarea/1000  asgs:SignificantUrbanArea
5     http://linked.data.gov.au/dataset/asgs2016/remotenessarea/1001  asgs:SignificantUrbanArea

For some reason, SUAs have the same prefix as remoteness areas... is this a bug?
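
One way to surface this kind of mismatch automatically is to compare the path segment of each URI against the segment expected for its type. This is only a minimal sketch, not code from the repository: the function name `find_prefix_mismatches` and the `EXPECTED_SEGMENT` table are hypothetical, and the table covers just the two classes discussed in this issue.

```python
BASE = "http://linked.data.gov.au/dataset/asgs2016/"

# Hypothetical lookup: the URI path segment each class is expected to use.
EXPECTED_SEGMENT = {
    "asgs:RemotenessArea": "remotenessarea",
    "asgs:SignificantUrbanArea": "significanturbanarea",
}


def find_prefix_mismatches(rows):
    """Return (uri, rdf_type, expected_segment) for mismatched rows.

    `rows` is an iterable of (uri, rdf_type) pairs, e.g. query results.
    Types missing from EXPECTED_SEGMENT (such as geo:Feature) are skipped.
    """
    mismatches = []
    for uri, rdf_type in rows:
        expected = EXPECTED_SEGMENT.get(rdf_type)
        if expected is None or not uri.startswith(BASE):
            continue
        # First path segment after the dataset base, e.g. "remotenessarea".
        segment = uri[len(BASE):].split("/", 1)[0]
        if segment != expected:
            mismatches.append((uri, rdf_type, expected))
    return mismatches


rows = [
    (BASE + "remotenessarea/10", "geo:Feature"),
    (BASE + "remotenessarea/10", "asgs:RemotenessArea"),
    (BASE + "remotenessarea/1000", "asgs:SignificantUrbanArea"),
]
for uri, rdf_type, expected in find_prefix_mismatches(rows):
    print(f"{uri} is typed {rdf_type} but expected /{expected}/")
```

Run against the excerpt above, only the rows typed asgs:SignificantUrbanArea would be flagged.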

ashleysommer commented 4 years ago

Yep, looks like a bug! Good catch.

Edit: Or maybe not? This is the relevant section of code, and it seems right: https://github.com/CSIRO-enviro-informatics/asgs-dataset/blob/61cd90d5487a80bc902d2cbbb38645093542b1da/asgs_dataset/model/asgs_feature.py#L1555

I've checked the ttl output for both SUA and RA on asgsld.net, and both appear to produce RDF types with the correct prefix.

jyucsiro commented 4 years ago

It's strange, though, that it's appearing in the cache. @benjaminleighton, any ideas?

jyucsiro commented 4 years ago

@ashleysommer what about this line?

https://github.com/CSIRO-enviro-informatics/asgs-dataset/blob/61cd90d5487a80bc902d2cbbb38645093542b1da/asgs_dataset/model/asgs_feature.py#L1493

benjaminleighton commented 4 years ago

I'm not sure, but one place to check would be the dataset downloaded from S3 as part of the cache build. The exact reference will be in download-data.sh on the cache machine itself, but @ashleysommer might have a good idea which file to look in.

ashleysommer commented 4 years ago

@jyucsiro Yep, that looks like the bug that causes it. It's actually part of the SA2 feature RDF mapping. When an SA2 is part of a SUA, the mapping emits:

    /remotenessarea/1000 a SUA
    this.sa2 sfWithin /remotenessarea/1000

Instead, it should emit:

    /significanturbanarea/1000 a SUA
    this.sa2 sfWithin /significanturbanarea/1000

So what actually needs to be reharvested here is the set of SA2s. That's why I didn't find the problematic triples when looking in the SUA set and the RA set.
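
The corrected behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual code in asgs_feature.py: the function `sua_triples_for_sa2` is hypothetical, triples are shown as plain tuples of compact strings, `geo:sfWithin` assumes the sfWithin predicate is the GeoSPARQL one, and the SA2 URI in the usage lines is a placeholder.

```python
BASE = "http://linked.data.gov.au/dataset/asgs2016/"


def sua_triples_for_sa2(sa2_uri, sua_code):
    """Triples an SA2 mapping would emit for its parent SUA (illustrative).

    The buggy behaviour built the SUA URI under 'remotenessarea/';
    this sketch builds it under 'significanturbanarea/' instead.
    """
    sua_uri = f"{BASE}significanturbanarea/{sua_code}"
    return [
        (sua_uri, "rdf:type", "asgs:SignificantUrbanArea"),
        (sa2_uri, "geo:sfWithin", sua_uri),
    ]


# Placeholder SA2 URI, used only for illustration.
for triple in sua_triples_for_sa2("http://example.org/sa2/1", "1000"):
    print(triple)
```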

ashleysommer commented 4 years ago

I've fixed the bug and am reharvesting the set of SA2s now. To fix the current cache, we'll have to remove any triples that say /remotenessarea/x rdf:type asgs#SUA and any that say this.sa2 sfWithin /remotenessarea/x, then re-ingest the new set of SA2s with the correct sfWithin /significanturbanarea/x triples.
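
The removal step could look something like the sketch below, shown on an in-memory list of triples rather than the real triple store (where an equivalent SPARQL Update would be needed). Everything here is an assumption for illustration: the function `drop_bad_sua_triples` is hypothetical, triples are plain tuples of strings, and `geo:sfWithin` assumes the GeoSPARQL predicate. As a cautious design choice, it only drops sfWithin triples that point at a URI already identified as a mis-prefixed SUA, rather than every sfWithin triple pointing under /remotenessarea/.

```python
RA_PREFIX = "http://linked.data.gov.au/dataset/asgs2016/remotenessarea/"
SUA_TYPE = "asgs:SignificantUrbanArea"


def drop_bad_sua_triples(triples):
    """Remove mis-prefixed SUA triples from an iterable of (s, p, o) tuples."""
    triples = list(triples)
    # URIs under the remoteness area prefix that are wrongly typed as SUAs.
    bad_uris = {
        s for s, p, o in triples
        if p == "rdf:type" and o == SUA_TYPE and s.startswith(RA_PREFIX)
    }
    kept = []
    for s, p, o in triples:
        if s in bad_uris and p == "rdf:type" and o == SUA_TYPE:
            continue  # the wrong "/remotenessarea/x a SUA" triple
        if p == "geo:sfWithin" and o in bad_uris:
            continue  # the wrong "sa2 sfWithin /remotenessarea/x" triple
        kept.append((s, p, o))
    return kept


cache = [
    (RA_PREFIX + "10", "rdf:type", "asgs:RemotenessArea"),
    (RA_PREFIX + "1000", "rdf:type", SUA_TYPE),
    ("http://example.org/sa2/1", "geo:sfWithin", RA_PREFIX + "1000"),  # placeholder SA2 URI
]
print(drop_bad_sua_triples(cache))
```

After this cleanup, the reharvested SA2 set would supply the replacement triples under /significanturbanarea/.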