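We need the ORDER BY so that successive LIMIT/OFFSET queries return contiguous, non-overlapping pages of URIs (#69); without it, the result order is not guaranteed.
However, adding ORDER BY to observation-select.sparql causes the query to time out (currently with 28 million observations on IDP beta).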
Instead, we could spill all the selected URIs to disk (in a single query with no LIMIT, OFFSET or ORDER BY) and page through that file.
Downloading all the observation URIs takes about a minute and produces a file of roughly 600 MB:
curl 'https://beta.gss-data.org.uk/sparql' -d 'query=PREFIX%20qb%3A%20%3Chttp%3A%2F%2Fpurl.org%2Flinked-data%2Fcube%23%3E%0A%0ASELECT%20%3Fobservation%0AWHERE%20%7B%20%0A%20%20%3Fobservation%20qb%3AdataSet%20%3Fcube%20.%0A%7D' > observations.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  568M    0  568M  100   178  9064k      2  0:01:29  0:01:04  0:00:25  9.7M
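Paging through the downloaded file could look something like the sketch below. It assumes the response has been reduced to one URI per line (the actual format of observations.txt depends on what the endpoint returns for the request, so a header row or XML wrapper may need stripping first). The function name `pages` and the page size are hypothetical and not part of the existing code.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def pages(lines: Iterable[str], page_size: int) -> Iterator[List[str]]:
    """Yield successive lists of up to page_size non-empty lines.

    Assumption: each non-empty line holds exactly one observation URI.
    The input is consumed lazily, so the whole file is never held in memory.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    stripped = (line.strip() for line in lines)
    uris = (line for line in stripped if line)  # skip blank lines
    while True:
        page = list(islice(uris, page_size))
        if not page:
            return
        yield page


# Hypothetical usage: iterate over the file in pages of 10,000 URIs.
# with open("observations.txt", encoding="utf-8") as f:
#     for page in pages(f, 10_000):
#         ...  # e.g. substitute the page into a VALUES clause
```

Because the file is read once, in order, each page is contiguous and disjoint from the others, so no ORDER BY is needed on the server side.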