And for Joey to provide feedback on the resulting data.
Here is a bit of pseudocode to explain how app_size is set at the moment. The appl_sub_type is derived from pattern matching on the application_type field:
if appl_sub_type in ['Advertising', 'Signs', 'Telecoms', 'Hedges', 'Trees']:
    return 'Small'
if appl_sub_type in ['Impact', 'Major']:  # EIA, scoping opinion, major, large
    return 'Large'
if n_statutory_days:
    if n_statutory_days <= 60:
        return 'Small'
    if not n_documents:
        return 'Medium'  # default if there is no info from n_documents
if n_documents:
    if n_documents >= 100:
        return 'Large'
    elif n_documents >= 10:
        return 'Medium'
    else:
        return 'Small'
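The pseudocode above can be turned into a small Python function, which makes it easy to test the thresholds. This is only a sketch. The function name `classify_app_size` is hypothetical. The inputs are assumed to be values that have already been extracted, with `None` meaning "no information". Returning `None` when no rule fires is also an assumption, because the pseudocode does not say what happens in that case.

```python
from typing import Optional

# Hypothetical sketch mirroring the app_size pseudocode; not PlanIt's actual code.
SMALL_SUB_TYPES = {"Advertising", "Signs", "Telecoms", "Hedges", "Trees"}
LARGE_SUB_TYPES = {"Impact", "Major"}  # EIA, scoping opinion, major, large


def classify_app_size(
    appl_sub_type: Optional[str],
    n_statutory_days: Optional[int],
    n_documents: Optional[int],
) -> Optional[str]:
    """Return 'Small', 'Medium' or 'Large', or None if no rule applies (assumed)."""
    # 1. Sub-types matched from application_type take priority.
    if appl_sub_type in SMALL_SUB_TYPES:
        return "Small"
    if appl_sub_type in LARGE_SUB_TYPES:
        return "Large"
    # 2. Statutory determination period, if known.
    if n_statutory_days:
        if n_statutory_days <= 60:
            return "Small"
        if not n_documents:
            return "Medium"  # default when there is no document count
    # 3. Fall back on the number of documents.
    if n_documents:
        if n_documents >= 100:
            return "Large"
        if n_documents >= 10:
            return "Medium"
        return "Small"
    return None  # assumption: the pseudocode leaves this case undefined
```

Rule order matters here. `classify_app_size("Full", 91, 150)` gives 'Large'. `classify_app_size("Hedges", 91, 150)` gives 'Small', because the sub-type check fires before the document count is looked at. The same ordering is why a sub_type of 'Impact' is always Large, however few documents it has.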
Thanks @aspeakman, that's useful and good to know. According to @joeytalbot, who may have more comments on this, n_documents will tend to be lower for older applications. I like the coercion of 'Hedges' to Small. There is clearly scope to refine this, but it's a good starter for 10 (or should I say 3 ;).
Also, @aspeakman, could you add a sentence or two based on that to the report here? https://github.com/cyipt/acton/blob/master/vignettes/acton.Rmd#L60
Another question: are there any plans for a free-text / regex search term argument to match against the description field?
@aspeakman, to provide a concrete example: would there be an endpoint associated with this query (also in pseudocode)?
d = get_planit_data(query_value = "Allerton Bywater") # cannot search in description field
It is on my "to investigate" list, but it is not currently possible.
Thanks for the quick-fire response.
Heads-up @aspeakman I think there are still some issues around the 'large' classification. We are trying to make a query that gives us the case study developments including Allerton Bywater and the Climate Innovation District. Unfortunately, this endpoint, which I would expect to get all of them, fails:
Any ideas what we're doing wrong?
Also see feature 22 in the response: it is definitely not large:
The keywords are case-sensitive and are ignored if unmatched (to be fixed!). To find out whether a query succeeded, it is best to start with a JSON query as follows. In the near future I will also include details on whether the query matched the keywords.
Try this (then convert to geojson if that is what you want)
Aha OK, so it needs to be 'Large', not 'large'?
Makes sense, although it is somewhat unconventional to have case-sensitive values in API queries IMO (I will defer to @mvl22 on this).
Update @aspeakman: the two queries seem to return the same result. Reproducible example from the R console:
res1 = sf::read_sf("https://dev.planit.org.uk/api/applics/geojson?limit=100&bbox=&end_date=2020-03-05&start_date=2000-02-01&pg_sz=100&auth=Leeds&app_size=Large&app_state=Permitted")
res2 = sf::read_sf("https://dev.planit.org.uk/api/applics/geojson?limit=100&bbox=&end_date=2020-03-05&start_date=2000-02-01&pg_sz=100&auth=Leeds&app_size=large&app_state=permitted")
identical(res1, res2)
#> [1] TRUE
Created on 2020-03-05 by the reprex package (v0.3.0)
Item 22 in your updated endpoint is still not a large application; it's for a single house.
@aspeakman a query such as the one below will get us a list of Leeds planning applications, but we need ways of really slimming this down to be able to pick out the most significant ones (such as our case study sites).
How can we do this? One thing I think would help in some (although not all) circumstances is searching the description field for large numbers (i.e. >100), which would hopefully represent large numbers of proposed homes. What else do you think would help?
get_planit_data(query_type = "applics", limit = 100, app_size = "large", app_state = "permitted",auth = "Leeds")
Yes, I replied too soon. The API keywords are case-insensitive (as you would expect), and the API returns a 404 when a value is not acceptable. I will also include some feedback in the 'detail' field to confirm what is being selected in the query.
Yes, I have thought about doing keyword matching in the description field, but my impression is that the terms used are not consistent enough. Numbers often appear in the description, but they can also be identifiers, measurements or dates. The term used to describe dwellings/units also varies a lot, and some large applications just list the number of storeys, the extent of land, etc.
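The "large numbers in the description" idea, and the caveats raised about it, can be tried out client-side on results that have already been downloaded. This is a hypothetical heuristic. It assumes each record carries a `description` string. It counts a number only when a dwelling-type word follows it directly, which filters out some (not all) of the identifiers, measurements and dates.

```python
import re
from typing import Any, Dict, Iterable, List, Optional

# Hypothetical heuristic: only count a number when it is immediately followed
# (optionally via "new") by a dwelling-type word, e.g. "350 dwellings".
DWELLING_PATTERN = re.compile(
    r"\b(\d[\d,]*)\s+(?:new\s+)?(?:dwellings?|homes?|houses?|flats?|apartments?|units?)\b",
    re.IGNORECASE,
)


def max_dwellings(description: Optional[str]) -> Optional[int]:
    """Largest dwelling count found in a description, or None if none is found."""
    counts = [
        int(match.group(1).replace(",", ""))
        for match in DWELLING_PATTERN.finditer(description or "")
    ]
    return max(counts) if counts else None


def likely_large(records: Iterable[Dict[str, Any]], threshold: int = 100) -> List[Dict[str, Any]]:
    """Keep records whose description mentions more than `threshold` dwellings."""
    return [r for r in records if (max_dwellings(r.get("description")) or 0) > threshold]
```

Phrasing like "350 residential units" would slip through this pattern. So would a description that only gives storeys or site area. That makes it a way to narrow a shortlist, not a replacement for the app_size classification.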
You are right about uid Leeds/18/03181/RM. It is included because the keyword "reserved matters" is on my list to be classified as sub_type 'Impact' (automatically included as Large). I think this is a mistake and can correct it, but I am not certain of the knock-on effects. I think we will always be balancing false positives against false negatives.
Thanks @aspeakman but do you know a query that will get all of the applications of interest, namely:
"13/05235/FU@Leeds"
"15/04151/FU@Leeds"
"15/00415/FU@Leeds"
"15/01973/FU@Leeds"
These are all large applications but not all of them seem to appear in this endpoint:
(we will double-check)
Good news @aspeakman we got the result 🎉
With this query in R:
library(acton)
# d = get_planit_data(query_value = "Allerton Bywater") # cannot search in description field
applications_leeds = get_planit_data(
  limit = 500,
  auth = "Leeds",
  app_size = "large",
  app_state = "permitted"
)
This generates the output shown below and calls this endpoint: https://dev.planit.org.uk/api/applics/geojson?limit=500&bbox=&end_date=2020-03-05&start_date=2000-02-01&pg_sz=500&auth=Leeds&app_size=large&app_state=permitted
#> Getting data from https://dev.planit.org.uk/api/applics/geojson?limit=500&bbox=&end_date=2020-03-05&start_date=2000-02-01&pg_sz=500&auth=Leeds&app_size=large&app_state=permitted
nrow(applications_leeds)
#> [1] 494
Created on 2020-03-05 by the reprex package (v0.3.0)
All 4 examples are Large and Permitted, so they should be there. But your limit value needs adjusting. 'limit' is the maximum number of results returned by the underlying query, and there are 835 potential results (see the 'total' result field). So set the limit to 1000 and then page through to get the successive results.
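Paging through the results can be written as a simple loop. This sketch rests on several assumptions that this thread does not confirm:

- a JSON variant of the endpoint exists at `/api/applics/json`;
- it accepts a `page` parameter alongside `pg_sz`;
- the JSON response lists its items under `records`, next to the `total` field mentioned above.

The HTTP call is passed in as a function so the loop can be tested offline. Check the API documentation for the real parameter and field names.

```python
from typing import Any, Callable, Dict, List
from urllib.parse import urlencode

# Assumed JSON variant of the geojson endpoint used in this thread.
BASE_URL = "https://dev.planit.org.uk/api/applics/json"
MAX_PAGE_SIZE = 500  # absolute maximum page size, per the PlanIt maintainer in this thread


def build_url(params: Dict[str, Any]) -> str:
    return BASE_URL + "?" + urlencode(params)


def fetch_all_pages(
    fetch_json: Callable[[str], Dict[str, Any]],
    params: Dict[str, Any],
    pg_sz: int = MAX_PAGE_SIZE,
) -> List[Dict[str, Any]]:
    """Request successive pages until 'total' records have been collected."""
    records: List[Dict[str, Any]] = []
    page = 1
    while True:
        # 'pg_sz' appears in the URLs in this thread; 'page' is an assumed name.
        url = build_url({**params, "pg_sz": pg_sz, "page": page})
        data = fetch_json(url)
        batch = data.get("records", [])  # assumed response field
        records.extend(batch)
        # Without a 'total' field, stop after the first page to be safe.
        total = data.get("total", len(records))
        # An empty page also ends the loop, so it cannot run forever.
        if not batch or len(records) >= total:
            return records
        page += 1
```

With the Leeds query, the call would be `fetch_all_pages(fetch_json, {"auth": "Leeds", "app_size": "large", "app_state": "permitted", "limit": 1000})`. Here `limit` is raised above the 835 total so that the underlying query is not cut short.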
Thanks @aspeakman, but when the limit is 1000 and the page size is 1000, it breaks. It's not easy to pull down multiple pages from R.
We could solve the problem on the R side, but this suggests the resulting code will not be particularly pretty: https://httr.r-lib.org/articles/api-packages.html#pagination-handling-multi-page-responses
If you could bump up the maximum page size, e.g. to 1000 or 10,000 that could help our work.
There is an absolute maximum page size of 500 (all defined in the API documentation; see https://dev.planit.org.uk/api/). This and the 'limit' are designed to protect against excessive memory demands when requests are too big. I have had some pretty crazy external requests for large batches, and I think 10,000 in one request would fall into that category.
If you can't do paging in R, then you would need to split up your date range, perhaps into 5-year chunks.
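Splitting the date range is straightforward to script. The sketch makes a few assumptions:

- `start_date` and `end_date` (both visible in the URLs in this thread, in YYYY-MM-DD format) are inclusive bounds. Each chunk therefore ends the day before the next one begins, so chunks never overlap.
- The 5-year length follows the suggestion above. Without paging, each chunk still has to fit in one page of at most 500 records, so shorter chunks may be needed.
- The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def date_chunks(start: date, end: date, years: int = 5) -> List[Tuple[date, date]]:
    """Split [start, end] into consecutive, non-overlapping inclusive windows."""
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        next_start = add_years(chunk_start, years)
        chunk_end = min(next_start - timedelta(days=1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = next_start
    return chunks


def chunk_params(chunks: List[Tuple[date, date]]) -> List[Dict[str, str]]:
    """Turn each window into start_date/end_date query parameters."""
    return [{"start_date": s.isoformat(), "end_date": e.isoformat()} for s, e in chunks]
```

For the range used in this thread, `date_chunks(date(2000, 2, 1), date(2020, 3, 5))` gives five windows. The last one runs from 2020-02-01 to 2020-03-05.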
Good response, makes sense, will see which solution works best.
> If you could bump up the maximum page size, e.g. to 1000 or 10,000 that could help our work.
I would agree with Andrew that pagination is a sensible and reasonable requirement for the R interface to implement. Ultra-large responses, such as 10,000 results, are inherently more likely to fail in APIs of any kind (busy servers, memory overload, HTTP transfer failures, etc.).
Thanks for the input, will see which solution works best.
> do you know a query that will get all of the applications of interest, namely: "13/05235/FU@Leeds"
If you know an ID, you can use this API call format:
https://dev.planit.org.uk/planapplic/Leeds/13/05235/FU/geojson
Note that both the area ("Leeds") and the ID ("13/05235/FU") appear in that URL format.
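This URL can be built from identifiers in the "ID@Area" form quoted earlier in this thread, such as "13/05235/FU@Leeds", by splitting on the "@". The helper name is hypothetical. The sketch assumes the slashes in the ID stay as path separators, exactly as in the URL above. Any other special characters are percent-encoded as a precaution.

```python
from urllib.parse import quote

BASE = "https://dev.planit.org.uk/planapplic"


def planapplic_url(ref: str) -> str:
    """Turn 'ID@Area' (e.g. '13/05235/FU@Leeds') into a single-application URL."""
    app_id, sep, area = ref.rpartition("@")
    if not sep or not app_id or not area:
        raise ValueError(f"expected 'ID@Area', got {ref!r}")
    # Keep '/' unescaped: the ID's slashes become path segments in this URL format.
    return f"{BASE}/{quote(area)}/{quote(app_id, safe='/')}/geojson"
```

For example, `planapplic_url("13/05235/FU@Leeds")` reproduces the URL above.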
Already implemented. Give this reproducible example a try if you have R installed:
remotes::install_github("cyipt/acton")
#> Skipping install of 'acton' from a github remote, the SHA1 (732457a5) has not changed since last install.
#> Use `force = TRUE` to force installation
library(acton)
res = acton::get_planit_data(auth = "leeds")
#> Getting data from https://dev.planit.org.uk/api/applics/geojson?limit=6&bbox=&end_date=2020-03-08&start_date=2000-02-01&pg_sz=6&auth=leeds
mapview::mapview(res)
Created on 2020-03-08 by the reprex package (v0.3.0)
I've opened a new issue because I realise this is not yet in the docs: https://github.com/cyipt/acton/issues/53
@mvl22 and @aspeakman, are there any other parts of the API that are not well documented here? https://cyipt.github.io/acton/reference/get_planit_data.html
This is a good place to say. I suggest we close this issue once the API arguments are well described in the docs and in the case study vignette: https://cyipt.github.io/acton/articles/case-studies.html
It's worth being aware that there are indeed two API calls, and that you are only using the area-based one, which essentially is an index of applications in an area, with the main details for each. The documentation for that looks pretty good to me.
However, the data returned will not give you the complete details for each application (e.g. the list of documents), as that is more extensive. The API call with the area and ID in the URL is essentially that secondary call. It doesn't look like it is needed for the ACTON project, but it's worth being aware that such an API call is available.
Yes, just to expand on this point: the '/api/applics' endpoint will get you lists of applications that match different filtering criteria. However, by design, the lists only return the key summary information for each application. To get the full information, you have to follow the 'link' field, which leads you to the '/applic/' endpoint for that particular application.
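In client code, this two-step pattern might look roughly like the sketch below. It takes the summary records from an '/api/applics' response and follows each record's 'link' field to fetch the full details. The record structure and the injected `fetch_json` function are assumptions for illustration. `urljoin` is used so the sketch works whether 'link' holds an absolute URL or a site-relative path, which this thread does not specify.

```python
from typing import Any, Callable, Dict, Iterable, List
from urllib.parse import urljoin

SITE = "https://dev.planit.org.uk/"


def fetch_details(
    records: Iterable[Dict[str, Any]],
    fetch_json: Callable[[str], Dict[str, Any]],
    base: str = SITE,
) -> List[Dict[str, Any]]:
    """Follow each summary record's 'link' field and collect the full details."""
    details = []
    for record in records:
        link = record.get("link")
        if not link:
            continue  # no detail link: only the summary is available
        details.append(fetch_json(urljoin(base, link)))
    return details
```

Each record costs one extra request. For hundreds of applications, it is probably better to shortlist first (e.g. by app_size) and only fetch details for the shortlist.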
Including size of application description.