cyipt / acton

Active Transport Options for New Developments
https://cyipt.github.io/acton/
GNU General Public License v3.0

More descriptions of data #48

Closed Robinlovelace closed 4 years ago

Robinlovelace commented 4 years ago

Including size of application description.

Robinlovelace commented 4 years ago

And Joey to provide feedback on resulting data.

aspeakman commented 4 years ago

Bit of pseudo code below to explain how app_size is set at the moment - the appl_sub_type is derived from pattern matching in the application_type field

if appl_sub_type in ['Advertising', 'Signs', 'Telecoms', 'Hedges', 'Trees']:
    return 'Small'
if appl_sub_type in ['Impact', 'Major']:
    return 'Large'  # EIA, scoping opinion, major, large
if n_statutory_days:
    if n_statutory_days <= 60:
        return 'Small'
    if not n_documents:
        return 'Medium'  # default if there is no info from n_documents
if n_documents:
    if n_documents >= 100:
        return 'Large'
    elif n_documents >= 10:
        return 'Medium'
    else:
        return 'Small'
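For anyone wanting to experiment with the thresholds, the pseudocode can be turned into a small runnable Python function. This is only a sketch: the function name `classify_app_size` is made up here, and returning `None` when neither `n_statutory_days` nor `n_documents` is available is an assumption, since the pseudocode does not say what happens in that case.

```python
from typing import Optional

SMALL_SUB_TYPES = {"Advertising", "Signs", "Telecoms", "Hedges", "Trees"}
LARGE_SUB_TYPES = {"Impact", "Major"}  # EIA, scoping opinion, major, large


def classify_app_size(appl_sub_type: Optional[str],
                      n_statutory_days: Optional[int],
                      n_documents: Optional[int]) -> Optional[str]:
    """Mirror the app_size pseudocode (hypothetical helper, not PlanIt code)."""
    # The sub type wins over everything else.
    if appl_sub_type in SMALL_SUB_TYPES:
        return "Small"
    if appl_sub_type in LARGE_SUB_TYPES:
        return "Large"
    # Short statutory periods indicate small applications.
    if n_statutory_days:
        if n_statutory_days <= 60:
            return "Small"
        if not n_documents:
            return "Medium"  # default if there is no info from n_documents
    # Otherwise fall back on the number of documents.
    if n_documents:
        if n_documents >= 100:
            return "Large"
        if n_documents >= 10:
            return "Medium"
        return "Small"
    return None  # assumption: the pseudocode leaves this case undefined


print(classify_app_size("Hedges", 90, 200))
print(classify_app_size(None, 90, 150))
```

Note that the order of the checks matters: a 'Hedges' application stays 'Small' however many documents it has, and a statutory period of 60 days or less short-circuits the document count.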

Robinlovelace commented 4 years ago

Thanks @aspeakman, that's useful and good to know. n_documents will tend to be lower in the past according to @joeytalbot, who may have more comments on this. I like the coercion of 'hedges' to small; there is clearly scope to refine this, but it is a good starter for 10 (or should I say 3 ; ).

Robinlovelace commented 4 years ago

Also @aspeakman could you put a sentence or 2 based on that in the report here?: https://github.com/cyipt/acton/blob/master/vignettes/acton.Rmd#L60

Robinlovelace commented 4 years ago

Another question, any plans for a free text / regex search term argument to match with the description field?

Robinlovelace commented 4 years ago

@aspeakman to provide a concrete example, would there be an endpoint associated with this query (also in pseudocode):

d = get_planit_data(query_value = "Allerton Byewater") # cannot search in description field

aspeakman commented 4 years ago

It is on my "to investigate" list, but it is not currently possible.

Robinlovelace commented 4 years ago

Thanks for quickfire response.

Robinlovelace commented 4 years ago

Heads-up @aspeakman I think there are still some issues around the 'large' classification. We are trying to make a query that gives us the case study developments including Allerton Bywater and the Climate Innovation District. Unfortunately, this endpoint, which I would expect to get all of them, fails:

https://dev.planit.org.uk/api/applics/geojson?limit=100&bbox=&end_date=2020-03-05&start_date=2000-02-01&pg_sz=100&auth=Leeds&app_size=large&app_state=permitted

Any ideas what we're doing wrong?

Also see feature 22 in the response: it definitely is not large.

aspeakman commented 4 years ago

The key words are case sensitive and ignored if unmatched (to be fixed!). To get information on the success of the query it is best to start with a JSON query as follows (and in the near future I will include details on whether it has matched the keywords).

Try this (then convert to geojson if that is what you want)

https://dev.planit.org.uk/api/applics/json?limit=100&bbox=&end_date=2020-03-05&start_date=2000-02-01&pg_sz=100&auth=Leeds&app_size=Large&app_state=Permitted

Robinlovelace commented 4 years ago

Aha OK, it needs to be Large not large?

Makes sense, although it is somewhat unconventional to have capitalisation in API queries IMO (will defer to @mvl22 on this).

Robinlovelace commented 4 years ago

Update @aspeakman the two queries seem to return the same result. Reproducible example from the R console:

Robinlovelace commented 4 years ago

res1 = sf::read_sf("https://dev.planit.org.uk/api/applics/geojson?limit=100&bbox=&end_date=2020-03-05&start_date=2000-02-01&pg_sz=100&auth=Leeds&app_size=Large&app_state=Permitted")
res2 = sf::read_sf("https://dev.planit.org.uk/api/applics/geojson?limit=100&bbox=&end_date=2020-03-05&start_date=2000-02-01&pg_sz=100&auth=Leeds&app_size=large&app_state=permitted")
identical(res1, res2)
#> [1] TRUE

Created on 2020-03-05 by the reprex package (v0.3.0)

Item 22 in your updated endpoint still is not a large application, it's for a single house.

joeytalbot commented 4 years ago

@aspeakman a query such as the one below will get us a list of Leeds planning applications, but we need ways of really slimming this down to be able to pick out the most significant ones (such as our case study sites).

How can we do this? One thing which I think would help in some (although not all) circumstances is searching for large numbers (i.e. >100) in the description field, which would hopefully represent large numbers of proposed homes. What else do you think will help with this?

get_planit_data(query_type = "applics", limit = 100, app_size = "large", app_state = "permitted", auth = "Leeds")

aspeakman commented 4 years ago

Yes I replied too soon. The API keywords are case insensitive (as you would expect) and it throws a 404 when they are not acceptable - but I will include some feedback in the 'detail' field to confirm what is being selected in the query

Yes I have thought about doing key word matching in the description field but my impression is the terms used are not really consistent enough. Numbers often appear in the description but can include identifiers, measurements and dates. Also the term used to describe dwellings/units can vary a lot. Some large applications just list the number of storeys or extent of land etc
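To make the inconsistency concrete, here is a rough Python sketch of the kind of description matching being discussed. Everything in it is an assumption for illustration: the pattern, the list of dwelling terms, the helper name `dwelling_counts` and the example descriptions are invented here and are not taken from PlanIt data.

```python
import re

# Hypothetical list of terms that might describe dwellings/units.
DWELLING_TERMS = r"(?:dwellings?|houses?|homes|flats?|apartments?|units?)"

# A number, optionally followed by up to two other words, then a dwelling term.
COUNT_PATTERN = re.compile(
    r"\b(\d{1,5})\s+(?:\w+\s+){0,2}?" + DWELLING_TERMS + r"\b",
    re.IGNORECASE,
)


def dwelling_counts(description: str) -> list:
    """Return every number that appears to count dwellings in a description."""
    return [int(match.group(1)) for match in COUNT_PATTERN.finditer(description)]


# Invented example descriptions, one per failure mode.
examples = [
    "Erection of 120 dwellings with associated access",   # works as hoped
    "Outline application for up to 312 new build homes",  # works as hoped
    "Ten storey block on 2.5 hectares, approved 2015",    # large, but no match
    "Alterations at unit 12 to form flat",                # identifier, not a count
]
for text in examples:
    print(dwelling_counts(text), "<-", text)
```

The third example is missed entirely (a false negative) and the fourth picks up a unit number as if it were a count of flats (a false positive), which is the trade-off described above.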

You are right about uid Leeds/18/03181/RM - this is included because the key word "reserved matters" is on my list to be classified as sub_type 'Impact' (automatically included as Large). I think this is a mistake and can correct it, but I am not certain of the knock-on effects - I think we will always be balancing false positives vs false negatives.

Robinlovelace commented 4 years ago

Thanks @aspeakman but do you know a query that will get all of the applications of interest, namely:

"13/05235/FU@Leeds"

"15/04151/FU@Leeds"

"15/00415/FU@Leeds"

"15/01973/FU@Leeds"

These are all large applications but not all of them seem to appear in this endpoint:

https://dev.planit.org.uk/api/applics/json?limit=100&bbox=&end_date=2020-03-05&start_date=2000-02-01&pg_sz=100&auth=Leeds&app_size=Large&app_state=Permitted

(we will double check)

Robinlovelace commented 4 years ago

Good news @aspeakman we got the result 🎉

Robinlovelace commented 4 years ago

With this query in R:

library(acton)
# d = get_planit_data(query_value = "Allerton Byewater") # cannot search in description field
applications_leeds = get_planit_data(
  limit = 500,
  auth = "Leeds",
  app_size = "large",
  app_state = "permitted"
  )

Generating the output shown below and this endpoint https://dev.planit.org.uk/api/applics/geojson?limit=500&bbox=&end_date=2020-03-05&start_date=2000-02-01&pg_sz=500&auth=Leeds&app_size=large&app_state=permitted

aspeakman commented 4 years ago

All 4 examples are Large and Permitted so they should be there. But your limit value needs adjusting - 'limit' is the max returned in the underlying query and there are 835 potential results (see the 'total' result field). So set the limit to 1000 and then page through to get the successive results.
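A generic paging loop might look like the Python sketch below. It is deliberately independent of the real request format: `fetch_page` is a hypothetical callable that the caller supplies, and the `records` key is an assumption (only the `total` field is mentioned above). The fake fetcher just stands in for the API so the loop can be run offline.

```python
from typing import Callable, Dict, List


def fetch_all(fetch_page: Callable[[int, int], Dict],
              limit: int = 1000,
              page_size: int = 500) -> List:
    """Collect up to `limit` records by requesting successive pages.

    `fetch_page(page, page_size)` is a hypothetical callable returning a dict
    with a 'total' count and a 'records' list ('records' is an assumed name).
    """
    records: List = []
    page = 0  # assumed zero-based page index
    while len(records) < limit:
        response = fetch_page(page, page_size)
        batch = response["records"]
        records.extend(batch)
        wanted = min(response["total"], limit)
        if not batch or len(records) >= wanted:
            break  # nothing more to fetch
        page += 1
    return records[:limit]


def make_fake_fetcher(n_records: int) -> Callable[[int, int], Dict]:
    """Stand-in for the API: serves slices of range(n_records)."""
    data = list(range(n_records))

    def fetch_page(page: int, page_size: int) -> Dict:
        start = page * page_size
        return {"total": n_records, "records": data[start:start + page_size]}

    return fetch_page


all_records = fetch_all(make_fake_fetcher(835), limit=1000, page_size=500)
print(len(all_records))
```

With 835 matching results and a page size of 500, the loop makes two requests (500 records, then 335) and stops once the 'total' has been reached.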

Robinlovelace commented 4 years ago

library(acton)
# d = get_planit_data(query_value = "Allerton Byewater") # cannot search in description field
applications_leeds = get_planit_data(
  limit = 500,
  auth = "Leeds",
  app_size = "large",
  app_state = "permitted"
  )
#> Getting data from https://dev.planit.org.uk/api/applics/geojson?limit=500&bbox=&end_date=2020-03-05&start_date=2000-02-01&pg_sz=500&auth=Leeds&app_size=large&app_state=permitted
nrow(applications_leeds)
#> [1] 494

Created on 2020-03-05 by the reprex package (v0.3.0)

Robinlovelace commented 4 years ago

Thanks @aspeakman but when the limit is 1000 and the page size is 1000 it breaks. Not easy to pull down multiple pages from R.

Robinlovelace commented 4 years ago

Example of a failing request: https://dev.planit.org.uk/api/applics/geojson?limit=501&bbox=&end_date=2020-03-05&start_date=2000-02-01&pg_sz=501&auth=Leeds&app_size=large&app_state=permitted

Robinlovelace commented 4 years ago

We could solve the problem R side but this suggests the resulting code will not be particularly pretty: https://httr.r-lib.org/articles/api-packages.html#pagination-handling-multi-page-responses

Robinlovelace commented 4 years ago

If you could bump up the maximum page size, e.g. to 1000 or 10,000 that could help our work.

aspeakman commented 4 years ago

There is an absolute maximum page size of 500 (all defined in the API documents - see https://dev.planit.org.uk/api/). This and the 'limit' are designed to protect against memory demands when requests are too big - as I have had some pretty crazy external requests for large batches - and I think 10000 in one request would fall into that category.

If you can't do paging in R then you would need to split up your date range - perhaps into 5 year chunks.
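The date-splitting workaround can be sketched in a few lines of Python. The query parameters (`limit`, `pg_sz`, `start_date`, `end_date`, `auth`, `app_size`, `app_state`) are the ones that appear in the URLs in this thread; the helper names `split_date_range` and `chunk_urls` are made up for illustration, and the sketch only builds URLs rather than requesting them.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

BASE_URL = "https://dev.planit.org.uk/api/applics/geojson"


def split_date_range(start: date, end: date, years: int = 5) -> list:
    """Split [start, end] into consecutive chunks of at most `years` years."""
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        try:
            next_start = chunk_start.replace(year=chunk_start.year + years)
        except ValueError:  # 29 February in a non-leap target year
            next_start = chunk_start.replace(year=chunk_start.year + years, day=28)
        chunk_end = min(next_start - timedelta(days=1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = next_start
    return chunks


def chunk_urls(start: date, end: date, **filters) -> list:
    """Build one query URL per date chunk, each within the 500 page size cap."""
    urls = []
    for chunk_start, chunk_end in split_date_range(start, end):
        params = {
            "limit": 500,
            "pg_sz": 500,
            "start_date": chunk_start.isoformat(),
            "end_date": chunk_end.isoformat(),
            **filters,
        }
        urls.append(BASE_URL + "?" + urlencode(params))
    return urls


for url in chunk_urls(date(2000, 2, 1), date(2020, 3, 5),
                      auth="Leeds", app_size="Large", app_state="Permitted"):
    print(url)
```

A single chunk could still match more than 500 applications, so it would be worth checking the 'total' field of each response and shortening the chunks if it exceeds the page size.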

Robinlovelace commented 4 years ago

Good response, makes sense, will see which solution works best.

mvl22 commented 4 years ago

> If you could bump up the maximum page size, e.g. to 1000 or 10,000 that could help our work.

I would agree with Andrew that pagination is a sensible and reasonable requirement for the R interface to implement. Ultra-large responses in APIs of any kind like 10,000 results are inherently more likely to fail (busy servers, memory overload, HTTP transfer failures, etc.).

Robinlovelace commented 4 years ago

Thanks for the input, will see which solution works best.

mvl22 commented 4 years ago

> do you know a query that will get all of the applications of interest, namely: "13/05235/FU@Leeds"

If you know an ID, you can use this API call format:

https://dev.planit.org.uk/planapplic/Leeds/13/05235/FU/geojson

Note the area ("Leeds") and the ID ("13/05235/FU") in that URL format.
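As a small illustration, the IDs quoted earlier in the thread (in the form "13/05235/FU@Leeds") can be converted to this URL format with a few lines of Python. The helper name `planapplic_url` is made up here, and the sketch assumes every ID follows the `reference@area` pattern seen above.

```python
def planapplic_url(uid: str) -> str:
    """Turn an ID such as '13/05235/FU@Leeds' into the per-application URL.

    Hypothetical helper: assumes the ID has the form '<reference>@<area>'.
    """
    reference, separator, area = uid.rpartition("@")
    if not separator or not reference or not area:
        raise ValueError(f"expected '<reference>@<area>', got {uid!r}")
    return f"https://dev.planit.org.uk/planapplic/{area}/{reference}/geojson"


for uid in ["13/05235/FU@Leeds", "15/04151/FU@Leeds",
            "15/00415/FU@Leeds", "15/01973/FU@Leeds"]:
    print(planapplic_url(uid))
```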

Robinlovelace commented 4 years ago

Already implemented. Give this reproducible example a try if you have R installed:

remotes::install_github("cyipt/acton")
#> Skipping install of 'acton' from a github remote, the SHA1 (732457a5) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(acton)
res = acton::get_planit_data(auth = "leeds")
#> Getting data from https://dev.planit.org.uk/api/applics/geojson?limit=6&bbox=&end_date=2020-03-08&start_date=2000-02-01&pg_sz=6&auth=leeds
mapview::mapview(res)

Created on 2020-03-08 by the reprex package (v0.3.0)

I've opened a new issue because I realise this is not yet in the docs: https://github.com/cyipt/acton/issues/53


Robinlovelace commented 4 years ago

Are there any other parts of the API that are not well documented here, @mvl22 and @aspeakman? https://cyipt.github.io/acton/reference/get_planit_data.html

Here's a good place to say. Suggest we close this issue after the API arguments are well described in the docs and in the case study vignette: https://cyipt.github.io/acton/articles/case-studies.html

mvl22 commented 4 years ago

It's worth being aware that there are indeed two API calls, and that you are only using the area-based one, which essentially is an index of applications in an area, with the main details for each. The documentation for that looks pretty good to me.

However, the data returned will not give you the complete details, e.g. list of documents, for each such application, as that is more extensive. So the API call with the area and ID in the URL is essentially that secondary call. Doesn't look like that is needed for the ACTON project, but it's worth being aware that such an API call is available.

aspeakman commented 4 years ago

Yes, just to expand on this point, the '/api/applics' endpoint will get you lists of applications that match different filtering criteria. However, by design, the lists only return the key summary information for each application. To get the full information you have to follow the 'link' field which leads you to the '/applic/' endpoint for that particular application.