edwindj / cbsodataR

Statistics Netherlands (CBS) OpenData API Client for R
https://edwindj.github.io/cbsodataR

filter data with geq, neq and not in #24

Open NielsBosNL opened 5 years ago

NielsBosNL commented 5 years ago

Hi Edwin, very nice package. In your examples you show how to filter on a specific variable value: cbs_get_data(id="03759ned", Perioden=c("2013JJ00","2014JJ00"), Geslacht="T001038")

However, in this large file of 45 million records I want to filter on, e.g., Perioden > "1990JJ00", Geslacht != "T001038", and !(Leeftijd %in% c(10000, 60100, 60200, 60300, 60400, 60500, 60600, 60700, 60800, 60900, 21900)). Is that possible? What would be the correct syntax for "not equal", "greater than" or "not in"?

Or filter on substr(RegioS, 1, 2) == "GM" to select just the municipalities :-)

edwindj commented 4 years ago

This syntax is currently not supported, sorry!

edwindj commented 4 years ago

Only a small subset is supported: has_substring() matches values that contain a given substring.
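Since the other comparisons are not available server-side, one possible workaround is to download with the supported filters and apply the remaining conditions locally. The following is only a sketch under assumptions: it uses a small made-up data frame (obs) in place of a real download, the code values are illustrative, and the cbs_get_data() call mentioned in the comment is not executed here.

```r
# Hypothetical stand-in for a downloaded table. In practice obs might come from
# something like cbs_get_data("03759ned", RegioS = has_substring("GM"));
# that call is an assumption and is not run in this sketch.
obs <- data.frame(
  Perioden = c("1989JJ00", "1991JJ00", "1995JJ00", "2000JJ00"),
  Geslacht = c("3000", "T001038", "4000", "3000"),
  Leeftijd = c("10000", "10000", "70100", "60100"),
  RegioS   = c("GM0363", "GM0363", "GM0599", "PV27"),
  stringsAsFactors = FALSE
)

# Age codes to exclude (the "not in" condition from the question)
excluded_ages <- c("10000", "60100", "60200", "60300", "60400",
                   "60500", "60600", "60700", "60800", "60900", "21900")

keep <- obs$Perioden > "1990JJ00" &            # "greater than" on period codes
  obs$Geslacht != "T001038" &                  # "not equal"
  !(obs$Leeftijd %in% excluded_ages) &         # "not in"
  substr(obs$RegioS, 1, 2) == "GM"             # municipalities only

filtered <- obs[keep, ]
print(filtered)
```

The period comparison relies on the codes sorting chronologically as strings, which holds for codes of the same type such as the yearly "JJ00" codes. For a table this size it helps to narrow the download as far as the supported filters allow before filtering locally.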

lverweijen commented 1 year ago

I tried performing your query using ODataQuery (I'm the author).

It worked on http://beta-odata4.cbs.nl/

library(ODataQuery)

# Age codes to match, as strings
leeftijden <- c(
  "10000", "60100", "60200", "60300", "60400",
  "60500", "60600", "60700", "60800", "60900",
  "21900")

# Point at the Observations of table 03759ned on the beta OData 4 service
opendata_service <- ODataQuery$new("http://beta-odata4.cbs.nl/")
observations_path <- opendata_service$path('CBS', '03759ned', "Observations")

# to_odata() translates the R expression into an OData $filter;
# !! inlines the R vector leeftijden into the expression
observations_query <-
  observations_path$filter(to_odata(Perioden > "1990JJ00"
                                    && Geslacht != "T001038"
                                    && Leeftijd %in% !!leeftijden))

print(observations_query$url)
# Fetch the matching rows
observations_df <- observations_query$all()
head(observations_df)

http://beta-odata4.cbs.nl/CBS/03759ned/Observations?$filter=(Perioden%20gt%20'1990JJ00'%20and%20Geslacht%20ne%20'T001038'%20and%20Leeftijd%20in%20('10000','60100','60200','60300','60400','60500','60600','60700','60800','60900','21900'))

        Id Measure ValueAttribute   Value StringValue BurgerlijkeStaat Geslacht Leeftijd RegioS Perioden
1 30690548 M000352           None 7419501          NA          T001019     3000    10000   NL01 1991JJ00
2 30690549 M000352           None 7480422          NA          T001019     3000    10000   NL01 1992JJ00
3 30690550 M000352           None 7535268          NA          T001019     3000    10000   NL01 1993JJ00
4 30690551 M000352           None 7585887          NA          T001019     3000    10000   NL01 1994JJ00
5 30690552 M000352           None 7627482          NA          T001019     3000    10000   NL01 1995JJ00
6 30690553 M000365           None 7644886          NA          T001019     3000    10000   NL01 1995JJ00

Unfortunately, it didn't work on the stable ODataService:

http://opendata.cbs.nl/ODataApi/odata/03759ned/TypedDataSet?$filter=(Perioden%20gt%20'1990JJ00'%20and%20Geslacht%20ne%20'T001038'%20and%20Leeftijd%20in%20('10000','60100','60200','60300','60400','60500','60600','60700','60800','60900','21900'))

Error getting TypedDataSet for '03759ned': Object reference not set to an instance of an object.
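For anyone who wants to see how the R comparisons map onto OData operators (> becomes gt, != becomes ne, %in% becomes in), the beta-service URL shown earlier can be rebuilt in base R without any package. This is a minimal sketch, not how ODataQuery works internally; the helper names quote_odata and in_list are made up for illustration.

```r
# Hypothetical helper: wrap values in single quotes, doubling embedded quotes
quote_odata <- function(x) paste0("'", gsub("'", "''", x, fixed = TRUE), "'")

# Hypothetical helper: build "<column> in ('a','b',...)"
in_list <- function(column, values) {
  paste0(column, " in (", paste(quote_odata(values), collapse = ","), ")")
}

leeftijden <- c("10000", "60100", "60200", "60300", "60400",
                "60500", "60600", "60700", "60800", "60900", "21900")

# Combine the three conditions with "and" and wrap them in parentheses
conditions <- c(
  paste("Perioden gt", quote_odata("1990JJ00")),
  paste("Geslacht ne", quote_odata("T001038")),
  in_list("Leeftijd", leeftijden)
)
filter_expr <- paste0("(", paste(conditions, collapse = " and "), ")")

# URLencode() with the default reserved = FALSE encodes the spaces as %20
# and leaves quotes, commas and parentheses untouched
url <- paste0("http://beta-odata4.cbs.nl/CBS/03759ned/Observations?$filter=",
              utils::URLencode(filter_expr))
print(url)
```

Changing the base URL only changes which service receives the query. As the error above shows, the stable service rejected this filter.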