Closed Lulliter closed 4 years ago
@Lulliter - haven't been able to reproduce this issue. Looking at the first data set, I ran this code:
install.packages("RSocrata")
library(RSocrata)
d <- read.socrata("https://cohesiondata.ec.europa.eu/2014-2020-Finances/ESIF-2014-2020-Finance-Implementation-Details/99js-gm52")
str(d)
Which resulted in the following output:
'data.frame': 50792 obs. of 24 variables:
$ ms : chr "FR" "IT" "DE" "GR" ...
$ ms_name : chr "France" "Italy" "Germany" "Greece" ...
$ cci : chr "2014FR16M2OP011" "2014IT16M2OP001" "2014DE06RDRP012" "2014GR16M1OP001" ...
$ title : chr "Guyane - ERDF/ESF" "Molise - ERDF/ESF" "Niedersachsen / Bremen - Rural Development" "Transport Infrastructure Environment and Sustainable Development - GR - ERDF/CF" ...
$ ver : num 2 1.3 1.6 3 2.1 1.2 4 1.2 3 1.2 ...
$ fund : chr "ERDF" "ERDF" "EAFRD" "ERDF" ...
$ priority : chr "5" "5" "P4" "12" ...
$ to : chr "9" "6" "10" "6" ...
$ to_short : chr "Social Inclusion" "Environment Protection & Resource Efficiency" "Educational & Vocational Training" "Environment Protection & Resource Efficiency" ...
$ to_long : chr "Promoting social inclusion, combating poverty and any discrimination" "Preserving and protecting the environment and promoting resource efficiency" "Investing in education, training and vocational training for skills and lifelong learning" "Preserving and protecting the environment and promoting resource efficiency" ...
$ category_of_region : chr "Less developed" "Transition" "VOID" "Transition" ...
$ eafrd_measure : chr "VOID" "VOID" "M01" "VOID" ...
$ measure_short_description : chr "VOID" "VOID" "Knowledge" "VOID" ...
$ eafrd_fa : chr "VOID" "VOID" "FA_P4" "VOID" ...
$ focus_area_short_description: chr "VOID" "VOID" "Priority 4" "VOID" ...
$ eu_amount : num 10000000 6883565 38000000 4941487 1932500 ...
$ national_amount : num 1800000 6883565 9500000 1235372 644167 ...
$ total_amount : num 11800000 13767130 47500000 6176859 2576667 ...
$ year : int 2017 2015 2014 2016 2016 2015 2018 2015 2017 2015 ...
$ eu_co_financing : num 84.8 50 80 80 75 ...
$ total_eligible_cost : num 5700000 0 0 6444394 742873 ...
$ total_eligible_expenditure : num 0 0 0 0 742874 ...
$ ir_last_version : num 201801 201601 NA 201701 2016 ...
$ reference_date : POSIXct, format: "2017-12-31" "2015-12-31" ...
Comparing that with the metadata, it looks like the correct data types are being used. Can you provide an example of what you're seeing, the version of RSocrata being used, version of R, and operating system?
Hello, here are my session info:
> sessioninfo::platform_info()
version R version 3.6.1 (2019-07-05)
os macOS Mojave 10.14.6
RSocrata 1.7.10-6 2019-10-23 [1] CRAN (R 3.6.0)
I see what happens: If I run your same command I get the correct data types.
d <- read.socrata("https://cohesiondata.ec.europa.eu/2014-2020-Finances/ESIF-2014-2020-Finance-Implementation-Details/99js-gm52")
str(d)
'data.frame': 50797 obs. of 24 variables:
$ ms : chr "FR" "IT" "DE" "GR" ...
$ ms_name : chr "France" "Italy" "Germany" "Greece" ...
$ cci : chr "2014FR16M2OP011" "2014IT16M2OP001" "2014DE06RDRP012" "2014GR16M1OP001" ...
$ title : chr "Guyane - ERDF/ESF" "Molise - ERDF/ESF" "Niedersachsen / Bremen - Rural Development" "Transport Infrastructure Environment and Sustainable Development - GR - ERDF/CF" ...
$ ver : num 2 1.3 1.6 3 2.1 1.2 4 1.2 3 1.2 ...
$ fund : chr "ERDF" "ERDF" "EAFRD" "ERDF" ...
$ priority : chr "5" "5" "P4" "12" ...
$ to : chr "9" "6" "10" "6" ...
$ to_short : chr "Social Inclusion" "Environment Protection & Resource Efficiency" "Educational & Vocational Training" "Environment Protection & Resource Efficiency" ...
$ to_long : chr "Promoting social inclusion, combating poverty and any discrimination" "Preserving and protecting the environment and promoting resource efficiency" "Investing in education, training and vocational training for skills and lifelong learning" "Preserving and protecting the environment and promoting resource efficiency" ...
$ category_of_region : chr "Less developed" "Transition" "VOID" "Transition" ...
$ eafrd_measure : chr "VOID" "VOID" "M01" "VOID" ...
$ measure_short_description : chr "VOID" "VOID" "Knowledge" "VOID" ...
$ eafrd_fa : chr "VOID" "VOID" "FA_P4" "VOID" ...
$ focus_area_short_description: chr "VOID" "VOID" "Priority 4" "VOID" ...
$ eu_amount : num 10000000 6883565 38000000 4941487 1932500 ...
$ national_amount : num 1800000 6883565 9500000 1235372 644167 ...
$ total_amount : num 11800000 13767130 47500000 6176859 2576667 ...
$ year : int 2017 2015 2014 2016 2016 2015 2018 2015 2017 2015 ...
$ eu_co_financing : num 84.8 50 80 80 75 ...
$ total_eligible_cost : num 5700000 0 0 6444394 742873 ...
$ total_eligible_expenditure : num 0 0 0 0 742874 ...
$ ir_last_version : num 201801 201601 NA 201701 2016 ...
$ reference_date : POSIXct, format: "2017-12-31" "2015-12-31" NA "2016-12-31" ...
But I had used the link indicated as the API Endpoint (I did so, following the indication in the Code Snippets examples, so I figured that was the link to use)
d1 <- read.socrata( "https://cohesiondata.ec.europa.eu/resource/99js-gm52.json")
> str(d1 )
'data.frame': 50797 obs. of 24 variables:
$ ms : chr "FR" "IT" "DE" "GR" ...
$ ms_name : chr "France" "Italy" "Germany" "Greece" ...
$ cci : chr "2014FR16M2OP011" "2014IT16M2OP001" "2014DE06RDRP012" "2014GR16M1OP001" ...
$ title : chr "Guyane - ERDF/ESF" "Molise - ERDF/ESF" "Niedersachsen / Bremen - Rural Development" "Transport Infrastructure Environment and Sustainable Development - GR - ERDF/CF" ...
$ ver : chr "2.0" "1.3" "1.6" "3.0" ...
$ fund : chr "ERDF" "ERDF" "EAFRD" "ERDF" ...
$ priority : chr "5" "5" "P4" "12" ...
$ to : chr "9" "6" "10" "6" ...
$ to_short : chr "Social Inclusion" "Environment Protection & Resource Efficiency" "Educational & Vocational Training" "Environment Protection & Resource Efficiency" ...
$ to_long : chr "Promoting social inclusion, combating poverty and any discrimination" "Preserving and protecting the environment and promoting resource efficiency" "Investing in education, training and vocational training for skills and lifelong learning" "Preserving and protecting the environment and promoting resource efficiency" ...
$ category_of_region : chr "Less developed" "Transition" "VOID" "Transition" ...
$ eafrd_measure : chr "VOID" "VOID" "M01" "VOID" ...
$ measure_short_description : chr "VOID" "VOID" "Knowledge" "VOID" ...
$ eafrd_fa : chr "VOID" "VOID" "FA_P4" "VOID" ...
$ focus_area_short_description: chr "VOID" "VOID" "Priority 4" "VOID" ...
$ eu_amount : chr "10000000" "6883565" "38000000" "4941487" ...
$ national_amount : chr "1800000" "6883565" "9500000" "1235372" ...
$ total_amount : chr "11800000" "13767130" "47500000" "6176859" ...
$ year : chr "2017" "2015" "2014" "2016" ...
$ eu_co_financing : chr "84.75" "50" "80" "80" ...
$ total_eligible_cost : chr "5700000" "0" "0" "6444394.36" ...
$ total_eligible_expenditure : chr "0" "0" "0" "0" ...
$ ir_last_version : chr "201801.0" "201601.0" NA "201701.0" ...
$ reference_date : POSIXct, format: "2017-12-31" "2015-12-31" NA "2016-12-31" ...
Thanks!
Makes sense. When you don't specify the data type (e.g., just copy/paste the URL from address bar), RSocrata defaults to download from CSV. But, you can provide a valid API endpoint to declare a specific file type. Socrata is optimized for speed with JSON and is considerably faster so this is sometimes necessary. However, Socrata places quotes around all values regardless of original data type so R will interpret numbers as characters. Likewise, Socrata doesn't provide an API to read field-level metadata so we can't correct this on the back-end.
Will close this issue for now. We have some open issues on better handling JSON files that we can tackle if Socrata releases some improvements on metadata. Do reach out if you have any other issues.
OK! Many thanks for the explanation.
I am working in R with EU cohesion data (specifically datasets 99js-gm52, xns4-t7ym, 3kkx-ekfq )
The
read.socrata()
functions returns numeric fields aschr
in the dataframe. Is it possible to convert them into the correct field type?