Open lgnbhl opened 6 years ago
I get the same error when trying to read this other file from Swiss Federal Statistical Office (or BFS): https://www.pxweb.bfs.admin.ch/DownloadFile.aspx?file=px-x-0702000000_104
Hello Martin Zbinden,
I made a fork of the pxR package in order to make it compatible with the Swiss Federal Statistical Office (or BFS). My fork is just the result of my Pull Request.
Just try this code:
library(devtools)
install_github("lgnbhl/pxR", force = TRUE) # fork making pxR compatible with BFS
library(pxR)
url <- "https://www.pxweb.bfs.admin.ch/DownloadFile.aspx?file=px-x-0702000000_104"
dataset <- pxR::read.px(url)
Let me know if it works :-)
I have the same problem with bfs.admin.ch files. In my case it's "......" (six dots) that causes the problem. This could be fixed by including "....." and "......" in na.strings. I've submitted a pull request.
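A rough sketch of the na.strings idea described above. The vector covers quoted runs of one to six dots plus the ":" marker. `px_file` is a hypothetical local path used only for illustration, and whether extending na.strings is enough for every BFS file is an assumption that is not verified here.

```r
# Quoted runs of 1 to 6 dots, plus the ":" marker
px_na_strings <- c(sprintf('"%s"', strrep(".", 1:6)), '":"')

# Hypothetical local path to a downloaded BFS PX file (assumption)
px_file <- "px-x-0702000000_104.px"

# Only attempt the read if pxR is installed and the file is present
if (requireNamespace("pxR", quietly = TRUE) && file.exists(px_file)) {
  dataset <- pxR::read.px(px_file, na.strings = px_na_strings)
}
```

Building the vector programmatically avoids miscounting dots when typing the strings by hand.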
Hi @lgnbhl, I just came across your fork but it still does not work with this BFS data:
pxR::read.px("https://www.pxweb.bfs.admin.ch/DownloadFile.aspx?file=px-x-1503040100_101")
# Warning in scan(filename, what = "character", sep = "\n", quiet = TRUE, :
# invalid input found on input connection 'https://www.pxweb.bfs.admin.ch/DownloadFile.aspx?file=px-x-1503040100_101'
# Error in pxR::read.px("https://www.pxweb.bfs.admin.ch/DownloadFile.aspx?file=px-x-1503040100_101") :
# The input file is malformed: data and varnames length differ
Any clues why this is happening? Sorry to address you, I'm not sure how/where to file this.
Cheers
PS: Ref.: https://www.bfs.admin.ch/bfs/de/home/statistiken/bildung-wissenschaft/bildungsabschluesse/tertiaerstufe-hochschulen/universitaere.assetdetail.13147037.html in case I got the link wrong, but I also tried on the downloaded data with the same warning/error
Hi @jaySf,
My guess is that pxR::read.px() fails to read PX files from BFS on Windows. Sometimes the function works fine on Mac and Linux, but not always... I don't fully understand why and I didn't find a quick fix for it. I will remove my old fork as it doesn't solve this issue.
Note also that I have the same issue with pxR::read.px() in my R package, which helps automate the extraction of data from the BFS: https://github.com/lgnbhl/BFS/issues/3.
@lgnbhl Thanks for your fast reply! Really strange, perhaps I'll try it on my Linux machine later. Great, I didn't know there was a BFS package! Too bad about the issue with pxR::read.px()
Hi there, I've been successful reading in px-files in Windows from BFS if I prepare them a little before reading them in:
# Read in the file and convert its encoding
x <- iconv(readLines(paste(folder, file, sep="/"), encoding="CP1252"), from="CP1252", to="Latin1", sub="")
# Replace the five- and six-dot missing-value markers to work around a bug in pxR
x <- gsub("\"......\"", "\"....\"", x, fixed = TRUE)
x <- gsub("\".....\"", "\"....\"", x, fixed = TRUE)
#Write the file with the changes
fileConn<-file(paste(folder, file, sep="/"))
writeLines(x, con=fileConn, useBytes = TRUE)
close(fileConn)
Depending on the size of the PX file, this takes a while.
It seems that pxR has a problem with "......". Hope this helps.
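The replacement step of the workaround above can be sketched as a small function that works on a character vector, so it can be tried without touching any file. `normalize_px_missing` is a hypothetical helper name, not part of pxR, and the sample lines are made up for illustration.

```r
# Replace quoted runs of six or five dots with the four-dot marker.
# Hypothetical helper; mirrors the two gsub() calls from the workaround.
normalize_px_missing <- function(lines) {
  lines <- gsub('"......"', '"...."', lines, fixed = TRUE)
  gsub('"....."', '"...."', lines, fixed = TRUE)
}

# Made-up sample resembling the DATA part of a PX file
sample_lines <- c('DATA=', '1 2 "......" 4', '"....." 6 "..." 8;')
cleaned <- normalize_px_missing(sample_lines)
print(cleaned)
```

Shorter markers such as "..." are left untouched, since only the five- and six-dot forms are reported as problematic in this thread.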
Hi @statzg,
Thank you very much for sharing your fix! I will implement it in my BFS package.
Hi @statzg, I have been using your trick and it worked well, but it stopped working when I tried it on some other data from the BFS, and after that it also failed with older code that used to work. I don't know what causes this, but I got this message:
file<-"px-x-0702000000_102_copy.px"
x <- iconv(readLines(paste(pt, file, sep="/"), encoding="CP1252 "), from="CP1252 ", to="Latin1", sub="")
x <- gsub("\"......\"", "\"....\"", x, fixed = TRUE)
x <- gsub("\".....\"", "\"....\"", x, fixed = TRUE)
fileConn<-file(paste(pt, file, sep="/"))
writeLines(x, con=fileConn, useBytes = TRUE)
close(fileConn)
data = read.px(paste(pt,file,sep="/"), na.strings = c('"."','".."','"..."','"...."','"....."','"....."','":"'))
#Error in stri_length(string) :
#invalid UTF-8 byte sequence detected; try calling stri_enc_toutf8()
I found that converting from UTF-8 to latin1 did the trick though, so if anyone experiences the same issue, here's what worked for me:
file<-"px-x-0702000000_102.px"
x <- iconv(readLines(paste(pt, file, sep="/"), encoding="UTF-8"), from="UTF-8", to="Latin1", sub="")
x <- gsub("\"......\"", "\"....\"", x, fixed = TRUE)
x <- gsub("\".....\"", "\"....\"", x, fixed = TRUE)
#Write the file with the changes
fileConn<-file(paste(pt, file, sep="/"))
writeLines(x, con=fileConn, useBytes = TRUE)
close(fileConn)
data = read.px(paste(pt,file,sep="/"), na.strings = c('"."','".."','"..."','"...."','"....."','"....."','":"'))
Thanks again! Best
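Since some files in this thread needed CP1252 as the source encoding and others UTF-8, one possible way to avoid guessing is to check the bytes first. The sketch below uses base R's `validUTF8()` as a heuristic. `px_lines_to_latin1` is a hypothetical helper name, and the assumption that any file that is not valid UTF-8 is CP1252 may not hold for every BFS file.

```r
# Convert lines to Latin1, picking the source encoding with a heuristic:
# if every line is valid UTF-8, treat the input as UTF-8, otherwise
# assume CP1252 (an assumption, not something pxR or the BFS guarantees).
px_lines_to_latin1 <- function(lines) {
  from <- if (all(validUTF8(lines))) "UTF-8" else "CP1252"
  # sub = "" drops any bytes that cannot be converted, as in the workaround
  iconv(lines, from = from, to = "latin1", sub = "")
}

# "Zü" encoded as UTF-8 (two bytes for ü) and as CP1252 (one byte)
utf8_input   <- rawToChar(as.raw(c(0x5a, 0xc3, 0xbc)))
cp1252_input <- rawToChar(as.raw(c(0x5a, 0xfc)))

from_utf8   <- px_lines_to_latin1(utf8_input)
from_cp1252 <- px_lines_to_latin1(cp1252_input)
```

Both inputs end up as the same two Latin1 bytes. Pure ASCII input counts as valid UTF-8 and passes through unchanged.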
Firstly, thank you for this very useful package!
I got an error when using pxR::read.px to read some PX files from the Swiss Federal Statistical Office (or BFS) online database (https://www.pxweb.bfs.admin.ch/). I presume that the error comes from a string missing from the na.strings of the pxR::read.px function: "....." (5 dots).
Would it be possible to fix this problem? Many thanks in advance!
Example