coderLMN / AutomatedDataCollectionWithR

Reader discussion board for 《基于 R 语言的自动化数据采集技术》 (the Chinese translation of Automated Data Collection with R)

P59: error when reading JSON in R #11

Closed: lixinyao closed this issue 7 years ago

lixinyao commented 7 years ago
> indy = fromJSON(content = "indy.json")
Error in nchar(content) : invalid multibyte string, element 1

Is this a locale problem?

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.12.1 (unknown)

locale:
[1] zh_CN.UTF-8/zh_CN.UTF-8/zh_CN.UTF-8/C/zh_CN.UTF-8/zh_CN.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
 [1] RJSONIO_1.3-0    RMySQL_0.10.9    DBI_0.4-1       
 [4] XML_3.98-1.4     RCurl_1.95-4.8   bitops_1.0-6    
 [7] dplyr_0.5.0.9000 stringr_1.1.0    rvest_0.3.2     
[10] xml2_1.0.0      

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.6         digest_0.6.9        assertthat_0.1     
 [4] R6_2.1.2            magrittr_1.5        evaluate_0.9       
 [7] httr_1.2.0          stringi_1.1.2       lazyeval_0.2.0.9000
[10] curl_0.9.7          rmarkdown_1.0.9014  tools_3.3.0        
[13] selectr_0.3-0       htmltools_0.3.5     tibble_1.1   
coderLMN commented 7 years ago

Strictly speaking, this is not a locale problem but a character-encoding problem. See this Q&A: http://stackoverflow.com/a/24619808/1400279 . The solution given there is:

x <- fromJSON('["Z\\u00FCrich"]')
print(x)
# [1] "Z\xfcrich"

nchar(x)
#Error in nchar(x) : invalid multibyte string 1

#Set the correct encoding
Encoding(x) <- "latin1"
print(x)
#[1] "Zürich" 
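The Stack Overflow fix relabels a string that is already in memory. When the bad byte comes from a file, as in the original question, the same idea can be applied before parsing. The following is a minimal sketch, not taken from the thread: it assumes jsonlite is installed, that R is at least 3.3.0 (for validUTF8()), and that any non-UTF-8 text in the file is Latin-1. The function name read_json_fix_encoding and the demo file are hypothetical.

```r
# Sketch: detect invalid UTF-8 in a JSON file and, ASSUMING the stray bytes
# are Latin-1, convert the text to UTF-8 before parsing with jsonlite.
library(jsonlite)

# Hypothetical helper, not part of any package
read_json_fix_encoding <- function(path) {
  # Read the raw text; no encoding is declared, so bytes are kept as-is
  txt <- paste(readLines(path, warn = FALSE), collapse = "\n")
  if (!validUTF8(txt)) {
    # Assumption: the file was saved as Latin-1 (e.g. byte 0xE9 for "é")
    txt <- iconv(txt, from = "latin1", to = "UTF-8")
  }
  fromJSON(txt)  # txt is now valid UTF-8 JSON text
}

# Demo: a temporary file containing a raw Latin-1 byte for "é"
path <- tempfile(fileext = ".json")
writeBin(c(charToRaw('{"Dr. Ren'), as.raw(0xe9),
           charToRaw(' Belloq": "Paul Freeman"}')), path)
x <- read_json_fix_encoding(path)
x
```

If the file is in some other single-byte encoding, the from argument of iconv() would need to change accordingly; guessing Latin-1 is only a heuristic.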
lixinyao commented 7 years ago

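Thanks, Mr. Wu, I found the cause. I had used Chrome's "Save As" to save http://www.r-datacollection.com/materials/ch-3-xml/indy.json as a .json file, and the saved copy contained a garbled character, "Dr. Ren� Belloq": "Paul Freeman", whereas the original file has "Dr. René Belloq": "Paul Freeman". I hadn't expected that: people often save files with "Save As", and copying the contents by hand is inconvenient when a file is large. Now RJSONIO, rjson and jsonlite all read it without problems.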
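Since the browser's "Save As" step was the culprit, one way to avoid it is to let R fetch the file byte-for-byte and parse it directly. This is a minimal sketch under stated assumptions (jsonlite installed, network access available); fetch_and_parse is a hypothetical helper name, not part of any package.

```r
# Sketch: download the JSON file unchanged (instead of the browser's
# "Save As"), then parse it with jsonlite, which reads files as UTF-8.
library(jsonlite)

# Hypothetical helper, not part of any package
fetch_and_parse <- function(url, dest = tempfile(fileext = ".json")) {
  # mode = "wb" copies the bytes unchanged (this matters on Windows)
  download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  fromJSON(dest)  # jsonlite accepts a file path here
}

# Usage with the book's file (requires network access):
# indy <- fetch_and_parse("http://www.r-datacollection.com/materials/ch-3-xml/indy.json")
```

jsonlite::fromJSON() also accepts a URL directly; keeping a local copy via download.file() simply makes it easy to inspect the exact bytes that were parsed.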

gaodianzhuo commented 7 years ago

Set R's locale before running the code: Sys.setlocale("LC_ALL", "English")