Thanks for this helpful post, François. I converted a huge CSV file (56 GB) to Parquet, but the open_dataset() function of the R Arrow library gave me some weird issues with the imported CSV. I submitted a bug report, but because of the size of the CSV (the full eBird dataset) it's difficult to share the file and make the "bug" fully reproducible. I used awk to double-check the CSV and the Python Arrow library to convert the CSV to Parquet, and both worked well... only the R library gave me the weird rows. Have you guys at Voltron heard of similar issues? Here is my bug report: https://issues.apache.org/jira/projects/ARROW/issues/ARROW-17432?filter=allopenissues Thanks!
Hi @GuiAlDuS, I added a comment on your Jira issue. See if there is something you can do about keeping your identifiers as integers instead of doubles.
Hi François, I'm getting this error message in the R console when I try to run the "download the data" step. Can you tell me whether it still works?
walk(dates_to_get, download_daily_package_logs_csv)
Downloading data for 2022-06-01 ...
Error in `map()`:
ℹ In index: 1.
Caused by error in `download.file()`:
! cannot open URL 'https://cran-logs.rstudio.com/2022/2022-06-01.csv.gz'
Run `rlang::last_error()` to see where the error occurred.
Warning message:
In download.file(url = url, destfile = file, method = "libcurl", :
URL 'https://cran-logs.rstudio.com/2022/2022-06-01.csv.gz': status was 'Couldn't connect to server'
Creating an Arrow dataset | François Michonneau
An exploration of the file formats that Arrow can read and write.
https://francoismichonneau.net/2022/08/arrow-dataset-creation/