Hello, I am a beginner in Julia. I am trying to import the large file of French establishments (the "sirene" open data file: http://files.data.gouv.fr/sirene/sirene_201711_L_M.zip).
I used this code, or variants of it:
addprocs()
using JuliaDB
path = "F:/BD/labo/labo/siren.csv"
sirene = loadtable(path)
And I get errors.
At first, I thought the file was too badly formed to be imported via loadtable:
The encoding was WIN-1252.
Strings were sometimes enclosed in quotes and sometimes not.
The separator was ";" rather than ",".
The separator could appear inside quoted strings.
Maybe the file was too big (...?)
Maybe consecutive separators marking a missing field were misinterpreted, so I replaced ",," with ",NULL,".
Maybe the non-response values were not recognized, especially in numeric fields, so in the numeric variables I replaced ",NR," with ",NULL,".
So I applied a set of transformations to the original file using Perl regular expressions and iconv. I then extracted a small file containing the first 2,500 lines.
This extract can be downloaded here: https://www.justbeamit.com/u42je
I did not notice any major flaw when examining this extract in Excel; in particular, the number of fields is the same on every line and equal to 100.
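For reference, a field-count check of this kind can also be done outside Excel. The following is only a minimal sketch, not what was actually run on the file: it assumes Julia 1.x syntax (not the v0.6 shown in the traces), a "," separator, double quotes as the quote character, and no line breaks inside quoted fields. The helpers `count_fields` and `field_count_histogram` are hypothetical names and are not part of JuliaDB or TextParse.

```julia
# Hypothetical helper (not part of JuliaDB or TextParse): count the fields of
# one CSV line, ignoring separators that appear inside quoted strings.
function count_fields(line::AbstractString; delim::Char=',', quotechar::Char='"')
    n = 1
    inquotes = false
    for c in line
        if c == quotechar
            # A doubled quote ("") toggles the state twice, leaving it unchanged.
            inquotes = !inquotes
        elseif c == delim && !inquotes
            n += 1
        end
    end
    return n
end

# Tally how many lines have each field count (field count => number of lines).
function field_count_histogram(io::IO; kwargs...)
    counts = Dict{Int,Int}()
    for line in eachline(io)
        n = count_fields(line; kwargs...)
        counts[n] = get(counts, n, 0) + 1
    end
    return counts
end

# Small in-memory example; the data below is made up for illustration.
sample = IOBuffer("""
id,name,city
1,"Dupont, Jean",Paris
2,Martin,
""")
hist = field_count_histogram(sample)
println(hist)
```

If every line has the same number of fields, the resulting dictionary has a single key. To run the check on a file rather than an in-memory buffer, something like `open(field_count_histogram, path)` should work; for a ";"-separated file, `open(io -> field_count_histogram(io; delim=';'), path)` would pass the separator through.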
With sirene=loadtable(path):
julia> sirene=loadtable(chemin)
Error parsing F:\BD\labo\labo\test.csv
ERROR: On worker 2:
previous rows had 98 fields but row 2 has 100
guesscolparsers at C:\Users\jerom.julia\v0.6\TextParse\src\csv.jl:507
_csvread_internal#35 at C:\Users\jerom.julia\v0.6\TextParse\src\csv.jl:194
_csvread_internal at .\:0
32 at C:\Users\jerom.julia\v0.6\TextParse\src\csv.jl:92
open at .\iostream.jl:152
_csvread_f at .\:0
csvread#34 at C:\Users\jerom.julia\v0.6\TextParse\src\csv.jl:103
csvread at .\:0
_loadtable_serial#2 at C:\Users\jerom.julia\v0.6\JuliaDB\src\util.jl:88
_loadtable_serial at .\:0
217 at C:\Users\jerom.julia\v0.6\JuliaDB\src\io.jl:131
do_task at C:\Users\jerom.julia\v0.6\Dagger\src\compute.jl:319
106 at .\distributed\process_messages.jl:268 [inlined]
run_work_thunk at .\distributed\process_messages.jl:56
macro expansion at .\distributed\process_messages.jl:268 [inlined]
105 at .\event.jl:73
With sirene=loadtable(path,type_detect_rows=2500):
julia> sirene=loadtable(path,type_detect_rows=2500)
Error parsing F:\BD\labo\labo\test.csv
ERROR: On worker 2:
previous rows had 98 fields but row 2 has 100
guesscolparsers at C:\Users\jerom.julia\v0.6\TextParse\src\csv.jl:507
_csvread_internal#35 at C:\Users\jerom.julia\v0.6\TextParse\src\csv.jl:194
_csvread_internal at .\:0
32 at C:\Users\jerom.julia\v0.6\TextParse\src\csv.jl:92
open at .\iostream.jl:152
_csvread_f at .\:0
csvread#34 at C:\Users\jerom.julia\v0.6\TextParse\src\csv.jl:103
csvread at .\:0
_loadtable_serial#2 at C:\Users\jerom.julia\v0.6\JuliaDB\src\util.jl:88
_loadtable_serial at .\:0
217 at C:\Users\jerom.julia\v0.6\JuliaDB\src\io.jl:131
do_task at C:\Users\jerom.julia\v0.6\Dagger\src\compute.jl:319
106 at .\distributed\process_messages.jl:268 [inlined]
run_work_thunk at .\distributed\process_messages.jl:56
macro expansion at .\distributed\process_messages.jl:268 [inlined]
105 at .\event.jl:73
Do you have any idea how to load this file correctly? Am I doing something wrong?
I chose JuliaDB because of the size of the file to load (about 8 GB: 11,000,000 lines and 100 variables).
Best regards