JuliaData / JuliaDB.jl

Parallel analytical database in pure Julia
http://juliadb.org/

Cannot seem to load a very small file #301

Open xiaodaigh opened 5 years ago

xiaodaigh commented 5 years ago

A runnable MWE is included below. The file is less than 1 MB, but loading it hangs in the terminal on Julia 1.2.0 (Windows 10). The same code works fine on Julia 1.1.1.

using JuliaDB, Dagger

##############################################################
# Download & extract data
##############################################################

# From the Julia REPL, press `;` to enter shell mode, then run:
# wget https://raw.githubusercontent.com/xiaodaigh/JuliaDB.jl/master/ok.csv

##############################################################
# Specify the types of columns
##############################################################

fmtypes = [
    Int64,                     String,     Union{String, Missing},     Union{Float64, Missing},    Union{Float64, Missing},
    Union{Float64, Missing},    Union{Float64, Missing},    Union{Float64, Missing},    Union{String, Missing},     Union{String, Missing},
    Union{String, Missing},     Union{String, Missing},     Union{String, Missing},     Union{String, Missing},     Union{String, Missing},
    Union{String, Missing},     Union{String, Missing},     Union{Float64, Missing},    Union{Float64, Missing},    Union{Float64, Missing},
    Union{Float64, Missing},    Union{Float64, Missing},    Union{Float64, Missing},    Union{Float64, Missing},    Union{Float64, Missing},
    Union{Float64, Missing},    Union{Float64, Missing},    Union{Float64, Missing},    Union{String, Missing},     Union{Float64, Missing},
    Union{String, Missing}]

@time jll = loadtable(
    "ok.csv",
    output = "fm.jldb/",
    delim=',',
    header_exists=true,
    #filenamecol = "filename",
    #chunks = length(ifiles),
    #type_detect_rows = 20_000,
    # colnames = colnames,
    colparsers = fmtypes,
    indexcols=["Column1"]);

jpsamaroo commented 4 years ago

I can reproduce this. With a non-release Julia v1.3 build on Linux, it also hangs and ignores attempts to Ctrl-C.

davidanthoff commented 4 years ago

Could you try to read it with just TextParse.jl? Just to figure out whether the problem is there, or in JuliaDB.

jpsamaroo commented 4 years ago

Doing just TextParse.csvread("ok.csv", ','; header_exists=true, colparsers=fmtypes) loads the file successfully in ~5 seconds (including inference and compilation time, which is quite good). So the hang appears to be in JuliaDB rather than in TextParse. Thanks for the tip, @davidanthoff!
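For anyone wanting to repeat this isolation step without the original file, here is a minimal sketch that calls TextParse.csvread on its own. The three-column sample CSV is made up for illustration and is not the ok.csv from the report. The sketch assumes TextParse.jl is installed and lets it detect the column types rather than passing colparsers.

```julia
using TextParse

# Hypothetical stand-in data; NOT the ok.csv attached to this issue.
path = joinpath(mktempdir(), "sample.csv")
write(path, "Column1,name,score\n1,a,1.5\n2,b,2.5\n")

# csvread returns a tuple of columns and the column names from the header.
cols, colnames = TextParse.csvread(path, ','; header_exists=true)

println("columns: ", length(cols))
println("rows:    ", length(cols[1]))
println("names:   ", colnames)
```

If a call like this returns promptly on the real file while loadtable hangs on it, that points at the JuliaDB side rather than the parser, which is the conclusion reached above.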