codedthinking / Kezdi.jl

Julia package for data manipulation and analysis
https://codedthinking.github.io/Kezdi.jl/

ReadStatTables.jl cannot handle non-coalesced strL .dta format #104

Open andrasvereckei opened 4 days ago

andrasvereckei commented 4 days ago

Reading the original .dta file with either

df = @use "raw.dta"
or
df = readstat("raw.dta")

fails with the same memory error:

Unable to allocate memory

Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] _error
   @ ~/.julia/packages/ReadStatTables/PtHGB/src/parser.jl:233 [inlined]
 [3] _parse_allmeta(filepath::String, ext::String, parse_ext::typeof(ReadStatTables.parse_dta), usecols::Nothing, file_encoding::Nothing, handler_encoding::Nothing)
   @ ReadStatTables ~/.julia/packages/ReadStatTables/PtHGB/src/parser.jl:286
 [4] readstat(filepath::String; ext::String, usecols::Nothing, row_limit::Nothing, row_offset::Int64, ntasks::Nothing, apply_value_labels::Bool, inlinestring_width::Int64, pool_width::Int64, pool_thres::Int64, file_encoding::Nothing, handler_encoding::Nothing)
   @ ReadStatTables ~/.julia/packages/ReadStatTables/PtHGB/src/readstat.jl:92
 [5] readstat(filepath::String)
   @ ReadStatTables ~/.julia/packages/ReadStatTables/PtHGB/src/readstat.jl:49
 [6] use(fname::String)
   @ Kezdi ~/.julia/packages/Kezdi/u2tLs/src/commands.jl:1
 [7] top-level scope
   @ /srv/project/balance/julia/Stata_test_memory.ipynb:1

Upstream repo: https://github.com/junyuan-chen/ReadStatTables.jl
The error may be in parser.jl, which is where frames [2] and [3] of the stack trace point.

After running Stata's compress command on the file, which coalesces the strL variables, both read options work.

https://www.stata.com/manuals/dcompress.pdf
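
Until this is fixed upstream, a caller could wrap the read and turn the opaque allocation error into a hint about the compress workaround. The following is only a minimal sketch: `with_strl_hint` is a hypothetical helper, not part of Kezdi.jl or ReadStatTables.jl, and it assumes the failure surfaces as an `ErrorException` whose message contains "Unable to allocate memory", as in the stack trace above. The reader function is passed in as an argument so the helper itself has no package dependency.

```julia
# Hypothetical helper (not part of Kezdi.jl or ReadStatTables.jl).
# `reader` is any function that takes a file path, e.g. `readstat`.
function with_strl_hint(reader, path::AbstractString)
    try
        return reader(path)
    catch err
        # Assumption: the failure is an ErrorException carrying the
        # message shown in the stack trace of this issue.
        if err isa ErrorException && occursin("Unable to allocate memory", err.msg)
            error("Could not read $(path): Unable to allocate memory. " *
                  "If the file contains non-coalesced strL variables, " *
                  "run Stata's compress on it, save it again, and retry.")
        end
        rethrow()  # any other error is passed through unchanged
    end
end

# Usage, assuming ReadStatTables is installed:
#   using ReadStatTables
#   df = with_strl_hint(readstat, "raw.dta")
```

This does not fix the underlying problem; it only makes the failure easier to diagnose for files that have not yet been run through compress.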