JuliaData / Strapping.jl

Tools for mapping between Julia structs and 2D tabular data.
https://juliadata.github.io/Strapping.jl/stable
MIT License
57 stars 5 forks source link

nonempty array assumption #12

Closed ericphanson closed 3 years ago

ericphanson commented 3 years ago
julia> using Strapping, StructTypes

julia> StructTypes.StructType(::Type{AnalyzeRegistry.Package}) = StructTypes.Struct()

julia> DataFrame(Strapping.deconstruct(results))
ERROR: BoundsError: attempt to access 0-element Vector{String} at index [1]
Stacktrace:
 [1] getindex
   @ ./array.jl:801 [inlined]
 [2] nametypeindex!
   @ ~/.julia/packages/Strapping/kHXFJ/src/Strapping.jl:483 [inlined]
 [3] nametypeindex!
   @ ~/.julia/packages/Strapping/kHXFJ/src/Strapping.jl:461 [inlined]
 [4] #_#58
   @ ~/.julia/packages/Strapping/kHXFJ/src/Strapping.jl:403 [inlined]
 [5] DeconstructClosure
   @ ~/.julia/packages/Strapping/kHXFJ/src/Strapping.jl:399 [inlined]
 [6] foreachfield
   @ ~/.julia/packages/StructTypes/eCxx6/src/StructTypes.jl:604 [inlined]
 [7] deconstruct(values::Vector{AnalyzeRegistry.Package})
   @ Strapping ~/.julia/packages/Strapping/kHXFJ/src/Strapping.jl:412
 [8] top-level scope
   @ REPL[94]:1

Here, results is a Vector{AnalyzeRegistry.Package} with an entry for every package in General, ie this is the opposite of a MWE, sorry. But a small example (which may not trigger the bug) is

using AnalyzeRegistry
results = analyze_from_registry(find_packages("DataFrames", "Flux"))

Opening the code it looks like Strapping assumes ArrayTypes are nonempty in

function nametypeindex!(::StructTypes.ArrayType, x::T, i, nm, c) where {T}
    nametypeindex!(x[1], i, nm, c)
end

The context here is https://github.com/giordano/AnalyzeRegistry.jl where we have a Package type to describe information about a Julia package, and we store some pieces information as Vector{<:NamedTuple} row tables, and some of these tables have nested information (e.g. a table with one row per file, with a licenses_found column which is a vector of software licenses found within that file). In Slack, we discussed unnesting these tables, but it could be useful to use Strapping to deconstruct the entire Package into a row in a big table for further analysis.

quinnj commented 3 years ago

Thanks for opening an issue; I'm actively digging in pretty deep on this code, so this is a helpful use-case to know about. Unfortunately the fix isn't super easy, but I'm going to try and dig in and get it done pretty soon because I'd like to get a new release out.

quinnj commented 3 years ago

Alright, as expected, it's a doozy of a PR, but it turns out the code had more than just this bad assumption, so it all needed a good pass through the refiner's fire to get it more robust. https://github.com/JuliaData/Strapping.jl/pull/13

ericphanson commented 3 years ago

Cool!

I think I may have run into some of those other issues a month or so ago but didn't end up filing an issue since I didn't know if I was using it right. My use-case then was to serialize simple Julia objects (like structs containing numeric matrices and vectors, and structs containing such structs) for later retrieval in Julia; i.e. basically like JLD2 except I hoped having some control over the serialization via StructTypes would help with robustness and give me the possibility to later deserialize in a very different context if I needed to.

More recently I've been thinking about Strapping as possibly useful for constructing tabular "views" of nested data (within a Julia process, not for serialization); i.e. perhaps the most natural way to read in or construct data is in a nested object, but then it can be easier to analyze that data in a table with groupby's and such. [I put "views" in quotes because it's not always essential that there's no copying involved; sometimes you just want a table].