AlgebraicJulia / ACSets.jl

ACSets: Algebraic databases as in-memory data structures
https://algebraicjulia.github.io/ACSets.jl/
MIT License

Optimizing Serializers for ACSets #12

Open · jpfairbanks opened this issue 3 years ago

jpfairbanks commented 3 years ago

We currently have JSON serializers for ACSets (AlgebraicJulia/Catlab.jl#265), but once performance becomes critical we will want to optimize both the serializers and the construction of the indexing structures. For Graphs, fast construction works like this:

  1. read an edge list off disk;
  2. sort and count to get all the neighborhood sizes;
  3. allocate every index with exactly the right size;
  4. fill in all the data with bulk copies.
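As a rough illustration of these four steps, here is a Python sketch of building a graph's outgoing-edge index in a compressed sparse row (CSR) style. It is not the Catlab/ACSets code, which is Julia. The names `build_graph` and `out_neighbors` are hypothetical, and vertices are assumed to be numbered from 0 rather than Julia's 1.

```python
from array import array

def build_graph(n_vertices, edge_list):
    """Hypothetical sketch: edge columns plus an exactly-sized outgoing index.

    edge_list: sequence of (src, tgt) vertex ids in 0..n_vertices-1,
    e.g. as parsed from an edge-list file (step 1).
    """
    m = len(edge_list)
    # Copy the src and tgt columns into flat typed arrays.
    src = array("q", (s for s, _ in edge_list))
    tgt = array("q", (t for _, t in edge_list))

    # Step 2 (count): out-neighborhood size of every vertex.
    counts = [0] * n_vertices
    for s in src:
        counts[s] += 1

    # Step 3: allocate the offsets exactly; offsets[v]:offsets[v+1] spans v's edges.
    offsets = array("q", [0]) * (n_vertices + 1)
    for v in range(n_vertices):
        offsets[v + 1] = offsets[v] + counts[v]

    # Step 2 (sort) and step 4: a stable sort of edge ids by source
    # fills the whole index in one pass, with no per-vertex resizing.
    out_edges = array("q", sorted(range(m), key=src.__getitem__))
    return src, tgt, offsets, out_edges

def out_neighbors(graph, v):
    """Targets of the edges leaving vertex v."""
    src, tgt, offsets, out_edges = graph
    return [tgt[e] for e in out_edges[offsets[v]:offsets[v + 1]]]
```

Because `offsets` is a prefix sum of the counts, each vertex's outgoing edges occupy one contiguous slice of `out_edges`, so nothing has to grow incrementally during construction.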

If constructor performance becomes a critical application bottleneck, we could design a tool that takes a schema and generates an optimal binary format for it, along with a serializer/deserializer for that format.
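To make the schema-driven idea concrete, here is a minimal Python sketch under stated assumptions; it is not an ACSets API. The schema format (`GRAPH_SCHEMA`, a dict from each table to its integer-valued columns) and `make_codec` are hypothetical, attribute columns are ignored, and the layout is simply a header of row counts followed by fixed-width columns. In Julia this would more likely be generated code specialized on the schema type; here closures stand in for that.

```python
import struct

# Hypothetical schema format: each object (table) maps to its list of
# integer-valued columns (homs). For graphs: E has src and tgt, V has none.
GRAPH_SCHEMA = {"V": [], "E": ["src", "tgt"]}

def make_codec(schema):
    """Generate a serializer/deserializer pair specialized to one schema.

    Layout: a header of one little-endian uint64 row count per table (in
    schema order), then each column as a contiguous block of int64 values.
    """
    tables = list(schema)
    header = struct.Struct("<" + "Q" * len(tables))

    def serialize(nrows, columns):
        parts = [header.pack(*(nrows[t] for t in tables))]
        for t in tables:
            for c in schema[t]:
                col = columns[(t, c)]
                assert len(col) == nrows[t], f"column {t}.{c} has wrong length"
                parts.append(struct.pack(f"<{len(col)}q", *col))
        return b"".join(parts)

    def deserialize(buf):
        nrows = dict(zip(tables, header.unpack_from(buf, 0)))
        offset = header.size
        columns = {}
        for t in tables:
            for c in schema[t]:
                n = nrows[t]
                # Sizes come from the header, so each column is a single read
                # and every index can be allocated exactly before filling.
                columns[(t, c)] = list(struct.unpack_from(f"<{n}q", buf, offset))
                offset += 8 * n
        return nrows, columns

    return serialize, deserialize
```

Since every column has a fixed width, the reader knows all table sizes up front, which is exactly what the exact-allocation construction for Graphs needs.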

epatters commented 3 years ago

We might also consider integrating with something like Apache Arrow rather than writing our own binary format.

jpfairbanks commented 3 years ago

Yeah, the Arrow.jl package seems to be reasonably well maintained, and it has the kinds of types we would want for ACSets to interoperate with RDBMSes.