google-research / dex-lang

Research language for array processing in the Haskell/ML family

Add a way to (de)serialize Dex values #334

Open srush opened 3 years ago

srush commented 3 years ago

I'm trying out dxbo as a way to get parameters into Dex, but it seems broken (or maybe just incomplete).

Any other ideas for how to get parameters in? Loading dxo works but is really slow.

dat : (Fin 2 => Float & Float) = ([10.0, 20.0], 10.0)
dump dxbo "examples/bin-tmp.dxbo" dat
> Compiler bug!
> Please report this at github.com/google-research/dex-lang/issues
>
> Prelude.undefined
> CallStack (from HasCallStack):
>   error, called at libraries/base/GHC/Err.hs:78:14 in base:GHC.Err
>   undefined, called at src/lib/Serialize.hs:238:16 in dex-0.1.0.0-31vc1WNSYI0CcnFyEWUMxN:Serialize

srush commented 3 years ago

I found the Python exporter code, which looked promising, but unfortunately that did not work either. I got a strange error when I ran the export.

load dxbo "examples/temp.dxbo" as mydat

q = mydat
> Error: variable not in scope: mydat
>
> q = mydat
>     ^^^

dougalm commented 3 years ago

Sorry, dxbo is broken and it probably won't be revived. Dex is starting to become expressive enough that we want to try doing data serialization/deserialization from within the language itself rather than having it built into the compiler. It's the same as the story with plotting. But we haven't done that yet and meanwhile the built-in stuff has bit-rotted. So we're in the familiar gap between deprecated and not ready.

Here's what we could try in the short term. On the soon-to-be-merged plotting branch we can pass pointers between Dex and C. We can use that to implement a function for reading files, String -> List Byte, with the OS interaction happening in C. (Of course, it's not actually a pure function because it uses IO but we'll punt on that for now.) Then we can make a simple protocol for storing a list of NumPy arrays. Would that be enough for your purposes?
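To make the "simple protocol for storing a list of arrays" idea concrete, here is a minimal sketch of one possible byte layout, written on the Python side. Everything in it is an assumption for illustration, not an existing Dex or NumPy format: the magic bytes `DXAR`, the function names `write_arrays` and `read_arrays`, and the choice of little-endian 32-bit floats are all hypothetical. It uses only the standard library and represents each array as a shape plus a flat row-major list of values; with NumPy, the shape and flat values would come from the array itself.

```python
import struct
from io import BytesIO

MAGIC = b"DXAR"  # hypothetical 4-byte tag, not an existing format


def _size(shape):
    """Number of elements implied by a shape; a rank-0 shape holds one scalar."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def write_arrays(stream, arrays):
    """Write a list of (shape, flat_values) pairs as little-endian float32 data.

    Layout (an assumption for this sketch):
      magic | number of arrays | for each array: rank | dims | values
    """
    stream.write(MAGIC)
    stream.write(struct.pack("<I", len(arrays)))
    for shape, values in arrays:
        if _size(shape) != len(values):
            raise ValueError("shape does not match number of values")
        stream.write(struct.pack("<I", len(shape)))
        stream.write(struct.pack(f"<{len(shape)}I", *shape))
        stream.write(struct.pack(f"<{len(values)}f", *values))


def read_arrays(stream):
    """Read back what write_arrays produced, as a list of (shape, flat_values)."""
    if stream.read(4) != MAGIC:
        raise ValueError("unexpected magic bytes")
    (n_arrays,) = struct.unpack("<I", stream.read(4))
    arrays = []
    for _ in range(n_arrays):
        (rank,) = struct.unpack("<I", stream.read(4))
        shape = struct.unpack(f"<{rank}I", stream.read(4 * rank))
        count = _size(shape)
        values = list(struct.unpack(f"<{count}f", stream.read(4 * count)))
        arrays.append((shape, values))
    return arrays


# The value from the original report: a length-2 table paired with a scalar.
dat = [((2,), [10.0, 20.0]), ((), [10.0])]
buffer = BytesIO()
write_arrays(buffer, dat)
buffer.seek(0)
assert read_arrays(buffer) == dat
```

A reader on the Dex side would only need to walk the same byte sequence once it has the file contents as a list of bytes. Fixing the element type to float32 keeps the sketch short; a real protocol would probably also record a dtype tag per array.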

srush commented 3 years ago

Gotcha, I wasn't sure if I was just doing something wrong. I can be patient. It's neat to have a pure Dex version.

In the medium term, it would be really nice to be able to load records, if possible. The common use case is exporting parameter trees from Flax/PyTorch and converting them. If it is all tuples, it gets pretty messy.

dougalm commented 3 years ago

Ah, good to know about records being helpful. What on-disk format do you usually use?

srush commented 3 years ago

We just use the standard PyTorch one, which I think is pickle? https://pytorch.org/docs/stable/notes/serialization.html

HDF5 is nice too.

But without dicts / strings there probably needs to be an intermediary layer. I don't mind that.
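As a rough illustration of what such an intermediary layer might do, here is a small sketch that flattens a nested parameter dict into an ordered list of (path, leaf) pairs. The function name `flatten_params`, the `/` path separator, and the example tree are all made up for illustration; the leaves are plain Python lists standing in for real arrays. Sorting the keys gives a deterministic order, so the same tree always maps to the same positions in a tuple or the same field names in a record.

```python
def flatten_params(tree, prefix=()):
    """Yield (path, leaf) pairs from a nested dict, visiting keys in sorted order.

    Anything that is not a dict is treated as a leaf (for example, an array).
    """
    if isinstance(tree, dict):
        for key in sorted(tree):
            yield from flatten_params(tree[key], prefix + (key,))
    else:
        yield "/".join(prefix), tree


# Hypothetical parameter tree, shaped like a typical exported model state.
params = {
    "dense": {"kernel": [[1.0, 2.0]], "bias": [0.5]},
    "scale": 3.0,
}

flat = list(flatten_params(params))
names = [name for name, _ in flat]
# names is ["dense/bias", "dense/kernel", "scale"]
```

The list of leaves could then be written out in order with any flat array format, with the names stored alongside it or used to generate record field names.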