det-lab / kaitai_struct_awkward_runtime

Kaitai Struct: runtime for Awkward Arrays
MIT License
2 stars 2 forks source link

feat: add `IndexedBuilder` for `EnumType` cases #14

Closed ManasviGoyal closed 3 months ago

ManasviGoyal commented 9 months ago

Currently we use NumpyBuilder for EnumType cases. We eventually want to use IndexedBuilder to store the associated strings in enum. This requires adding an IndexedBuilder in LayoutBuilder.h in awkward.

zonca commented 9 months ago

Line 1463 in https://github.com/ManasviGoyal/kaitai_struct_compiler/blob/1a3cd3b5b5f85c612e5eaa888aed3528b17ed853/shared/src/main/scala/io/kaitai/struct/languages/AwkwardCompiler.scala has enum switch case for pointer and for instances

Test of numpybuilder https://github.com/scikit-hep/awkward/blob/main/header-only/tests/test_1494-layout-builder.cpp

Layout guilder is in https://github.com/scikit-hep/awkward/blob/main/header-only/layout-builder/awkward/LayoutBuilder.h

Can duplicate and make changes to IndexedOptionBuilder just keep it generic and not accept None

zonca commented 9 months ago

@ManasviGoyal I started working on this, not knowing yet what I am doing. can you take a look at this and provide some feedback? https://github.com/zonca/awkward/pull/3

zonca commented 9 months ago

@jpivarski @ManasviGoyal @pibion I have a first implementation of the Indexed layout builder class and 1 unit test copied from test_Indexed_as_IndexedOption.

See https://github.com/zonca/awkward/pull/3

is this enough for a first PR to the main awkward repo?

zonca commented 9 months ago

@ManasviGoyal does any of the test files in #21 have an EnumType?

ManasviGoyal commented 9 months ago

@ManasviGoyal does any of the test files in #21 have an EnumType?

Yes, scdms_v8.ksy and scdms.ksy have EnumType

zonca commented 9 months ago

ok, first I'll implement a test for those 2 data files.

Notes to myself: then I'll keep modifying AwkwardCompiler.scala, no need to worry about isIndexedOption, we need to set categorical, can duplicate a similar test to https://github.com/scikit-hep/awkward/blob/main/header-only/tests/test_1494-layout-builder.cpp#L1837, instead of NumpyBuilder use a ListOffset of NumpyBuilder of uint8, , check StrType that does the same.

zonca commented 8 months ago

currently inside scdms_v8.ksy we have:

> awkward_array.scdms_v8A__Zodb_hdr.type
ArrayType(NumpyType('uint16'), 1, None)
zonca commented 8 months ago

Example of field that we want to modify in scdms.ksy, overall_header is a NumpyType, we want it to be a IndexOption with key/values:

In [17]: awkward_array.scdmsA__Zscdms_hdr.scdms_headerA__Zoverall_header
Out[17]: <Array [9] type='1 * int32'>

In [18]: awkward_array.scdmsA__Zscdms_hdr.scdms_headerA__Zoverall_header.type
Out[18]: ArrayType(NumpyType('int32'), 1, None)
zonca commented 8 months ago

implement enum content handling in attrParse2 in AwkwardCompiler.scala

I need to understand the content of all variables in that section of the code. The best way is to try configure the sbt package step of the kaitai struct compiler into VS Code and see if I can run in the debugger and check the content of all the variables.

zonca commented 7 months ago

I configured VS Code to use the live debugger inside scala, so I can inspect all variables. Started the implementation, see https://github.com/ManasviGoyal/kaitai_struct_compiler/pull/3

zonca commented 3 months ago

Implemented in https://github.com/det-lab/kaitai_struct_awkward_runtime/pull/39