zonca opened 1 year ago
Karl Kosack [karl.kosack@cea.fr](mailto:karl.kosack@cea.fr) and Maximilian Nöthe [maximilian.noethe@tu-dortmund.de](mailto:maximilian.noethe@tu-dortmund.de) are ctapipe developers (https://ctapipe.readthedocs.io/). The file format is called EventIO, documented here. They have a C++ library for it with a Python interface, but wanted a description of the format that is less tied to one implementation. They looked at Kaitai and had misgivings about its lack of built-in support for variable-width integers. I showed them that we can represent variable-width integers in Kaitai, but that was the last I heard from them.
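To illustrate the variable-width integer point: the minimal sketch below decodes an unsigned base-128 little-endian varint, the LEB128-style scheme that Kaitai's format gallery describes as `vlq_base128_le`. It only shows the general technique. EventIO's own variable-length encoding is not reproduced here and may use a different bit layout, and the function name `decode_vlq_base128_le` is hypothetical.

```python
from __future__ import annotations


def decode_vlq_base128_le(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode one unsigned LEB128-style integer starting at data[offset].

    Returns (value, next_offset). Each byte contributes its low 7 bits,
    least-significant group first; a set high bit means "more bytes follow".
    Note: generic base-128 varint, NOT EventIO's own encoding.
    """
    value = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(data):
            raise ValueError("truncated variable-width integer")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # append the next 7-bit group
        shift += 7
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, pos


# Example: 300 is encoded as 0xAC 0x02 in this scheme.
value, next_offset = decode_vlq_base128_le(bytes([0xAC, 0x02]))
print(value, next_offset)  # 300 2
```

In a Kaitai descriptor the same idea is expressed declaratively as a repeated byte group terminated by a clear high bit, with the value assembled in an `instances` expression.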
SuperCDMS data file:
https://ncsa.osn.xsede.org/supercdms-data/CDMS/UMN/R68/Raw/07180808_1558/07180808_1558_F0001.mid.gz
@pibion will provide the data descriptor.
Confirm I can access the data:
aws s3 --profile osn ls s3://supercdms-data/CDMS/UMN/R68/Raw/07180808_1558/07180808_1558_F0001.mid.gz
2023-06-16 03:19:52 48189875 07180808_1558_F0001.mid.gz
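Before writing a descriptor against this file, it can help to confirm that a local copy is intact. A minimal sketch, assuming the object has already been copied to the working directory (for example with `aws s3 cp` and the same `osn` profile): it compares the size with the 48189875 bytes reported by the listing, checks the two-byte gzip magic number, and streams the decompression to count uncompressed bytes without loading the whole file. The helper names `check_download` and `decompressed_size` are hypothetical.

```python
import gzip
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def check_download(path: str, expected_size: int) -> bool:
    """Return True if the local file has the listed size and starts like gzip."""
    p = Path(path)
    if p.stat().st_size != expected_size:
        return False
    with p.open("rb") as f:
        return f.read(2) == GZIP_MAGIC


def decompressed_size(path: str, chunk_size: int = 1 << 20) -> int:
    """Stream-decompress the file in chunks and return the total byte count."""
    total = 0
    with gzip.open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total


# Hypothetical usage, assuming the file was downloaded locally:
# check_download("07180808_1558_F0001.mid.gz", 48189875)
# decompressed_size("07180808_1558_F0001.mid.gz")
```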
@zonca here's the descriptor for v8 of the SuperCDMS format: https://github.com/det-lab/dataReaderWriter/blob/master/kaitai/ksy/scdms_v8.ksy. An example data file that this descriptor matches is at https://github.com/det-lab/dataReaderWriter/blob/master/data/51230216_125838_F0001.mid.gz.
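For a plain Python interface without Awkward Array, one possible route is Kaitai's standard Python target: `kaitai-struct-compiler -t python scdms_v8.ksy` generates a module whose class inherits the `from_bytes` classmethod from the `kaitaistruct` runtime. Because the example data is gzipped, it has to be decompressed first. The sketch below assumes the generated module is `scdms_v8` with class `ScdmsV8` (the real names follow the descriptor's `meta/id`), and the helper `parse_gz` is hypothetical.

```python
import gzip
from typing import Any, Callable


def parse_gz(from_bytes: Callable[[bytes], Any], path: str) -> Any:
    """Decompress a .gz file and hand the raw bytes to a Kaitai parser.

    `from_bytes` is e.g. ScdmsV8.from_bytes from a module generated with
    `kaitai-struct-compiler -t python` (hypothetical names). The whole
    decompressed file is read into memory before parsing.
    """
    with gzip.open(path, "rb") as f:
        return from_bytes(f.read())


# Hypothetical usage, assuming the generated module is importable:
# from scdms_v8 import ScdmsV8
# parsed = parse_gz(ScdmsV8.from_bytes, "51230216_125838_F0001.mid.gz")
```

Reading everything into memory keeps the sketch simple. For files much larger than the ~48 MB example, `from_io` on a decompressed temporary file may be preferable.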
The UMN data above is v1; I'm working on a data description for that now.
@pibion, do you have docs on how to create a Python interface for this descriptor? Do these use Awkward Arrays?
@zonca yes, @manasvigoyal has added an Awkward Array target to the Kaitai Struct compiler: https://github.com/ManasviGoyal/kaitai_struct_compiler, with its runtime at https://github.com/ManasviGoyal/kaitai_awkward_runtime
Her talk on how to use these is at https://github.com/ManasviGoyal/PyHEP-2023-Awkward-Target-for-Kaitai-Struct
OK, I managed to install https://github.com/ManasviGoyal/kaitai_awkward_runtime and run the example:
import kaitai_awkward_runtime
awkward_array = kaitai_awkward_runtime.load("data/animal.raw")
@ManasviGoyal, how do I load https://github.com/det-lab/dataReaderWriter/blob/master/kaitai/ksy/scdms_v8.ksy, or the dataset, instead?
We want to investigate how to use data descriptors of SuperCDMS data to document their structure; such descriptors could then be used by automated systems like EventIO to interface with the data.