det-lab / supercdms-project

Repository to hold issues about collaboration with @zonca

CDMS data on OSN and descriptors #1

Open zonca opened 1 year ago

zonca commented 1 year ago

We want to investigate how data descriptors can be used to document the structure of SuperCDMS data. Such descriptors could then be used by automated systems like EventIO to interface with the data.

zonca commented 1 year ago

EventIO

Karl Kosack [karl.kosack@cea.fr](mailto:karl.kosack@cea.fr) and Maximilian Nöthe [maximilian.noethe@tu-dortmund.de](mailto:maximilian.noethe@tu-dortmund.de) are ctapipe developers (https://ctapipe.readthedocs.io/). The file format is called EventIO, documented here. They have a C++ library for it with a Python interface, but wanted a description of the format that is less implementation-specific. They looked at Kaitai and had misgivings about its lack of built-in support for variable-width integers. I showed them that variable-width integers can be represented in Kaitai, but that was the last I heard from them.
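
For readers unfamiliar with the term, here is a minimal Python sketch of what decoding a variable-width integer involves. It uses unsigned LEB128 (7 payload bits per byte, high bit set when more bytes follow) purely as an illustration. This is an assumption for the example: EventIO's actual encoding is not described in this thread and may well differ. The function name `decode_uleb128` is hypothetical.

```python
def decode_uleb128(buf, offset=0):
    """Decode one unsigned LEB128 integer from buf starting at offset.

    Returns (value, number_of_bytes_consumed).
    Illustration only: not necessarily the encoding EventIO uses.
    """
    result = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(buf):
            raise ValueError("truncated variable-width integer")
        byte = buf[pos]
        pos += 1
        # The low 7 bits carry payload, least significant group first.
        result |= (byte & 0x7F) << shift
        # A clear high bit marks the final byte of this integer.
        if byte & 0x80 == 0:
            return result, pos - offset
        shift += 7


value, size = decode_uleb128(bytes([0xE5, 0x8E, 0x26]))
print(value, size)
```

The point of the sketch is that the field length is only known after reading the data itself, which is why a format description language needs some way to express it.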

zonca commented 1 year ago

SuperCDMS data file:

https://ncsa.osn.xsede.org/supercdms-data/CDMS/UMN/R68/Raw/07180808_1558/07180808_1558_F0001.mid.gz

@pibion will provide data descriptor

zonca commented 1 year ago

I can confirm that I can access the data:

aws s3 --profile osn ls s3://supercdms-data/CDMS/UMN/R68/Raw/07180808_1558/07180808_1558_F0001.mid.gz
2023-06-16 03:19:52   48189875 07180808_1558_F0001.mid.gz
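
The HTTPS URL posted earlier and the s3:// path in the command appear to name the same object, with the first path component acting as the bucket. A small Python sketch of that mapping, assuming path-style addressing; the helper name `split_osn_url` is hypothetical, and the endpoint for the `osn` profile is presumably configured separately in the AWS CLI settings:

```python
from urllib.parse import urlsplit


def split_osn_url(url):
    """Split a path-style object URL into (endpoint, bucket, key)."""
    parts = urlsplit(url)
    endpoint = f"{parts.scheme}://{parts.netloc}"
    # Path-style addressing (assumed): /<bucket>/<key...>
    bucket, _, key = parts.path.lstrip("/").partition("/")
    return endpoint, bucket, key


DATA_URL = (
    "https://ncsa.osn.xsede.org/supercdms-data/"
    "CDMS/UMN/R68/Raw/07180808_1558/07180808_1558_F0001.mid.gz"
)

endpoint, bucket, key = split_osn_url(DATA_URL)
# Rebuild the listing command from the URL parts.
print(f"aws s3 --profile osn ls s3://{bucket}/{key}")
```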

pibion commented 1 year ago

@zonca here's the descriptor for v8 of the SuperCDMS format: https://github.com/det-lab/dataReaderWriter/blob/master/kaitai/ksy/scdms_v8.ksy. An example of data that this matches is at https://github.com/det-lab/dataReaderWriter/blob/master/data/51230216_125838_F0001.mid.gz.

The UMN data above is v1, so the v8 descriptor does not apply to it. I'm working on a data descriptor for v1 now.

zonca commented 1 year ago

@pibion do you have docs on how to create a Python interface for this descriptor? Does the interface use Awkward Arrays?

pibion commented 1 year ago

@zonca yes, @manasvigoyal has added an Awkward Array target to the Kaitai Struct compiler: https://github.com/ManasviGoyal/kaitai_struct_compiler and https://github.com/ManasviGoyal/kaitai_awkward_runtime

pibion commented 1 year ago

Her talk on how to use these is at https://github.com/ManasviGoyal/PyHEP-2023-Awkward-Target-for-Kaitai-Struct

zonca commented 1 year ago

OK, I managed to install https://github.com/ManasviGoyal/kaitai_awkward_runtime and run the example:

awkward_array = kaitai_awkward_runtime.load("data/animal.raw")

@ManasviGoyal how do I load https://github.com/det-lab/dataReaderWriter/blob/master/kaitai/ksy/scdms_v8.ksy or the dataset instead?
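
One preparatory step that may be needed whatever the loading call turns out to be: both sample files are gzip-compressed (.mid.gz). A minimal sketch using only the Python standard library, assuming the generated reader expects an uncompressed .mid file (this is not confirmed anywhere in this thread); the helper name `gunzip` is hypothetical:

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dst=None):
    """Decompress a .gz file and return the path of the output file.

    By default the output is written next to src with the trailing
    ".gz" removed, e.g. foo.mid.gz -> foo.mid.
    """
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_suffix("")
    # Stream the data in chunks so large raw files do not need to fit in memory.
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


# Example (requires the file to be downloaded first):
# raw_path = gunzip("07180808_1558_F0001.mid.gz")
```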