fortran-lang / stdlib

Fortran Standard Library
https://stdlib.fortran-lang.org
MIT License

Interpret bytes as packed binary data #621

Open lewisfish opened 2 years ago

lewisfish commented 2 years ago

Motivation

I've recently had cause to write Fortran code that communicates over a socket, and code that reads a structured binary file format. For both of these I needed to write routines that could read n bytes and convert those bytes into characters, integers, etc., or write integers, characters, etc. into n bytes.

Prior Art

Python's struct module, particularly the pack and unpack routines. Szaghi's BeFoR64 pack_data routine.

Additional Information

No response

jvdp1 commented 2 years ago

I am not sure I understand the details. Is the intrinsic transfer not what you are looking for? Or maybe this bitset module?

lewisfish commented 2 years ago

Sorry if I was not clear. What I mean is essentially a user-friendly wrapper around the intrinsic transfer to read or write bytes to/from a binary file. For example, in Python, using the struct library, one can read arbitrary packed binary files (example from voxwriter) like so:

Currently in Fortran it is a lot more verbose to do the same thing with bare intrinsics. Does this make more sense? I don't think the bitset module does this, though my understanding of what a bitset is remains limited. If you still think that transfer suffices, please feel free to close the issue :smiley:
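To illustrate what "bare intrinsics" looks like in practice, here is a minimal sketch of decoding a small header with transfer alone. The 8-byte layout (a 4-character tag followed by a native-endian 32-bit integer) and the names bare_transfer_sketch and parse_header are hypothetical, chosen only for illustration; they are not taken from voxwriter or from stdlib.

```fortran
module bare_transfer_sketch
  use, intrinsic :: iso_fortran_env, only: int8, int32
  implicit none

contains

  ! Decode a hypothetical 8-byte header: 4 characters, then one int32.
  ! Assumes the bytes are in the machine's native byte order and that a
  ! default character occupies exactly one byte.
  subroutine parse_header(bytes, tag, version)
    integer(int8),    intent(in)  :: bytes(8)
    character(len=4), intent(out) :: tag
    integer(int32),   intent(out) :: version

    ! The second argument of transfer is only a mold giving the result type;
    ! its value is never read, so an undefined intent(out) variable is fine.
    tag     = transfer(bytes(1:4), tag)
    version = transfer(bytes(5:8), version)
  end subroutine parse_header

end module bare_transfer_sketch
```

Every field needs its own slice bounds and its own mold, and nothing tracks the current position in the buffer, which is where most of the verbosity comes from once a format has more than a couple of fields.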

nncarlson commented 2 years ago

I believe he's interested in utilities for serialization/deserialization of a data structure. At its heart it is the transfer function, but that function is really clunky to use directly and it is very useful to have some simple-to-use wrapper procedures. For example,

call copy_to_bytes(var, buffer, offset)

would convert the variable var into a sequence of bytes, write them into the int8 array buffer starting at the given offset, and update offset accordingly. Serializing a data structure then just amounts to a sequence of clean, understandable calls to copy_to_bytes.
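A minimal sketch of how such a copy_to_bytes wrapper could be built on transfer is given below. This is not the implementation mentioned in this thread and not an existing stdlib API. It assumes that offset counts the bytes already written (so it starts at 0), that only int32, real64 and default character variables need support, and that bytes are stored in native byte order. The module name copy_to_bytes_sketch and the helper append_bytes are hypothetical.

```fortran
module copy_to_bytes_sketch
  use, intrinsic :: iso_fortran_env, only: int8, int32, real64
  implicit none
  private
  public :: copy_to_bytes

  ! One generic name covering several variable types.
  interface copy_to_bytes
    module procedure copy_int32, copy_real64, copy_chars
  end interface copy_to_bytes

  ! Mold for transfer: requests a rank-1 int8 result just large enough
  ! to hold the source.
  integer(int8), parameter :: byte_mold(1) = 0_int8

contains

  subroutine copy_int32(var, buffer, offset)
    integer(int32), intent(in)    :: var
    integer(int8),  intent(inout) :: buffer(:)
    integer,        intent(inout) :: offset
    call append_bytes(transfer(var, byte_mold), buffer, offset)
  end subroutine copy_int32

  subroutine copy_real64(var, buffer, offset)
    real(real64),  intent(in)    :: var
    integer(int8), intent(inout) :: buffer(:)
    integer,       intent(inout) :: offset
    call append_bytes(transfer(var, byte_mold), buffer, offset)
  end subroutine copy_real64

  subroutine copy_chars(var, buffer, offset)
    character(len=*), intent(in)    :: var
    integer(int8),    intent(inout) :: buffer(:)
    integer,          intent(inout) :: offset
    call append_bytes(transfer(var, byte_mold), buffer, offset)
  end subroutine copy_chars

  ! Shared helper: store bytes after the first `offset` bytes of buffer,
  ! then advance offset by the number of bytes written.
  subroutine append_bytes(bytes, buffer, offset)
    integer(int8), intent(in)    :: bytes(:)
    integer(int8), intent(inout) :: buffer(:)
    integer,       intent(inout) :: offset
    integer :: n

    n = size(bytes)
    if (offset < 0 .or. offset + n > size(buffer)) then
      error stop "copy_to_bytes: buffer too small"
    end if
    buffer(offset + 1:offset + n) = bytes
    offset = offset + n
  end subroutine append_bytes

end module copy_to_bytes_sketch
```

With this in place, serializing a record is a run of calls such as call copy_to_bytes(id, buffer, offset) followed by call copy_to_bytes(mass, buffer, offset), with offset initialised to 0 beforehand. A matching copy_from_bytes and explicit byte-order handling would still be needed for portable file or socket formats.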

PS: I have such procedures, but for some reason I implemented them using c_loc and c_f_pointer to effectively equivalence storage instead of using transfer. I'm not sure why now.

lewisfish commented 2 years ago

> I believe he's interested in utilities for serialization/deserialization of a data structure. At its heart it is the transfer function, but that function is really clunky to use directly and it is very useful to have some simple-to-use wrapper procedures. For example,
>
> call copy_to_bytes(var, buffer, offset)
>
> would convert the variable var into a sequence of bytes, write them into the int8 array buffer starting at the given offset, and update offset accordingly. Serializing a data structure then just amounts to a sequence of clean, understandable calls to copy_to_bytes.
>
> PS: I have such procedures, but for some reason I implemented them using c_loc and c_f_pointer to effectively equivalence storage instead of using transfer. I'm not sure why now.

Yes this is exactly what I mean.

ivan-pi commented 2 years ago

Is this task related to the FD thread: Bytearray for socket packets?

My gut feeling is that stdlib_bitset is not applicable because we use our own bitset literal format for I/O.


For the buffer, one could use either int8 or character(len=1). I believe the former is better since it is guaranteed to have the correct size. For text data, character has the advantage that it can be printed easily.

lewisfish commented 2 years ago

> Is this task related to the FD thread: Bytearray for socket packets?

Yes it is, thanks again for your help on that. Code from that question is here, though it needs a tidy up.

> For the buffer, one could use either int8 or character(len=1). I believe the former is better since it is guaranteed to have the correct size. For text data, character has the advantage that it can be printed easily.

I suppose you could have it as int8, and then have a to_char or print routine?
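A rough sketch of what such a to_char routine could look like for an int8 buffer follows. The name to_char and the module bytes_to_char_sketch are hypothetical, and the sketch assumes the bytes hold ASCII text; what achar returns for values above 127 is processor dependent.

```fortran
module bytes_to_char_sketch
  use, intrinsic :: iso_fortran_env, only: int8
  implicit none
  private
  public :: to_char

contains

  ! Convert an int8 byte buffer into a character string of the same length.
  pure function to_char(buffer) result(str)
    integer(int8), intent(in)   :: buffer(:)
    character(len=size(buffer)) :: str
    integer :: i

    do i = 1, size(buffer)
      ! int8 is signed, so mask to the range 0..255 before calling achar.
      str(i:i) = achar(iand(int(buffer(i)), 255))
    end do
  end function to_char

end module bytes_to_char_sketch
```

Keeping the buffer as int8 and converting only when printing, for example with print *, to_char(buffer(1:n)), preserves the fixed element size while still giving readable output for text data.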