go-hep / hep

hep is the mono repository holding all of go-hep.org/x/hep packages and tools
https://go-hep.org
BSD 3-Clause "New" or "Revised" License

xrootd: how to read raw bigendian bytes? #912

Closed Moelf closed 2 years ago

Moelf commented 2 years ago

Hi,

I'm trying to write a c-shared wrapper around hep/xrootd. I basically have it working; the application boils down to:

//export ReadAt
func ReadAt(res unsafe.Pointer, _id *C.char, NBytes C.int, offset C.int) {
    file := _FILES[_id]
    data := unsafe.Slice((*byte)(res), int64(NBytes))
    _, err := file.ReadAt(data, int64(offset))
    if err != nil {
        log.Fatal(err)
    }
}

I allocate the buffer on the other language's side and pass a pointer to Go. However, because of how the rest of that language's library is designed, we need to read raw big-endian bytes for each basket, because

[1, 2, 3, 4] -> [[2, 1], [4, 3]]
# is different from
[[4, 3], [2, 1]]

If you imagine the branch is a vector of Float16, you need to group the bytes into elements before byte-swapping them. I'm wondering whether there's a way (maybe a lower-level API?) to achieve this.
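To make the grouping point concrete, here is a minimal sketch in plain Go. The helpers swapEach and reverseAll are hypothetical names for illustration, not part of hep/xrootd. It shows that swapping bytes inside each 2-byte element is not the same as reversing the whole buffer:

```go
package main

import "fmt"

// swapEach reverses the bytes inside every width-sized element of buf, in place.
// It returns an error if len(buf) is not a multiple of width.
func swapEach(buf []byte, width int) error {
	if width <= 0 || len(buf)%width != 0 {
		return fmt.Errorf("buffer length %d is not a multiple of element width %d", len(buf), width)
	}
	for i := 0; i < len(buf); i += width {
		for lo, hi := i, i+width-1; lo < hi; lo, hi = lo+1, hi-1 {
			buf[lo], buf[hi] = buf[hi], buf[lo]
		}
	}
	return nil
}

// reverseAll reverses the whole buffer, which also reverses the element order.
func reverseAll(buf []byte) {
	for lo, hi := 0, len(buf)-1; lo < hi; lo, hi = lo+1, hi-1 {
		buf[lo], buf[hi] = buf[hi], buf[lo]
	}
}

func main() {
	a := []byte{1, 2, 3, 4}
	if err := swapEach(a, 2); err != nil {
		panic(err)
	}
	fmt.Println(a) // [2 1 4 3]

	b := []byte{1, 2, 3, 4}
	reverseAll(b)
	fmt.Println(b) // [4 3 2 1]
}
```

The element width has to be known before swapping, which is why the swap cannot be done blindly on a raw byte stream.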

Moelf commented 2 years ago

I think I want whatever function that fills: https://github.com/go-hep/hep/blob/04a2d5fcc255485dba1cbbe2ecf5026ec879e526/groot/rbytes/rbuffer.go#L59-L63

so that I can build the buffer on the other language's side, to not anger the GC god, and fill that buffer with raw big-endian bytes

sbinet commented 2 years ago

decoding big/little endian is done with the encoding/binary package:

https://go.dev/play/p/hLqhICaF1d6
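As a rough illustration of decoding with encoding/binary (a sketch, not necessarily what the playground link shows), the following interprets a raw byte slice as consecutive big-endian float32 values. The helper name decodeFloat32sBE and the sample bytes are assumptions made for this example:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// decodeFloat32sBE interprets raw as consecutive big-endian IEEE 754 float32 values.
func decodeFloat32sBE(raw []byte) ([]float32, error) {
	if len(raw)%4 != 0 {
		return nil, fmt.Errorf("length %d is not a multiple of 4", len(raw))
	}
	out := make([]float32, len(raw)/4)
	for i := range out {
		// Uint32 reads the first 4 bytes of the slice as a big-endian integer.
		bits := binary.BigEndian.Uint32(raw[4*i:])
		out[i] = math.Float32frombits(bits)
	}
	return out, nil
}

func main() {
	// 0x3f800000 is 1.0 and 0xc0000000 is -2.0 in IEEE 754 single precision.
	raw := []byte{0x3f, 0x80, 0x00, 0x00, 0xc0, 0x00, 0x00, 0x00}
	vals, err := decodeFloat32sBE(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(vals) // [1 -2]
}
```

The raw slice is left untouched; the byte order only matters at the moment the bytes are interpreted as numbers.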

I think reaching down to groot/rbytes would be ill-advised: better keep all this in UnROOT.jl and dedicate the Julia wrapper of go-hep/xrootd to just providing a file/connection interface to Julia.

Moelf commented 2 years ago

Yeah indeed. But to read some data, I need to use something from here, right? Which is why I want to read raw bytes, without any decoding, which is to say ReadAt is too high-level for my usage.

Are you saying there is a way to use connection/file handler even more directly?

Moelf commented 2 years ago

My understanding of the stack:

flowchart TD
A["ReadAt()"] --> |"pass p []byte"| B["ReadAtContext()"]
B --> |var| C(resp Data: p)
B --> |var| D(req resp)
C --> D
D --> E["sendSession()"]
E --> |&resp, req| F["session.Send()"]
F --> |var| G(wBuffer)
G --> buf1["header.MarshalXrd(&wBuffer)"]
G --> buf2["req.MarshalXrd(&wBuffer)"]
buf1 --> data1("data := wBuffer.Bytes()")
buf2 --> data1("data := wBuffer.Bytes()")
data1 --> returnfinal["resp.UnmarshalXrd(xrdenc.NewRBuffer(data))"]

sbinet commented 2 years ago

ReadAt is quite low-level (and it is completely agnostic to the endianness of the bytes it's reading).

IIUC what you want to do, you first read some data from the underlying file descriptor (into a []byte of the correct size), and then you can interpret these bytes however you want (e.g. as a big-endian uint16):

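// needs the "encoding/binary" and "io" imports.
func readU16(f io.ReaderAt, offset int64) (uint16, error) {
    // ReadAt returns the bytes exactly as stored; no byte order is applied here.
    buf := make([]byte, 2)
    if _, err := f.ReadAt(buf, offset); err != nil {
        return 0, err
    }
    // Only this step interprets the two raw bytes as a big-endian uint16.
    return binary.BigEndian.Uint16(buf), nil
}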

Moelf commented 2 years ago

(and it is completely agnostic to the endianness of the bytes it's reading)

oh... I might have fooled myself because the magic bytes ("root") are not big-endian?

sbinet commented 2 years ago

yeah, the magic string is a 4-byte array of ['r', 'o', 'o', 't'], with no encoding.
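A small sketch of what that means in practice: the four magic bytes can be compared directly, with no byte-order handling at all. The helpers hasRootMagic and fakeFile are hypothetical, and an in-memory reader stands in for a real remote file:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// fakeFile wraps a byte slice as an io.ReaderAt; it stands in for a remote file.
func fakeFile(b []byte) io.ReaderAt {
	return bytes.NewReader(b)
}

// hasRootMagic reports whether the first four bytes read from f are "root".
// The bytes are compared as-is: a plain byte array has no endianness.
func hasRootMagic(f io.ReaderAt) (bool, error) {
	magic := make([]byte, 4)
	if _, err := f.ReadAt(magic, 0); err != nil {
		return false, err
	}
	return bytes.Equal(magic, []byte("root")), nil
}

func main() {
	// The four bytes after the magic string are made up for this example.
	f := fakeFile([]byte{'r', 'o', 'o', 't', 0, 0, 0, 0})
	ok, err := hasRootMagic(f)
	if err != nil {
		panic(err)
	}
	fmt.Println(ok) // true
}
```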

Moelf commented 2 years ago

ok, thanks, I think this is all I need then. Closing, thanks for bearing with my confusion.

I see the vectorized read _vread is not implemented, so I will use this for now!
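Without a vectorized read, one possible fallback is to issue one plain ReadAt per requested range. This is only a sketch against the generic io.ReaderAt interface: the chunk type, readChunks, and newFakeFile are hypothetical names, and an in-memory reader replaces the xrootd file:

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// chunk is a hypothetical (offset, length) request, e.g. one basket.
type chunk struct {
	Offset int64
	Length int
}

// newFakeFile wraps a string as an io.ReaderAt; it stands in for a remote file.
func newFakeFile(s string) io.ReaderAt {
	return strings.NewReader(s)
}

// readChunks fills one buffer per chunk, using one plain ReadAt call each.
func readChunks(f io.ReaderAt, chunks []chunk) ([][]byte, error) {
	out := make([][]byte, len(chunks))
	for i, c := range chunks {
		buf := make([]byte, c.Length)
		if _, err := f.ReadAt(buf, c.Offset); err != nil {
			return nil, fmt.Errorf("chunk %d (offset %d, length %d): %w", i, c.Offset, c.Length, err)
		}
		out[i] = buf
	}
	return out, nil
}

func main() {
	f := newFakeFile("0123456789")
	bufs, err := readChunks(f, []chunk{{0, 2}, {5, 3}})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %s\n", bufs[0], bufs[1]) // 01 567
}
```

Each chunk costs one round trip in this form, so it trades latency for simplicity compared with a true vectorized read.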