equinor / segyio

Fast Python library for SEGY files.

Write to unassigned trace header addresses? #476

Closed GGDRriedel closed 4 years ago

GGDRriedel commented 4 years ago

Bytes 181 to 240 of each trace header are not assigned; however, I need to write to them, as a follow-up program I use expects values at that address (byte 215 in particular).

I cannot find a method to do this. Does anyone have an idea of how to tackle this without getting into complicated byte calculations?

Cheers

GGDRriedel commented 4 years ago

Or is it literally as easy as opening in r+ and doing

f.header[10][215] = myvalue
jokva commented 4 years ago

Segyio has no support for writing to unassigned fields, for a couple of reasons. One, the feature has never really been requested, and two, SEG has in later revisions added significance to those fields and a "too ergonomic" way of using the unassigned fields would just mean broken code down the line. One problem with such an interface would simply be "where would the word boundaries go?".

Of course, this is programming, so for very specific needs it's easy enough to work around them. Most of the writing to fields happens in the C parts of segyio, so it's not really available from Python. But since a file is just a stream of bytes, you can open the file and write to it manually, and use some of segyio's features to determine where and how (i.e. your complicated byte calculations).
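As a very rough sketch (not a segyio API; the filename, offset, and value below are placeholders, and finding the offset is the arithmetic discussed further down):

import struct

absolute_offset = 123456                     # hypothetical position of the word to patch

with open('myfile.sgy', 'r+b') as raw:       # hypothetical filename
    raw.seek(absolute_offset)
    raw.write(struct.pack('>i', 42))         # write a big-endian 4-byte int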

GGDRriedel commented 4 years ago

I understand. I thought that maybe since rev1 specifies a range of bytes as "free for all" there would be something to cover this.

People writing wherever they want is one of the many sins of SEGY anyway, so I do prefer your solution; unfortunately, however, I have to work with these ancient formats and conventions.

GGDRriedel commented 4 years ago

Is there any way that I could get a sensible byte-mapping description out of segyio, where I would pretty much just search for bytes n:n+4 of every trace header, for every trace? (I'm formulating this question more to spell it out for myself than expecting a fully fledged answer.)

jokva commented 4 years ago

> I understand. I thought that maybe since rev1 specifies a range of bytes as "free for all" there would be something to cover this.

Could've been, but it hasn't been requested yet, and the exact need would change on a file-by-file basis. At that point it's better to just write bespoke code, which would probably only get noisier if it had to go through a segyio interface.

> Is there any way that I could get a sensible byte-mapping description out of segyio, where I would pretty much just search for bytes n:n+4 of every trace header, for every trace?

Indirectly it shouldn't be so bad. Segyio already requires that all traces are of the same length, and figuring out the size of each trace is not so hard (sample-size × number-of-samples + header-size = trace-size). Segyio doesn't directly expose where it figures out that the traces start, but it's really just after the textual and binary headers, i.e. 3200 + 400 bytes.
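As a sketch of that arithmetic (assuming 4-byte samples and a placeholder filename; n is a 0-based offset into the trace header, e.g. 214 for SEG-Y byte 215):

import struct
import segyio

TEXT_HEADER = 3200                                  # textual header, bytes
BIN_HEADER = 400                                    # binary header, bytes
TRACE_HEADER = 240                                  # trace header, bytes

with segyio.open('myfile.sgy', ignore_geometry=True) as f:
    n_traces = f.tracecount
    n_samples = len(f.samples)

sample_size = 4                                     # assumption: 4-byte samples
trace_size = TRACE_HEADER + n_samples * sample_size

n = 214
with open('myfile.sgy', 'rb') as raw:
    for i in range(n_traces):
        raw.seek(TEXT_HEADER + BIN_HEADER + i * trace_size + n)
        word = struct.unpack('>i', raw.read(4))[0]  # big-endian 4-byte int
        print(i, word)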

A better option than working on the file globally is to grab the headers with f.header, because the object you get back has a member, buf. This is the raw header as a byte string which you can index into directly. It's somewhat unsupported, in that I don't guarantee the field will exist or keep its behaviour (it's an implementation detail), but it's unlikely to change for quite some time. And if it did change, I would probably add a public function to get the actual on-disk header in bytes, in order to support the exact case you're dealing with.

GGDRriedel commented 4 years ago

The latter would be absolutely awesome.

For now I am not able to find that member `buf` within any of the objects returned by opening the file, and searching the docs does not give me any results.

Could you provide an example on how to access it? It would pretty much solve my problem as I can just translate the bytes I need from there.

jokva commented 4 years ago
>>> import segyio
>>> f = segyio.open('test-data/small.sgy')
>>> f.header[0].buf
bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x14\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
GGDRriedel commented 4 years ago

Perfect.

Just in case anyone else finds this and has a similar problem, I am decoding with:

import struct

byteobject = f.header[0].buf
i = header_byte_position                   # byte offset within the trace header
bytelength = 4                             # 'f' expects exactly 4 bytes
valueiwant = struct.unpack('f', byteobject[i:i + bytelength])[0]

Many thanks to you for this quick and on-the-point help!

jokva commented 4 years ago

That's right. Be careful, though: sometimes words are encoded with a different endianness than you might expect, so be aware of it and double-check that the values you read out are sane. :--)
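A quick way to sanity-check is to decode the same word both ways and see which value looks plausible ('>' means big-endian in struct, '<' little-endian; the filename and slice are placeholders):

import struct
import segyio

with segyio.open('myfile.sgy', ignore_geometry=True) as f:
    chunk = f.header[0].buf[215:219]
    print(struct.unpack('>f', chunk)[0])   # decoded as a big-endian float
    print(struct.unpack('<f', chunk)[0])   # decoded as a little-endian float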

GGDRriedel commented 4 years ago

Yeah, I'm working on internal stuff that's supposed to go into ProMAX, so all is fine. Figuring out how to WRITE to the position is my current task. I guess that buf member is a copy, not a reference to the file, right?

jokva commented 4 years ago

The buffer starts out as a copy of whatever's on disk, but the header has a member flush() that will write whatever's in the buffer back to disk, so if you update it in-place you will see the changes reflected in the file.

https://segyio.readthedocs.io/en/latest/segyio.html#segyio.field.Field.flush

GGDRriedel commented 4 years ago

Tried flush on the file object (f), not on the header, and got no message; this was my mistake. But it still doesn't seem to be reflected in the file. Example code:

with segyio.open(filelist[0], mode='r+', ignore_geometry=True) as f:
    test = int.from_bytes(f.header[75].buf[215:218], byteorder='big')
    print(test)
    # set 2 bytes at 215 to the value "12"
    f.header[75].buf[215:218] = int.to_bytes(12, length=2, byteorder='big')
    f.header[75].flush()
    # read it out again
with segyio.open(filelist[0], mode='r+', ignore_geometry=True) as f:
    test2 = int.from_bytes(f.header[75].buf[215:218], byteorder='big')
    print(test2)

This results in 13996 (the original value) and 13996 again, so it seems nothing has changed.

jokva commented 4 years ago

Because you're discarding the buffer you update, then reading a new one and flushing that one: each f.header[75] lookup gives you a fresh object. Keep a single reference instead :--)

header = f.header[75]
header.buf[215:218] = int.to_bytes(12, length=2, byteorder='big')
header.flush()
GGDRriedel commented 4 years ago

Welp. Also discovered that I was a bit off with my indexing and byte sizes, so I fixed that as well. Dunno how I can thank you enough for this.

header = f.header[75]
header.buf[215:219] = int.to_bytes(12, length=4, byteorder='big')
header.flush()
jokva commented 4 years ago

No problem!