equinor / segyio

Fast Python library for SEGY files.
Other
471 stars 213 forks source link

Changes source energy direction trace header #505

Open ar4 opened 3 years ago

ar4 commented 3 years ago

Previously, bytes 219-224 of trace headers were specified in the code to contain a 4-byte mantissa and 2-byte exponent for the source energy direction.

The SEG-Y revision 2 standard, however, says that these bytes contain:

" Source Energy Direction with respect to the source orientation — Three two- byte two’s complement binary integers for vertical, cross-line and in-line inclinations respectively. The positive orientation direction is defined in Bytes 217–218 of the Standard Trace Header. The energy direction is encoded in tenths of degrees (i.e. 347.8o is encoded as 3478 10 (0D96 16 )). "

This commit removes the trace headers SEGY_TR_SOURCE_ENERGY_DIR_MANT SEGY_TR_SOURCE_ENERGY_DIR_EXP

and replaces them with SEGY_TR_SOURCE_ENERGY_DIR_VERT SEGY_TR_SOURCE_ENERGY_DIR_XLINE SEGY_TR_SOURCE_ENERGY_DIR_ILINE

jokva commented 3 years ago

Hi, Alan, and thanks for the thorough patch!

The only thing that worries me is that this is a sort-of breaking change. Obviously, pre-v2 these words were really unspecified, and I suspect segyio inherited the mantissa/exponent from earlier implementations, but in a way they were always broken.

I'll give it another couple of hours of thought on how to handle it - just eat the breakage, maybe introduce a deprecation-warning trigger on the mantissa/exponent, maybe something else.

ar4 commented 3 years ago

Hi Jørgen,

No rush! I suspect that very few people use these headers anyway.

I think the problem occurred because the v1 standard, which introduced this header, wasn't clear enough. It just says

Source Energy Direction with respect to the source orientation — The positive orientation direction is defined in Bytes 217-218 of the Trace Header. The energy direction is encoded in tenths of degrees (i.e. 347.8o is encoded as 3478).

The other 6 byte headers are in the mantissa-exponent format, so it was natural to think that this one was the same.

In the current commit I had to change some "SegyFile"s to "segyio.SegyFile"s in python/segyio/tools.py in order to pass the tests. These changes were unrelated to the header change, though, so I will split them into a separate commit.

jokva commented 3 years ago

Oh yeah, I've read through the rev1 specification so. many. times., and the format of 219 is just not at all clear. A 6-byte word, in tenths, with no reference (as far as I know) of a "default behaviour" for 6-byte words. Just bonkers.

I haven't really seen any real-world use in a couple of hours of searching, so I think it should be a safe patch. I'm still toying with the idea of adding a sentinel value which raises a deprecation error or something, which then points to this issue, in case people have old print routines and stuff which does depend on this word for iteration or completeness.

ar4 commented 3 years ago

Perhaps the cleanest approach would be to park this change until the next major revision?

jokva commented 3 years ago

Maybe, but a proper revision may never happen, and would probably include some re-imagined components.

Making the old names throw isn't so bad from an implementation point of view:

class DeprecatedAttribute:
      def __init__(self, msg):
          self.msg = msg

      def __get__(self, obj, type):
          raise NotImplementedError(self.msg)

class TraceField(Enum):
     SourceEnergyDirectionMantissa = DeprecatedAttribute('SourceEnergyDirectionMantissa is deprecated. See https://github.com/equinor/segyio/pull/505')                                                                                                       

And in use:

>>> segyio.TraceField.SourceEnergyDirectionMantissa
NotImplementedError: SourceEnergyDirectionMantissa is deprecated. See https://github.com/equinor/segyio/pull/505
jokva commented 3 years ago

Breaking changes are obviously frustrating, but the standard was broken. But as you say, this is not a much-used word, and migration should be pretty simple. If this would be in use in a mantissa-exponent style, then it is possible to combine the three integers manually.

I saw one use where the mantissa was used as an inline key, and if that relied on it being 4-bytes then... we're out of luck. I think that's sufficiently out there to not warrant holding back this change.

I'm still not sure if it should just be a regular NameError, a custom NameError or a NotImplementedError. I suppose a custom NameError would be helpful if someones code should just break, as it could give a strong hint on how to fix it, and why the break was necessary.