equinor / segyio

Fast Python library for SEGY files.

RuntimeError: unable to count traces, no data traces past headers #464

Closed malapradej closed 4 years ago

malapradej commented 4 years ago

I get the above error after trying to open a su file I saved previously using this code:

# save phase_df to SEGY format file
import segyio
segy_fname = 'rock_drop_data.sgy'
sampling_interval = 1/ phase_df.sampling_rate * 1000000 # in microseconds
segyio.tools.from_array2D(segy_fname, phase_df.values.T, dt=sampling_interval)

# load SEGY format file
with segyio.open(segy_fname, 'r+', strict=True, endian = 'big') as segy_file:

    # set trace header values
    for trace_header, trace_distance in zip(segy_file.header[:], phase_df.columns):
        trace_header[segyio.TraceField.CoordinateUnits] = 1 # 1 is meters
        # set the geophone offset distances instead of coordinates
        trace_header[segyio.TraceField.offset] = int(trace_distance)
        # set geophone spacing
        trace_header[segyio.TraceField.INLINE_3D] = int(phase_df.resolution) # d2
        # set date and time of first value in trace
        trace_header[segyio.TraceField.YearDataRecorded] = phase_df.index[0].year
        trace_header[segyio.TraceField.DayOfYear] = phase_df.index[0].timetuple().tm_yday
        trace_header[segyio.TraceField.HourOfDay] = phase_df.index[0].hour
        trace_header[segyio.TraceField.MinuteOfHour] = phase_df.index[0].minute
        trace_header[segyio.TraceField.SecondOfMinute] = phase_df.index[0].second
        trace_header[segyio.TraceField.TimeBaseCode] = 4 # 4 is UTC

import os
os.system('segyread > {}.su tape={}'.format(os.path.splitext(segy_fname)[0], segy_fname))

It fails when I open it this way:

# load the su fk data
fk_fname = os.path.splitext(segy_fname)[0]+'.su'

# load SEGY format file
fk_file = segyio.open(fk_fname, 'r', strict=True, endian = 'big')
jokva commented 4 years ago

If you want to open a seismic-unix file, you need to use the segyio.su.open function. Its behaviour should otherwise be identical to segyio.open.

jokva commented 4 years ago

For what it's worth, you can speed up your program a little bit by doing a single trace_header.update, instead of assigning individual fields many times. When you do header[key] = value, the entire header is being written back to disk, even if only a single value is updated.
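A minimal, untested sketch of what that could look like for the loop above (assuming the same phase_df, and using the su mnemonic keys that update accepts):

with segyio.open(segy_fname, 'r+', strict=True, endian='big') as segy_file:
    t0 = phase_df.index[0]
    for trace_header, trace_distance in zip(segy_file.header[:], phase_df.columns):
        # one write back to disk per header, instead of one per assigned field
        trace_header.update({
            segyio.su.counit: 1,                      # coordinate units: meters
            segyio.su.offset: int(trace_distance),    # geophone offset distance
            segyio.su.iline:  int(phase_df.resolution),
            segyio.su.year:   t0.year,
            segyio.su.day:    t0.timetuple().tm_yday,
            segyio.su.hour:   t0.hour,
            segyio.su.minute: t0.minute,
            segyio.su.sec:    t0.second,
            segyio.su.timbas: 4,                      # time basis: UTC
        })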

malapradej commented 4 years ago

Thanks @jokva.

I get the following error after using segyio.su.open

RuntimeError: trace count inconsistent with file size, trace lengths possibly of non-uniform

jokva commented 4 years ago

Did segyread work the way you expect? Can you read it properly with seismic-unix?

If you're using segyio on both ends, why do you go via su?

malapradej commented 4 years ago

@jokva

I convert the segy file to su using:

os.system('segyread > {}.su tape={}'.format(os.path.splitext(segy_fname)[0], segy_fname))

and I can view and plot the file using seis-unix commands like suedit with the following output:

suedit: ! examine only (no header editing from STDIN)

3918 traces in input file
 tracf=3917 cdpt=3917 offset=29996 counit=1 ns=60000 dt=1000
 year=2019 day=282 hour=13 minute=5 timbas=4 d2=0.000000
 f2=599785472.000000

The output of doing the conversion using 'segyread' on the command line:

3200+0 records in
6+1 records out
3200 bytes (3.2 kB) copied, 0.00456896 s, 700 kB/s
getpar() call after checkpars(): nextended

There are 3918 traces.

I use seis-unix to do an fk filter on the su file, so I need it in su format. I need to be able to read in the fk file, which will be in su format too. Trying to read in a simple su file is the first step.

PS.

When I query the first trace using suedit I get:

> 1
 offset=10001 counit=1 ns=60000 dt=1000 year=2019 day=282
 hour=13 minute=5 timbas=4 d2=0.000000 f2=0.000000

and the second trace seems to be the first one with a tracf number.

> 2
 tracf=1 cdpt=1 offset=10006 counit=1 ns=60000 dt=1000
 year=2019 day=282 hour=13 minute=5 timbas=4 d2=0.000000
 f2=0.000000
jokva commented 4 years ago

Looks like segyread really doesn't write enough data, only 3200 bytes. Is that correct? How large is the output su file?

malapradej commented 4 years ago

@jokva

I updated the answers above.

Concerning the su file size, it is 898M, the same size as the sgy file. It also created an additional binary and header file, with sizes 400 and 3.2K.

jokva commented 4 years ago

Interesting. Some unlikely suspects, but just to have them covered: print the full file path, and make sure it's the right one (fk_fname), and verify that the seismic-unix file is in big-endian.

malapradej commented 4 years ago

Checked, and it is all set as big-endian, the full path is in the local directory where the notebook resides, and fk_fname is rock_drop_data.su.

malapradej commented 4 years ago

I viewed the rock_drop_data.su file using suximage and it seems correct. So more than just 3200 bytes is being written to the file.

jokva commented 4 years ago

Alright, so since there's no binary header in the seismic unix format, for the file to be compliant you need to write the sample count in every trace header (well, in the first trace header). Do that, and I think it should work.

trace_header[segyio.su.ns] = samplecount

malapradej commented 4 years ago

OK thanks, but now it comes up with another error:

RuntimeError: trace count inconsistent with file size, trace lengths possibly of non-uniform

suedit does show that there are 60000 samples per trace though, and this should be for every trace.

The first trace's tracf number still doesn't show up; it only starts on the second one. Not sure if this indicates the issue.

suedit <rock_drop_data.su

suedit: ! examine only (no header editing from STDIN)

3918 traces in input file
 tracf=3917 cdpt=3917 offset=29996 counit=1 ns=60000 dt=1000
 year=2019 day=282 hour=13 minute=5 timbas=4 d2=0.000000
 f2=599785472.000000
> 1
 offset=10001 counit=1 ns=60000 dt=1000 year=2019 day=282
 hour=13 minute=5 timbas=4 d2=0.000000 f2=0.000000
> 2
 tracf=1 cdpt=1 offset=10006 counit=1 ns=60000 dt=1000
 year=2019 day=282 hour=13 minute=5 timbas=4 d2=0.000000
 f2=0.000000
jokva commented 4 years ago

Turns out that the from_array function already sets ns, so it shouldn't be necessary for you to do it.

Check that the right header word is populated in the SEG-Y you're reading from.

print(segyio.open(segy_fname, 'r', strict=True, endian = 'big').header[0][segyio.su.tracf])

malapradej commented 4 years ago

that shows up as:

RuntimeError: unable to find sorting.
jokva commented 4 years ago

Right, use ignore_geometry = True and drop strict = True, since it's 2D and whatnot. I have to investigate a bit more to determine if a from_array2D file should always be considered sorted or not, but that's a job for later.

malapradej commented 4 years ago

Added that to both the opening of the sgy and su files and it comes up with this when I open the su file:

RuntimeError: trace count inconsistent with file size, trace lengths possibly of non-uniform
jokva commented 4 years ago

Alright, that's quite interesting, it could be that the file is broken already at from_array.

First things first then.

with segyio.open(segy_fname, 'r+', strict=True, endian = 'big') as segy_file:

Remove this entire block from your program, and stop right after from_array2D, then try to open the file (with segyio) - does it still work?

malapradej commented 4 years ago

After opening the file with this straight after the from_array2D:

segyio.open(segy_fname, 'r', endian = 'big',  ignore_geometry = True)

comes up with

SegyFile('rock_drop_data.sgy', 'r', iline = 189, xline = 193)
jokva commented 4 years ago

Alright, and is there a problem when you open the segy after bringing the block back?

malapradej commented 4 years ago

Straight after the block and using this to open it:

segyio.open(segy_fname, 'r+', endian = 'big', ignore_geometry = True)

comes up with

SegyFile('rock_drop_data.sgy', 'r+', iline = 189, xline = 193)
jokva commented 4 years ago

Ok, interesting, so at least from segyio's point of view it's not too broken at that point.

jokva commented 4 years ago

Try repeating the same experiment, but without ignore_geometry = True.

malapradej commented 4 years ago

Thanks for your help so far. Running the same code without ignore_geometry I get the error:

RuntimeError: unable to find sorting.
jokva commented 4 years ago

When doing only from_array2D?

jokva commented 4 years ago

What version are you running?

>>> import segyio
>>> import numpy as np
>>> a = np.arange(80).reshape((10, 8))
>>> segyio.tools.from_array2D('f.sgy', a)
>>> f = segyio.open('f.sgy')
>>> f
SegyFile('f.sgy', 'r', iline = 189, xline = 193)

A from_array2D should open just fine.

malapradej commented 4 years ago

I ran it all without the block, thus only using from_array2D and without ignore_geometry. The result was this after trying to open the file.

SegyFile('rock_drop_data.sgy', 'r+', iline = 189, xline = 193)
jokva commented 4 years ago

Ok, so that means what breaks the geometry is your edits to the headers. This makes sense, since you have

trace_header[segyio.TraceField.INLINE_3D] = int(phase_df.resolution) # d2

in the block. The from_array functions make inline sorted files, and that means that only the crossline number should change (for 2D files).

edit: not that it should matter for su-segyread.

jokva commented 4 years ago

Now that you can open the file, check that the first header's tracf is properly set.

print(segyio.open('segy').header[0][segyio.su.tracf])

malapradej commented 4 years ago
print(segyio.open('rock_drop_data.sgy').header[0][segyio.su.tracf])

returns

0
malapradej commented 4 years ago

Is there a way to write the traces to a file, including the headers, in the same loop without having to use from_array2D? The only reason I used it is that I couldn't find another way.

Assuming this is the problem.

jokva commented 4 years ago

Sure, it should be pretty straightforward:

for i, (trace_distance, trace) in enumerate(zip(phase_df.columns, phase_df.values)):
    f.header[i].update(offset = 1, sec = phase_df.index[0].second, ...)
    f.trace[i] = trace

Note that the update method takes su keys only.

https://segyio.readthedocs.io/en/latest/segyio.html#segyio.field.Field.update

Of course, now you must take care when creating the file to set all header words that are required for your file to make sense.

malapradej commented 4 years ago

That was the other reason I used from_array2D: it seemed to take care of all the default header stuff. Is this not possible in the looping method?

jokva commented 4 years ago

No, when you create the file it is completely empty (as defined by your operating system), since you will have to make a conscious decision on what to write anyway. from_array is really just for when you want a quick dump of your data; it is not appropriate for building solid, well-defined files. Its interface is small and easy to use for one-off tasks, but there are too many decisions you need to make when writing files properly, and that would make from_array complicated. There is no good concept of default headers in SEG-Y.

Now, the fact that the file seemingly works well without your block leads me to believe that what you add to the headers is somewhat off. It could be that the file written by segyread is richer than segyio's su support, and to debug that I would need to run your full experiment with the same data. The fact that the first tracf is 0 leads me to believe that it was always in the file, and the reason you didn't see it with your suedit program is that suedit just doesn't print words with the value zero.

malapradej commented 4 years ago

I assume I will have to create a spec before I create the file, which I then add as the second argument to the create method. I tried with a default spec as in

with segyio.create(segy_fname, segyio.spec()) as segy_file:

and it complained:

AttributeError: 'spec' object has no attribute 'tracecount'
jokva commented 4 years ago

Yes, and you need to set the appropriate members on the spec object for your file.

https://segyio.readthedocs.io/en/latest/segyio.html#segyio.create

With 2D, you set a single-element array as xlines, and populate the ilines.

NB: You still need to write the in/xline attribute in the trace headers!
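A rough, untested sketch of how that could look for this 2D case (assuming the same phase_df and sampling_interval as above; which spec fields and header words you populate beyond these is still your decision):

import numpy as np
import segyio

spec = segyio.spec()
spec.sorting = 2                                     # inline sorted
spec.format  = 1                                     # 4-byte IBM float
spec.samples = list(range(len(phase_df)))            # one entry per sample
spec.ilines  = list(np.round(phase_df.columns.values).astype(int))
spec.xlines  = [0]

with segyio.create(segy_fname, spec) as f:
    for i, (trace_distance, trace) in enumerate(zip(phase_df.columns, phase_df.values.T)):
        # remember: the in/xline words must also be written to the trace headers
        f.header[i].update(
            iline  = int(np.round(trace_distance)),
            xline  = 0,
            ns     = len(phase_df),
            dt     = int(sampling_interval),
            offset = int(trace_distance),
        )
        f.trace[i] = np.ascontiguousarray(trace, dtype=np.float32)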

malapradej commented 4 years ago

I've added the following to the spec:

spec.ilines  = list(np.round(phase_df.columns.values).astype(int))
spec.xlines  = [0]

and within the loop, in the update method of the trace header, I pass the arguments iline = np.round(trace_distance).astype(int), xline = 0

so in effect I am using the distances of my sensors/traces as the inline array, and the xline is [0].

If I try to open the su file again I get:

RuntimeError: trace count inconsistent with file size, trace lengths possibly of non-uniform
jokva commented 4 years ago

Please post your full program.

Does the seg-y still open? Do you properly set the samples field in the trace headers?

malapradej commented 4 years ago

Interesting. I reduced the size of the phase_df file to under 149 MB and it seems to work. Previously it was over 1.8GB.

malapradej commented 4 years ago

The aim of the exercise is to compare our own fk filtering with that done by seis-unix. This part of the code is used to do an fk filtering on the su file:

os.system('suspecfk dt={} dx={} <{}.su >{}.fk'.format(sampling_interval/1000000,
                                                  phase_df.columns[1] - phase_df.columns[0],
                                                  os.path.splitext(segy_fname)[0],
                                                  os.path.splitext(segy_fname)[0]))

I know it works as I can view the file using suximage <file_name.fk. However when I try to import it into segyio using

segyio.su.open(os.path.splitext(segy_fname)[0]+'.fk', 'r', endian = 'big')

I get

RuntimeError: unable to find sorting.
jokva commented 4 years ago

What system are you running on? Operating system, 32 or 64 bit? If file size affects the outcome, that's a common culprit.

Now, you not being able to find sorting might just mean that what your suspecfk program writes is not as tidy as what segyio expects. Try opening it with ignore_geometry = True, then manually print the in/crossline header words.

malapradej commented 4 years ago

I am running the code in a jupyter notebook on a 64 bit server running Centos 7.

It works once I add ignore_geometry = True. I'm not sure how to print the inline/crossline header words. This is what I get when I print the inline and crossline values. Is this what you mean?

INLINE_3D: 961510520, CROSSLINE_3D: -1110927146
malapradej commented 4 years ago
sugethw key=d1,d2,f1,f2 <rock_drop_data.fk | head -n 1

outputs:

d1=0.049950     d2=0.000198     f1=0.000000     f2=-0.097948

I need to get these values from the header but can't see how this is done in segyio. I printed the header using su_file.header[0] which is shown here but it doesn't seem to contain any of these values:

{TRACE_SEQUENCE_LINE: 1, TRACE_SEQUENCE_FILE: 0, FieldRecord: 0, TraceNumber: 0, EnergySourcePoint: 0, CDP: 0, CDP_TRACE: 0, TraceIdentificationCode: 122, NSummedTraces: 0, NStackedTraces: 0, DataUse: 0, offset: 0, ReceiverGroupElevation: 0, SourceSurfaceElevation: 0, SourceDepth: 0, ReceiverDatumElevation: 0, SourceDatumElevation: 0, SourceWaterDepth: 0, GroupWaterDepth: 0, ElevationScalar: 0, SourceGroupScalar: 0, SourceX: 0, SourceY: 0, GroupX: 0, GroupY: 0, CoordinateUnits: 0, WeatheringVelocity: 0, SubWeatheringVelocity: 0, SourceUpholeTime: 0, GroupUpholeTime: 0, SourceStaticCorrection: 0, GroupStaticCorrection: 0, TotalStaticApplied: 0, LagTimeA: 0, LagTimeB: 0, DelayRecordingTime: 0, MuteTimeStart: 0, MuteTimeEND: 0, TRACE_SAMPLE_COUNT: 10011, TRACE_SAMPLE_INTERVAL: 0, GainType: 0, InstrumentGainConstant: 0, InstrumentInitialGain: 0, Correlated: 0, SweepFrequencyStart: 0, SweepFrequencyEnd: 0, SweepLength: 0, SweepType: 0, SweepTraceTaperLengthStart: 0, SweepTraceTaperLengthEnd: 0, TaperType: 0, AliasFilterFrequency: 0, AliasFilterSlope: 0, NotchFilterFrequency: 0, NotchFilterSlope: 0, LowCutFrequency: 0, HighCutFrequency: 0, LowCutSlope: 0, HighCutSlope: 0, YearDataRecorded: 0, DayOfYear: 0, HourOfDay: 0, MinuteOfHour: 0, SecondOfMinute: 0, TimeBaseCode: 0, TraceWeightingFactor: 0, GeophoneGroupNumberRoll1: 0, GeophoneGroupNumberFirstTraceOrigField: 0, GeophoneGroupNumberLastTraceOrigField: 0, GapSize: 0, OverTravel: 0, CDP_X: 1028429932, CDP_Y: 0, INLINE_3D: 961510520, CROSSLINE_3D: -1110927146, ShotPoint: 0, ShotPointScalar: 0, TraceValueMeasurementUnit: 0, TransductionConstantMantissa: 0, TransductionConstantPower: 0, TransductionUnit: 0, TraceIdentifier: 0, ScalarTraceHeader: 0, SourceType: 0, SourceEnergyDirectionMantissa: 0, SourceEnergyDirectionExponent: 0, SourceMeasurementMantissa: 0, SourceMeasurementExponent: 0, SourceMeasurementUnit: 0}

print(su_file)

shows:

SegyFile rock_drop_data.fk:
  inlines: None
  crosslines: None
  traces: 990
  samples: [0. 0. 0. ... 0. 0. 0.]
  float representation: 4-byte IEEE float
jokva commented 4 years ago

> I am running the code in a jupyter notebook on a 64 bit server running Centos 7.

Ok, so it's maybe not overflow.

You can get specific header words by doing

print(file.header[n][segyio.su.iline, segyio.su.xline])

This is using the SU words, you can use the tracefield names if you'd like too.

An easy way to get a single word file-wide:

ils = file.attributes(segyio.su.iline)[:]

ils is now a numpy array that you can print and manipulate as you see fit.
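For example (a quick sketch, assuming the .fk file produced by the suspecfk step earlier; fk_path is just that path):

import os
import numpy as np
import segyio

fk_path = os.path.splitext(segy_fname)[0] + '.fk'

with segyio.su.open(fk_path, 'r', endian='big', ignore_geometry=True) as f:
    ils = f.attributes(segyio.su.iline)[:]
    xls = f.attributes(segyio.su.xline)[:]
    # check what the in/crossline words actually contain before trusting any geometry
    print(np.unique(ils), np.unique(xls))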

If the file fails to open when it's larger than a certain size, that's possibly because your program writes the file in a way segyio does not expect. It's quite hard for me to determine exactly what it is without inspecting the file myself, but you can consult the documentation of your su distribution to see if it is honest about doing something weird.

malapradej commented 4 years ago

Thanks for your help so far. Much appreciated.

the inline and crossline values are as follows for all traces:

{INLINE_3D: 961510520, CROSSLINE_3D: -1110927146}

I'm after the d1, d2, f1, f2 values. Not sure how to get those. According to what I read these could be cdpx, cdpy, iline, and xline, but they don't match the values that sugethw shows, which are:

$ sugethw key=d1,d2,f1,f2 <rock_drop_data.fk | head -n 1
d1=0.049950     d2=0.000198     f1=0.000000     f2=-0.097948    

These are the values for cdpx and cdpy:

{CDP_X: 1028429932, CDP_Y: 0}
jokva commented 4 years ago

From the manual:

float d1; /* sample spacing for non-seismic data */
float f1; /* first sample location for non-seismic data */
float d2; /* sample spacing between traces */
float f2; /* first trace location */

http://web.mit.edu/cwpsu_v44r1/sumanual_600dpi_letter.pdf See appendix B. These are floats in SU, but there are no floats in SEG-Y (at least not until rev2). It's quite likely they occupy what would later become iline/xline, as su predates SEG-Y rev1.

This program confirms my suspicion:

[~] λ cat malapradej.c
#include <stdio.h>
#include <string.h>

int main() {
    int iline3d = 961510520;
    float d2;
    memcpy(&d2, &iline3d, sizeof(d2));
    printf("%f\n", d2);
}
[~] λ make malapradej && ./malapradej
cc     malapradej.c   -o malapradej
0.000198

What that means is that you're reading the correct values, but segyio interprets them as integers. This mismatch is a result of SU and SEG-Y being subtly different here. You can recover your data by interpreting the header words as floats when appropriate. For that you can use the struct module [1], or numpy [2]

[1] https://stackoverflow.com/questions/34402334/how-to-interpret-a-integer-as-a-float [2] https://stackoverflow.com/questions/37093485/how-to-interpret-4-bytes-as-a-32-bit-float-using-python/37093610
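For example, a small sketch of the numpy route, using the INLINE_3D value from the header dump above:

import numpy as np

iline_word = 961510520                                 # INLINE_3D as segyio reports it
d2 = np.array([iline_word], dtype=np.int32).view(np.float32)[0]
print(d2)                                              # ~0.000198, matching sugethw's d2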

malapradej commented 4 years ago

Thank you for that. I used the following to retrieve them using seis-unix. I will try your method as it seems better.

os.system('sugethw key=d1,d2,f1,f2 <{}.fk | head -n 1 > temp.txt'.format(os.path.splitext(segy_fname)[0]))

with open('temp.txt', 'r') as temp_file:
    line = temp_file.readline()

temp_dict = {}

for l in line.split():
    a, b = l.split('=') 
    temp_dict[a] = float(b)

print(temp_dict)
jokva commented 4 years ago

Splendid, happy it worked out.

malapradej commented 4 years ago

Just to confirm your method works.

import struct

def float_from_integer(integer):
    return struct.unpack('!f', struct.pack('!i', integer))[0]

temp_dict = {}

temp_dict['d2'] = float_from_integer(su_file.header[0][segyio.su.iline])
temp_dict['d1'] = float_from_integer(su_file.header[0][segyio.su.xline])
temp_dict['f2'] = float_from_integer(su_file.header[0][segyio.su.cdpx])
temp_dict['f1'] = float_from_integer(su_file.header[0][segyio.su.cdpy])

with result:

{'d2': 0.00019787426572293043,
 'd1': -0.09794776141643524,
 'f2': 0.09990009665489197,
 'f1': 0.0}