equinor / segyio

Fast Python library for SEGY files.

attribute parsing problem #432

Closed pythonmobile closed 4 years ago

pythonmobile commented 4 years ago

Why is line 1 below about 10x slower than line 2?

gx = [segy_file.attributes(iline)[i] for i in range(tc)]
gx = segy_file.attributes(iline)[:]

Is there a way to parse all the trace headers and extract multiple attributes instead of just one?

jokva commented 4 years ago

Because line 2 does most of its work in C, whereas line 1 goes back and forth between C and Python on every iteration. The attributes feature throws away the key, so it's really only intended for when you want a single value across the whole file.

The better way to extract multiple attributes would be [header[attr1, attr2, ...] for header in f.header], but it shouldn't perform too differently from your line 1.
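A minimal sketch of that header loop, for illustration only. The helper name collect_fields is hypothetical and not part of segyio; it only assumes that each header can be indexed by a field key, which is what the suggestion above relies on.

```python
def collect_fields(headers, fields):
    """Return {field: [value per trace]} for every requested field.

    `headers` is any iterable of mapping-like trace headers (for example
    f.header from an open segyio file); `fields` is a list of header keys.
    """
    columns = {field: [] for field in fields}
    for header in headers:
        # Read the values immediately, in case the iterator reuses its buffer.
        for field in fields:
            columns[field].append(header[field])
    return columns
```

With segyio this would presumably be called as collect_fields(f.header, [segyio.TraceField.INLINE_3D, segyio.TraceField.CROSSLINE_3D]) on a file opened with segyio.open. The loop still runs in Python, so expect performance closer to line 1 than to line 2.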

pythonmobile commented 4 years ago

@jokva Thanks! Can you point me to how I can modify the C code to support multiple fields? I would like [header[attr1, attr2, ...] for header in f.header] to run at the speed of line 2.

jokva commented 4 years ago

Sure.

The function that drives attributes() is field_foreach https://github.com/equinor/segyio/blob/master/python/segyio/segyio.cpp#L782

Now, it's unlikely that you'll see the same speedup when extracting multiple fields, because you then need either larger reads (which means header mode anyway) or multiple seeks per trace. Both will significantly slow down your program. There's a reason segyio doesn't offer this out of the box.

What are you trying to accomplish? If your performance requirements are that strict, you might want to switch to a faster language than Python.

pythonmobile commented 4 years ago

Thanks. I am trying to parse a large SEGY file (> 1TB) and was trying to avoid the slowdown. Is there a good SEGY parser in a faster language that you would recommend? I am going to try to write this in C.

jokva commented 4 years ago

Depends on your needs. There's already the segyio-catr application, written in C and shipped as part of segyio, that can print headers. It prints every header field, though, so you would need to pipe its output through grep or something similar.

If that doesn't work, you can use segyio from C or C++. It's more work, obviously, but you have a lot more control that way. Of course, that interface doesn't really support your use case either, but it is probably faster to read the full headers, and extract what you need from them.
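The idea of reading each full trace header and picking fields out of it can be sketched with nothing but the standard library. This is not segyio's C interface, only an illustration in Python. It assumes a big-endian file with fixed-length traces, no extended textual headers, and 4-byte integer fields at the given 1-based byte positions (189 and 193 are the standard inline and crossline positions). The function name scan_trace_headers is made up for this example.

```python
import struct

TEXT_HEADER = 3200    # textual file header, bytes
BINARY_HEADER = 400   # binary file header, bytes
TRACE_HEADER = 240    # one trace header, bytes
# Bytes per sample for common data sample format codes (assumed subset).
BYTES_PER_SAMPLE = {1: 4, 2: 4, 3: 2, 5: 4, 8: 1}


def scan_trace_headers(stream, fields):
    """Yield one tuple per trace holding the requested header fields.

    `stream` is a seekable binary file object; `fields` lists 1-based
    byte positions of 4-byte big-endian integers in the trace header.
    """
    stream.seek(TEXT_HEADER)
    binary = stream.read(BINARY_HEADER)
    samples, = struct.unpack_from(">H", binary, 20)  # file bytes 3221-3222
    fmt, = struct.unpack_from(">h", binary, 24)      # file bytes 3225-3226
    trace_bytes = samples * BYTES_PER_SAMPLE[fmt]

    while True:
        header = stream.read(TRACE_HEADER)  # one read per trace
        if len(header) < TRACE_HEADER:
            break                           # end of file
        yield tuple(struct.unpack_from(">i", header, pos - 1)[0]
                    for pos in fields)
        stream.seek(trace_bytes, 1)         # skip the samples, never read them
```

Each trace costs one 240-byte read and one relative seek, regardless of how many fields are extracted, which is the trade-off described above: a larger read per trace in exchange for avoiding one seek per field.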

pythonmobile commented 4 years ago

Thanks @jokva ! Your comments were very helpful. I will look into those soon.