equinor / segyio

Fast Python library for SEGY files.

Generators behaving badly? #507

Closed da-wad closed 2 years ago

da-wad commented 2 years ago

I've just found a surprise in the way the iline generator is functioning.

Given a slice object with only stop specified I would expect this small program to print True twice. But it doesn't...

import numpy as np
import segyio

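def test_iline_generator(f, i):
    # Look up inlines 1..i-1 one at a time
    list_comp = np.asarray([f.iline[n] for n in range(1, i)])
    # Read the same inlines through a slice, which yields from a generator
    slicing = np.asarray(list(f.iline[:i:]))
    print(np.array_equal(list_comp, slicing))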

f=segyio.open('test-data\\small.sgy')

test_iline_generator(f, 3) #True
test_iline_generator(f, 4) #False?!!

jokva commented 2 years ago

No, it's working as expected - slicing returns a generator, and you're not copying each result before adding it to your list.

>>> def test_iline_generator(f, i):
...     list_comp = np.asarray([f.iline[i] for i in range(1,i)])
...     slicing = np.asarray([a.copy() for a in f.iline[:i:]])
...     print(np.array_equal(list_comp, slicing))
... 
>>> test_iline_generator(f, 3)
True
>>> test_iline_generator(f, 4)

da-wad commented 2 years ago

Riiight, specifically a generator which yields mutable values. And the list() constructor doesn't do any copying.

But... the first call to my buggy test_iline_generator() returns True. How so? Because you're cycling two buffers in the generator.

Got it. Close?

jokva commented 2 years ago

Yea, segyio knows that the generated value should update every step, so as an optimisation it reuses a pair of underlying objects to allocate only twice for loops of any size. This is a huge performance gain in many common scenarios.
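The two-buffer behaviour described above can be reproduced without segyio. The following is only a minimal sketch: `cycling_lines` is a hypothetical stand-in written with plain Python lists, not segyio's actual implementation, and it assumes a non-empty input. It shows why collecting two items looks correct while collecting three does not.

```python
def cycling_lines(data):
    """Yield each row of `data`, alternating between two reusable buffers.

    Hypothetical stand-in for a generator that recycles its output objects.
    """
    width = len(data[0])
    buffers = [[0] * width, [0] * width]  # allocated once, reused throughout
    for n, row in enumerate(data):
        buf = buffers[n % 2]
        buf[:] = row  # overwrite in place instead of allocating a new list
        yield buf


data = [[1, 1], [2, 2], [3, 3]]

# Two items: each lands in its own buffer, so nothing is overwritten.
two = list(cycling_lines(data[:2]))

# Three items: the third reuses the first buffer and clobbers item one.
three = list(cycling_lines(data))

# Copying each item as it is produced keeps every value intact.
copied = [list(line) for line in cycling_lines(data)]
```

Here `three[0]` and `three[2]` are the same list object, which is the same aliasing that made the second call in the original report print False.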

For what it's worth, this is how any generator that reuses a mutable object behaves, not just segyio's. segyio provides segyio.tools.collect() as a pre-written asarray(list(generator...)) that copies each item, since this use is so common.
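To illustrate that this is general Python behaviour, here is a small sketch with no segyio involved. Both `running_window` and `collect_copies` are hypothetical names made up for this example; `collect_copies` only mirrors the idea of copying each item while collecting and is not how segyio implements its own helper.

```python
def running_window(values, size):
    """Yield a sliding window over `values`, mutating one list in place."""
    window = []
    for v in values:
        window.append(v)
        if len(window) > size:
            window.pop(0)
        yield window  # the same list object is yielded every step


def collect_copies(gen):
    """Hypothetical helper: snapshot each yielded item before advancing."""
    return [list(item) for item in gen]


# list() stores three references to one list, so only the final state survives.
aliased = list(running_window([1, 2, 3], 2))

# Copying per step preserves what the generator produced at each point.
snapshots = collect_copies(running_window([1, 2, 3], 2))
```

The design trade-off is the same as in the thread: yielding a reused object avoids an allocation per step, and callers who need to keep the values take a copy explicitly.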