ehpor / hcipy

A framework for performing optical propagation simulations, meant for high contrast imaging, in Python.
https://hcipy.org
MIT License
93 stars 30 forks source link

Add grid indexing support when indexing or slicing a Field #219

Open NuggetOfficial opened 11 months ago

NuggetOfficial commented 11 months ago

Hey! I am running some optimisation code and I am currently considering batching a field and sending it to different cores. I am running into the issue that an indexed/sliced field does not return the grid corresponding to the index/slice. Consider the following:

from hcipy import make_pupil_grid
import numpy as np

# make random Field object
grid  = make_pupil_grid(512)
arr   = np.random.randn(grid.size) 
field = Field(arr, grid)

# slice field
sliced_field = field[200:400]
print(sliced_field.size, sliced_field.grid.size)

The output I expected intuitively is a new Field object with a .size of 200 and with a .grid.size 200; whenever we pick a number from a field the associated coordinate is returned along with it.

However the current output is a Field object with a .size of 200 and with a .grid.size 262144. In other words, the grid itself is not indexed but left alone. This is an afterproduct of creation definition of a Field:

def __new__(cls, arr, grid):
    obj = np.asarray(arr).view(cls)
    obj.grid = grid
    return obj

which causes any indexing of a field to only consider the input array.

I propose extending the __getitem__ dunder of the Field class so that it calls the __getitem__ of both the contained array, in this case self, and the associated grid, self.grid.

I started working on the alteration but the thing I am struggeling with is how to turn the indexed grid back into the proper <any>Grid object that contains the correct coordinates. Currenlty calling the <any>Grid.__getitem__ method returns an ndarray with the corresponding coordinates. Any ideas on how to tackle this?

This would make slicing a field in 4 so I can send the different slices to different cores a lot easier.

ehpor commented 11 months ago

Hey!

Yes, these are part of a few things that are not intuitive with the current implementation. When first implemented, the way Numpy array subclassing worked, we couldn't implement this type of dunder method at all. That changed a few Numpy releases later, at which point I did not have time to convert.

Now that non-subclassing numpy arrays works with recent Numpy releases (old enough that everyone should have switched over already), I wanted to reimplement Fields as a standalone class, which has many other advantages, including being able to implement more specialized Grid conversions as you are doing here. That implementation is PR #187 which I would love feedback / testing support. Since you are clearly interested in doing some low-level HCIPy work, could we schedule a call to talk through things? I'll send you an email.

As an aside, the method that you're looking for is Grid.subset(). Also, I'm interested in why you need to batch Fields in the first place. Not saying it's a bad thing, just that most of the time, the built-in methods are parallelized enough that batching only gains you a factor of 2-3x. Plus, multiprocessing is usually done best at the outer loop, which is often an iteration over wavelengths.

NuggetOfficial commented 11 months ago

Ill also write a response here for consistency's sake. Would love to work on it! As for why im batching, im running a monochromatic pixel by pixel optimisation algorithm for a VVC. Retrieving the bulk model is already very fast, order of seconds, but pixel by pixel fit is currently quite slow, order of 100 pixels/s (I have a sneaking suspicion that pythons forloop overhead is one of the main culprits). This makes for an average computation time of 45 min per image or so, dividing that time by 3 would be fantastic.