coarse-graining / cgnet

learning coarse-grained force fields
BSD 3-Clause "New" or "Revised" License

make get_neighbors agnostic to torch vs. numpy #93

Closed brookehus closed 5 years ago

brookehus commented 5 years ago


This is in response to my comment in #92, w.r.t. @nec4's feedback to @Dom1L that geometry.get_neighbors should have an option to run entirely in PyTorch.

W.r.t. @Dom1L's original code, I don't think we want checks like this inside geometry methods:

if isinstance(distances, torch.Tensor):
    ...

or

if self.method == 'torch':
    return torch.from_numpy(neighbors), torch.from_numpy(neighbor_mask)
else:
    return neighbors, neighbor_mask

I think we want to avoid this kind of type-branching inside the methods and instead have each method always use the single backend it's configured for. I'll be stricter on reviews in the future (I didn't think of this at all at the time!).
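
For illustration, here's a minimal sketch of the pattern I mean (a hypothetical Geometry class and method name, not the code as merged): resolve the backend once in __init__ and have every method use self.lib, so nothing downstream ever branches on input type.

import numpy as np
import torch

class Geometry:
    def __init__(self, method='numpy'):
        # Resolve the backend exactly once; methods never check types again
        if method not in ('numpy', 'torch'):
            raise ValueError("method must be 'numpy' or 'torch'")
        self.method = method
        self.lib = np if method == 'numpy' else torch

    def bead_indices(self, n_beads):
        # Hypothetical example: builds an index array in whichever
        # backend was selected at construction
        return self.lib.arange(n_beads)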

I've done this! A few notes:

@coarse-graining/developers let me know what you all think!

brookehus commented 5 years ago

Another question for @Dom1L:

You originally had this code:

        if cutoff is not None:
            if isinstance(distances, torch.Tensor):
                distances = distances.numpy()
            # Create an index mask for neighbors that are inside the cutoff
            neighbor_mask = (distances < cutoff).astype(np.float32)
            # Set the indices of beads outside the cutoff to 0
            neighbors[~neighbor_mask.astype(np.bool)] = 0

Here we create the mask as a bool, immediately cast it to a float, cast it back to a bool to do the slicing, and finally return it as a float.

In my version I kept it as a bool until the end of the logic, converting it to a float only on return.

Just curious: why do we want to return the mask as a float? Is it because torch.BoolTensors weren't available pre-1.2, or is there another reason?
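
For reference, what I did looks roughly like this (a sketch with a made-up helper name, not the exact diff):

import numpy as np

def mask_neighbors(neighbors, distances, cutoff):
    # Keep the mask boolean for the whole computation
    neighbor_mask = distances < cutoff
    neighbors = neighbors.copy()
    # Boolean indexing directly; no float -> bool round trip
    neighbors[~neighbor_mask] = 0
    # Convert to float32 only on return
    return neighbors, neighbor_mask.astype(np.float32)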

brookehus commented 5 years ago

Ok, after talking with @nec4 I ALSO ADDED a method to Geometry, check_array_vs_tensor, which methods within Geometry can call before they do anything, to make sure the data is of the right input type. Right now each relevant method only has one relevant array/tensor, so it doesn't iterate through a list, but that's easy to change.
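
Continuing the sketch style from the earlier comment, the check looks roughly like this (the exact signature in the PR may differ):

import numpy as np
import torch

class Geometry:
    def __init__(self, method='numpy'):
        self.method = method

    def check_array_vs_tensor(self, data, name='data'):
        # Fail fast if the input doesn't match the backend selected at construction
        if self.method == 'numpy' and not isinstance(data, np.ndarray):
            raise ValueError('{} must be a numpy.ndarray when method is numpy'.format(name))
        if self.method == 'torch' and not isinstance(data, torch.Tensor):
            raise ValueError('{} must be a torch.Tensor when method is torch'.format(name))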

nec4 commented 5 years ago

Will take a look!

Dom1L commented 5 years ago

Yes, it makes much more sense to do the numpy/torch switch like this. I don't know why I wasn't thinking of this while implementing, sorry about that.

Regarding the float32 part: yes, it's because torch.BoolTensors weren't available pre-1.2.
When the masking is done in the convolutional part, https://github.com/coarse-graining/cgnet/blob/234d989a23991f6abe539db4a76ce4e5fbdc6d27/cgnet/feature/feature.py#L278, the mask needs to be float32. Now that torch.BoolTensors are available, the masking can be done with indexing, and the whole float32 conversion is no longer necessary.
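
To make the two styles concrete (made-up shapes, just for illustration):

import torch

features = torch.randn(4, 10, 8)             # (batch, beads, feature dim)
neighbor_mask = torch.rand(4, 10) < 0.5      # bool mask of beads inside the cutoff

# Pre-1.2 style: carry the mask as float32 and zero entries by multiplication
masked_mul = features * neighbor_mask.float().unsqueeze(-1)

# With torch.BoolTensor (>= 1.2): the same effect via boolean indexing
masked_idx = features.clone()
masked_idx[~neighbor_mask] = 0.0

assert torch.allclose(masked_mul, masked_idx)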

brookehus commented 5 years ago

@Dom1L ok - so should we just merge this as-is and then switch everything over to BoolTensors later on?

Dom1L commented 5 years ago

Yes, we can leave it as it is for now and switch later; maybe leave a TODO so we don't forget. I'll leave the merging to @nec4.

nec4 commented 5 years ago

Very thorough - LGTM!