AdvancedPhotonSource / tike

Repository for ptychography software
http://tike.readthedocs.io

ENH: Fit solvers into GPU memory #46

Closed carterbox closed 4 years ago

carterbox commented 4 years ago

Purpose

The goals of this pull request have shifted somewhat. Originally, it was only to refactor the tike and libtike.cupy operators so that a "typical" reconstruction can fit into 11 GiB of GPU memory. However, this pull now also moves the libtike.cupy operators into the tike.operator module. Since there is no compilation step, a separately distributed module is not necessary. Additionally, installation was confusing because others were not aware that they needed to install libtike to use GPUs.

Estimated minimum memory requirements for various problem sizes are listed below. The goal is to get GPU memory footprints as close to these estimates as possible.

# Siemens star
2.477 GiB for 6.73e+03 scans, 6.73e+03 frames, 128 detector, 128 probe, 500 object
2.479 GiB for multi-mode with 8 modes.
7.407 GiB for OPR.
2.505 GiB for multi-energy 8 energies.

# Others
0.7531 GiB for 1.85e+03 scans, 1.85e+03 frames, 128 detector, 128 probe, 1.3e+03 object
0.7548 GiB for multi-mode with 8 modes.
2.106 GiB for OPR.
0.9311 GiB for multi-energy 8 energies.

1.304 GiB for 193 scans, 193 frames, 512 detector, 512 probe, 1.9e+03 object
1.331 GiB for multi-mode with 8 modes.
3.554 GiB for OPR.
1.708 GiB for multi-energy 8 energies.

13.68 GiB for 9.28e+03 scans, 9.28e+03 frames, 256 detector, 256 probe, 1.3e+03 object
13.69 GiB for multi-mode with 8 modes.
40.88 GiB for OPR.
13.86 GiB for multi-energy 8 energies.

# not a mistake; it's 22 million scans
3.223e+04 GiB for 2.2e+07 scans, 2.2e+07 frames, 256 detector, 256 probe, 4.1e+03 object
3.223e+04 GiB for multi-mode with 8 modes.
9.668e+04 GiB for OPR.
3.223e+04 GiB for multi-energy 8 energies.
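The pattern in these estimates can be roughly reproduced by counting three complex64 arrays of shape frames x detector x detector (diffraction data, nearplane, and farplane) plus the object and probe. The function below is my own illustrative reconstruction of that accounting, not tike's actual estimator:

```python
def estimate_gib(frames, detector, obj, probe):
    """Rough minimum-memory estimate in GiB for a ptychography problem.

    Assumes three complex64 arrays of shape (frames, detector, detector)
    plus the object and probe arrays. Illustrative only; not tike's
    real accounting, which includes additional work buffers.
    """
    c64 = 8  # bytes per complex64 element
    waves = 3 * frames * detector * detector * c64
    psi = obj * obj * c64
    prb = probe * probe * c64
    return (waves + psi + prb) / 2**30

# The 256-detector, 9.28e+03-frame case lands near the 13.68 GiB listed above.
print(round(estimate_gib(9280, 256, 1300, 256), 2))
```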

Approach

Pre-Merge Checklists

Submitter

Reviewer

Depends on: nikitinvv/libtike-cufft#24

carterbox commented 4 years ago

I observe 6.71 GiB being used for the Siemens star data set with one mode. That's 3x what is expected.

carterbox commented 4 years ago

The flow of data for the minimal memory footprint looks something like this:

There must be 2-3 gradients tracked for each component that is being updated because conjugate gradient search directions depend on the previous search direction. However, the size of these gradients depends on a few factors:

  1. Are all incoherent modes / energies being updated simultaneously or sequentially?
  2. Are we using orthogonal probe relaxation or not? This affects the size of the probe array.
  3. Are we using a line search? This requires keeping an additional copy of the object and probe to try different updates.
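To illustrate why the previous gradient and search direction must stay resident, here is a minimal nonlinear conjugate gradient loop (Polak-Ribiere) for a generic cost function. This is a generic sketch, not tike's conjugate_gradient implementation:

```python
import numpy as np

def conjugate_gradient(grad, x, num_iter=16, step=0.1):
    """Minimal nonlinear CG sketch (Polak-Ribiere variant).

    Two gradient copies (grad0, grad1) and one search direction (dir_)
    must stay in memory simultaneously, which is why 2-3 gradient-sized
    arrays per updated component appear in the memory budget.
    """
    grad0, dir_ = None, None
    for _ in range(num_iter):
        grad1 = grad(x)
        if grad0 is None:
            dir_ = -grad1
        else:
            # The Polak-Ribiere coefficient reuses the previous gradient.
            beta = max(0, np.vdot(grad1, grad1 - grad0).real
                          / np.vdot(grad0, grad0).real)
            dir_ = -grad1 + beta * dir_
        grad0 = grad1
        x = x + step * dir_  # a line search here would need extra copies of x
    return x

# Minimize f(x) = |x - 1|^2, whose gradient is 2 * (x - 1).
x = conjugate_gradient(lambda x: 2 * (x - 1), np.zeros(4), num_iter=50)
```

With a fixed step instead of a line search, no extra copies of the object and probe are needed, at the cost of slower convergence (point 3 above).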
pep8speaks commented 4 years ago

Hello @carterbox! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2020-05-04 16:33:57 UTC
nikitinvv commented 4 years ago

@carterbox is there a dependence on mpi4py?

nikitinvv commented 4 years ago

test_ptycho fails

(tike) bash-4.2$ python test_ptycho.py 
/home/beams/VNIKITIN/anaconda3/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
  return f(*args, **kwds)
EEEE.
======================================================================
ERROR: test_consistent_admm (__main__.TestPtychoRecon)
Check ptycho.solver.admm for consistency.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_ptycho.py", line 201, in test_consistent_admm
    self.template_consistent_algorithm('admm')
  File "test_ptycho.py", line 175, in template_consistent_algorithm
    recover_psi=True,
  File "/home/beams/VNIKITIN/anaconda3/lib/python3.7/site-packages/tike/ptycho/ptycho.py", line 185, in reconstruct
    **kwargs,
  File "/home/beams/VNIKITIN/anaconda3/lib/python3.7/site-packages/tike/ptycho/solvers/admm.py", line 47, in admm
    num_iter=cg_iter)  # yapf: disable
  File "/home/beams/VNIKITIN/anaconda3/lib/python3.7/site-packages/tike/ptycho/solvers/admm.py", line 99, in update_phase
    num_iter=num_iter,
  File "/home/beams/VNIKITIN/anaconda3/lib/python3.7/site-packages/tike/opt.py", line 103, in conjugate_gradient
    grad1 = grad(x)
  File "/home/beams/VNIKITIN/anaconda3/lib/python3.7/site-packages/tike/ptycho/solvers/admm.py", line 91, in grad
    return (op.propagation.grad(data, farplane) - ρ *
TypeError: _gaussian_grad() missing 1 required positional argument: 'intensity'
...
nikitinvv commented 4 years ago

Checking for out-of-field positions is not needed in _patch_iterator. Please add extra psi padding at the beginning instead of this check:

    if (ind[0] < 0 or ind[1] < 0
            or psi_shape[-2] <= ind[0] + probe_shape
            or psi_shape[-1] <= ind[1] + probe_shape):
        # skip scans where the probe position overlaps edges
        # print(ind, rem)
        continue
    assert rem[0] >= 0 and rem[1] >= 0
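The padding approach suggested above could look like the following sketch. The names pad_object and extract_patch and the array shapes are illustrative, not tike's actual API:

```python
import numpy as np

def pad_object(psi, probe_shape):
    """Pad psi so every probe position lies fully inside the array.

    With a border of probe_shape cells on each side, the bounds check in
    _patch_iterator becomes unnecessary: any position that previously
    overlapped an edge now lands on the padded border.
    """
    return np.pad(psi, probe_shape, mode='edge')

def extract_patch(padded, ind, probe_shape):
    # Shift indices by the padding offset; no out-of-field check needed.
    row, col = ind[0] + probe_shape, ind[1] + probe_shape
    return padded[row:row + probe_shape, col:col + probe_shape]

psi = np.arange(25, dtype=np.complex64).reshape(5, 5)
padded = pad_object(psi, probe_shape=2)
patch = extract_patch(padded, ind=(-1, -1), probe_shape=2)  # overlaps the edge
```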
nikitinvv commented 4 years ago

Does @njit(parallel=True) work with cupy? Would switching to dask address this problem?

nikitinvv commented 4 years ago

As I understand it, simply setting xp = cupy should port the code to GPU. Why do some functions still have the np. prefix (e.g. fwd and adj in propagation.py and convolution.py)?

carterbox commented 4 years ago

These errors are because you are not using libtike-cupy and setting the environment variable TIKE_PTYCHO_BACKEND=cupy. Setting xp = cupy does not port the code because there is no numpy.asnumpy() function.
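The asymmetry described here shows up in a minimal dispatch sketch: cupy provides cupy.asnumpy() for device-to-host transfer, but numpy has no such function, so a plain xp = cupy swap breaks any code that must hand arrays back to the host. A common workaround (illustrative only, not tike's actual mechanism) is:

```python
import numpy as np

try:
    import cupy as xp  # GPU backend when available

    def asnumpy(a):
        return xp.asnumpy(a)  # device -> host copy
except ImportError:
    xp = np  # CPU fallback

    def asnumpy(a):
        return a  # already a host array; nothing to transfer

# Code written against xp plus asnumpy runs on either backend.
result = asnumpy(xp.fft.fft2(xp.ones((4, 4), dtype=xp.complex64)))
```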

carterbox commented 4 years ago

@nikitinvv, these commits should unbreak the NumPy operators, but you still need to install libtike-cupy to use the CuPy operators.

carterbox commented 4 years ago

CI tests are failing because CuPy cannot be installed or imported on the CI.

nikitinvv commented 4 years ago

Now the flag is not TIKE_PTYCHO_BACKEND=cupy, it is TIKE_BACKEND=cupy

carterbox commented 4 years ago

That's correct.
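Backend selection via the environment variable could be sketched like this; the exact import logic is an illustration, not a copy of tike's:

```python
import os

def select_backend():
    """Choose the array backend from the TIKE_BACKEND variable.

    Mirrors the behavior discussed above: 'cupy' requests the GPU
    operators (which require libtike-cupy and cupy to be installed);
    anything else falls back to numpy. Illustrative sketch only.
    """
    name = os.environ.get('TIKE_BACKEND', 'numpy').lower()
    if name == 'cupy':
        import cupy as xp
    else:
        import numpy as xp
    return xp

os.environ['TIKE_BACKEND'] = 'numpy'
xp = select_backend()
```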

nikitinvv commented 4 years ago

Would it be possible to have a test that reconstructs some relatively big data on GPU with recover_psi=True, recover_prb=True, recover_pos=True? For combined, divided, and admm? test_ptycho.py does not demonstrate functionality and users will not understand how to run the code.

nikitinvv commented 4 years ago

The Jupyter notebook numpy_cupy_compare is for the old version. It would be nice to update it.

nikitinvv commented 4 years ago

Are the files in src/ directory (constants.py, tike.py,..) used somewhere? If you would like to keep them in tike repository then it is probably better to put them in some archive folder.

carterbox commented 4 years ago

Would it be possible to have a test that reconstructs some relatively big data on GPU with recover_psi=True, recover_prb=True, recover_pos=True? For combined, divided, and admm? test_ptycho.py does not demonstrate functionality and users will not understand how to run the code.

Teaching users to run the code is not the purpose of tests. There are examples in /docs/source/examples

The Jupyter notebook numpy_cupy_compare is for the old version. It would be nice to update it.

:laughing: I forgot about that notebook! That's why it's so out of date. I have just been using the tests in tests/operators and checking that the numbers are the same. Do you think we still need numpy_cupy_compare?

Are the files in src/ directory (constants.py, tike.py,..) used somewhere? If you would like to keep them in tike repository then it is probably better to put them in some archive folder.

No. I will move stuff to /src/broken ?

nikitinvv commented 4 years ago

We still need an example for processing experimental data. Could you upload a jupyter notebook for reconstructing 1 catalyst data set? Yes, you can move stuff to src/broken

carterbox commented 4 years ago

The notebook would be broken unless you have the catalyst data set.

carterbox commented 4 years ago

@nikitinvv, thanks for all of the feedback, but many of your comments are not within the scope of this pull request. Once the objectives of the "Purpose" section are met, this branch should be approved so that it can be merged and everyone else can update their branches with these changes. Other enhancements should have their own pull request. The point of git is that you make many incremental changes. You cannot fix everything in one branch.

nikitinvv commented 4 years ago

I prefer to give all my comments about the code while I am reading it. Please move the comments that are not in the scope of this pull request to your next pull requests.

carterbox commented 4 years ago

I observe 6.71 GiB being used for the Siemens star data set with one mode

Updating to cupy==7.4 decreases the memory footprint to 5.89 GiB. I'm not sure there is much more memory optimization that can be done at the Python level.
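Footprint measurements like the 6.71 vs 5.89 GiB numbers above can be read from CuPy's default memory pool; the import guard below keeps this sketch runnable on a CPU-only machine:

```python
def gpu_pool_usage_gib():
    """Return (used, total) GiB from CuPy's default memory pool,
    or None when cupy is not installed (e.g. on a CPU-only CI)."""
    try:
        import cupy
    except ImportError:
        return None
    pool = cupy.get_default_memory_pool()
    return pool.used_bytes() / 2**30, pool.total_bytes() / 2**30

usage = gpu_pool_usage_gib()
```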