CannyLab / tsne-cuda

GPU Accelerated t-SNE for CUDA with Python bindings
BSD 3-Clause "New" or "Revised" License

Fail to install tsne-cuda in CUDA11.0 #95

Closed yushengsu-thu closed 3 years ago

yushengsu-thu commented 3 years ago

Do you provide a version for CUDA11.0?

monchin commented 3 years ago

I'm not a developer for this repository and haven't tried installing it with CUDA 11, so this is only a guess: if you install via build.sh, you could try adding CUDA 11 support there; if you build from source, you could try adding the newer sm and compute generations in CMakeLists.txt.

LucaCappelletti94 commented 3 years ago

Do let me know if you had any luck with this!

tdvginz commented 3 years ago

I'd be happy with that too. I'm getting

nvcc fatal   : Unsupported gpu architecture 'compute_30'

when trying to install from source.

kernfel commented 3 years ago

@yushengsu-thu - did you close this because you've solved it, or because you've given up?

Personally, I got as far as @tdvginz, and I can report that that error arises from the faiss submodule, which comes from a fork (https://github.com/DavidMChan/faiss.git). Updating to the upstream master (https://github.com/facebookresearch/faiss) lets me build faiss on its own, but the CMake instructions here (that is, in tsne-cuda/dev) do something funky with the faiss build that doesn't work with the upstream master, and I don't understand enough about how CMake works to fix that.

Any help on getting CMake to cooperate and set up the (also CMake-based) faiss build would be welcome.

DavidMChan commented 3 years ago

I've recently updated the development branch (/dev) to support the latest version of FAISS (v1.7) and CUDA 11.x. You will need to follow the FAISS instructions to build and install their GPU version. There are several hard-coded paths, so it may not transfer to your machine as-is, but I'll work on updating those over time.

kernfel commented 3 years ago

Excellent, that appears to work, with nvcc>=11.2 (earlier versions, including 11.0, fail for not supporting compute_86). Thank you so much for the update!

DavidMChan commented 3 years ago

Ah, that's a good point. For reference (and people who need nvcc < 11.2), the code on line 65 of the CMakeLists.txt can be altered to remove any offending cuda architectures:

set(CUDA_ARCH
    -gencode=arch=compute_35,code=sm_35
    -gencode=arch=compute_50,code=sm_50
    -gencode=arch=compute_52,code=sm_52
    -gencode=arch=compute_60,code=sm_60
    -gencode=arch=compute_61,code=sm_61
    -gencode=arch=compute_70,code=sm_70
    -gencode=arch=compute_75,code=sm_75
    -gencode=arch=compute_86,code=sm_86 # If you don't need this, comment it or remove it.
)
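
As a rough illustration of the trimming described above, the sketch below builds the same style of -gencode list but keeps only the architectures a given nvcc accepts. The SUPPORTED set is a placeholder assumption, not something read from the toolchain, and gencode_flags is a hypothetical helper name.

```python
def gencode_flags(wanted, supported):
    """Return -gencode flags for each wanted arch that is also in supported."""
    flags = []
    for arch in wanted:
        if arch in supported:
            flags.append(f"-gencode=arch=compute_{arch},code=sm_{arch}")
    return flags


# Architectures listed in the CMakeLists.txt snippet above.
WANTED = [35, 50, 52, 60, 61, 70, 75, 86]

# Assumption for illustration: an nvcc that rejects compute_86.
# Replace this with whatever your own toolchain actually accepts.
SUPPORTED = {35, 50, 52, 60, 61, 70, 75}

flags = gencode_flags(WANTED, SUPPORTED)
print("\n".join(flags))
```

The printed lines can be pasted into the set(CUDA_ARCH ...) block in place of the full list.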

kernfel commented 3 years ago

Spoke too soon: toy examples work, but with larger sample counts, faiss fails as in https://github.com/facebookresearch/faiss/issues/1793. I'm trying to work around it with older faiss versions right now, but no luck yet.

DavidMChan commented 3 years ago

Hmm, it does look like that's an upstream issue. Unfortunately I can't test whether 1.6.5 works right now, since FAISS@1.6.5 doesn't support compute_86, which is required on my machine. However, tsnecuda does seem to compile against it (no changes required to tsnecuda, just compile and install the right version of faiss), so it might be worth testing.

kernfel commented 3 years ago

Building with 1.6.4 and 1.6.5 works, but running it doesn't:

$ ./build/tsne 
Starting TSNE calculation with 5000 points.
Initializing cuda handles... GPUassert: the provided PTX was compiled with an unsupported toolchain. /home/felix/projects/lib/tsne-cuda/src/util/cuda_utils.cu 55

I think that might still be a faiss issue, since I remember seeing a lot of ptx build output on 1.7 that doesn't happen on 1.6.* builds.
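
For context, the CUDA driver API documentation describes this error as most commonly caused by PTX generated by a compiler newer than the installed driver supports. A minimal sketch of that comparison follows; the version strings are made-up placeholders, and parse_version and ptx_toolchain_ok are hypothetical helper names, not part of tsne-cuda or faiss.

```python
def parse_version(text):
    """Turn a version string such as '11.0.221' into a (major, minor) tuple."""
    return tuple(int(part) for part in text.split(".")[:2])


def ptx_toolchain_ok(driver_cuda, toolkit_cuda):
    """True when the driver's CUDA version is at least the toolkit's version."""
    return parse_version(driver_cuda) >= parse_version(toolkit_cuda)


# Placeholder values: the CUDA version the driver reports,
# and the version of the nvcc used for the build.
DRIVER_CUDA = "11.0"
TOOLKIT_CUDA = "11.2"

print(ptx_toolchain_ok(DRIVER_CUDA, TOOLKIT_CUDA))
```

This only checks one possible cause; it does not rule out a faiss-side problem.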

DavidMChan commented 3 years ago

Let's move this conversation to a new issue, since it doesn't seem directly related to CUDA 11 support. We can always circle back if necessary.

kernfel commented 3 years ago

To update here: I've managed to build & run with the following:

  • CUDA toolkit v11.0.221 (from Ubuntu repos)
  • GCC v9.3.0 (comes with nvidia-cuda-toolkit, but needs to be added to update-alternatives manually in Ubuntu 20.10)
  • FAISS v1.6.5 (built from source)
  • Excluding compute_86 from CMakeLists.txt as noted above.

LiUzHiAn commented 3 years ago

Hi,

Thanks for sharing.

I've downloaded the FAISS v1.6.5 source code and built it successfully. Then I tried to build tsne-cuda, but hit the following error:

CMake Error at tsnecuda_generated_Distance.cu.o.cmake:216 (message):
  Error generating
  /mnt/hdd4T/lza_home/py_projects/tsne-cuda/build/CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/./tsnecuda_generated_Distance.cu.o

CMakeFiles/tsnecuda.dir/build.make:117: recipe for target 'CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/tsnecuda_generated_Distance.cu.o' failed
make[2]: *** [CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/tsnecuda_generated_Distance.cu.o] Error 1
CMakeFiles/Makefile2:247: recipe for target 'CMakeFiles/tsnecuda.dir/all' failed
make[1]: *** [CMakeFiles/tsnecuda.dir/all] Error 2
Makefile:155: recipe for target 'all' failed
make: *** [all] Error 2

But Distance.cuh was already installed on my machine when I built FAISS from source; here is the log:

...
-- Installing: /usr/local/include/faiss/gpu/StandardGpuResources.h
-- Installing: /usr/local/include/faiss/gpu/impl/BinaryDistance.cuh
-- Installing: /usr/local/include/faiss/gpu/impl/BinaryFlatIndex.cuh
-- Installing: /usr/local/include/faiss/gpu/impl/BroadcastSum.cuh
-- Installing: /usr/local/include/faiss/gpu/impl/Distance.cuh
-- Installing: /usr/local/include/faiss/gpu/impl/DistanceUtils.cuh
-- Installing: /usr/local/include/faiss/gpu/impl/FlatIndex.cuh
...

It seems that tsne-cuda compiles some FAISS-related files again? Any ideas what to do next?

Thanks in advance.

kernfel commented 3 years ago

Was there no additional information from CMake as to what caused the build failure in tsnecuda_generated_Distance.cu.o.cmake?

DavidMChan commented 3 years ago

If you’re using the main branch, and not the dev branch, did you remember to run ‘git submodule init’ and ‘git submodule update’ ?
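
One way to check this is to look at the output of git submodule status, where a leading "-" marks a submodule that has not been initialized. The sketch below parses such output; the sample hashes and the second path are invented for illustration, and uninitialized_submodules is a hypothetical helper name. It will not catch a fetch that was interrupted partway through.

```python
def uninitialized_submodules(status_output):
    """Return paths whose `git submodule status` line starts with '-'."""
    missing = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Line format: "-<sha1> <path>"
            parts = line[1:].split()
            if len(parts) >= 2:
                missing.append(parts[1])
    return missing


# Invented sample output; real output comes from `git submodule status`.
SAMPLE = (
    "-0123456789abcdef0123456789abcdef01234567 third_party/faiss\n"
    " 89abcdef0123456789abcdef0123456789abcdef third_party/other (v1.0)\n"
)

print(uninitialized_submodules(SAMPLE))
```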

LiUzHiAn commented 3 years ago

Was there no additional information from CMake as to what caused the build failure in tsnecuda_generated_Distance.cu.o.cmake?

I can only find these three lines, as I posted before:

CMake Error at tsnecuda_generated_Distance.cu.o.cmake:216 (message):
  Error generating
  /mnt/hdd4T/lza_home/py_projects/tsne-cuda/build/CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/./tsnecuda_generated_Distance.cu.o

LiUzHiAn commented 3 years ago

Hi,

Indeed, due to some network issues, git submodule update had failed to fetch some files. After fixing that, another error was raised:

[ 35%] Building NVCC (Device) object CMakeFiles/tsnecuda.dir/src/util/tsnecuda_generated_math_utils.cu.o
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
In file included from /usr/local/cuda-11.0/include/thrust/detail/config/config.h:27:0,
                 from /usr/local/cuda-11.0/include/thrust/detail/config.h:23,
                 from /usr/local/cuda-11.0/include/thrust/host_vector.h:24,
                 from /mnt/hdd4T/lza_home/py_projects/tsne-cuda/src/include/common.h:21,
                 from /mnt/hdd4T/lza_home/py_projects/tsne-cuda/src/include/util/math_utils.h:14,
                 from /mnt/hdd4T/lza_home/py_projects/tsne-cuda/src/util/math_utils.cu:10:
/usr/local/cuda-11.0/include/thrust/detail/config/cpp_dialect.h:104:13: warning: Thrust requires C++14. Please pass -std=c++14 to your compiler. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.
   THRUST_COMPILER_DEPRECATION(C++14, pass -std=c++14 to your compiler);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                            
In file included from /usr/local/cuda-11.0/include/cub/util_arch.cuh:36:0,
                from /mnt/hdd4T/lza_home/py_projects/tsne-cuda/src/include/util/math_utils.h:14,
                 from /mnt/hdd4T/lza_home/py_projects/tsne-cuda/src/util/math_utils.cu:10:
                 ...
/usr/local/cuda-11.0/include/cub/util_cpp_dialect.cuh:129:13: warning: CUB requires C++14. Please pass -std=c++14 to your compiler. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.
   CUB_COMPILER_DEPRECATION(C++14, pass -std=c++14 to your compiler);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                         
/mnt/hdd4T/lza_home/py_projects/tsne-cuda/src/util/math_utils.cu(153): error: identifier "cusparseScsr2csc" is undefined

/mnt/hdd4T/lza_home/py_projects/tsne-cuda/src/util/math_utils.cu(165): error: identifier "cusparseXcsrgeamNnz" is undefined

/mnt/hdd4T/lza_home/py_projects/tsne-cuda/src/util/math_utils.cu(195): error: identifier "cusparseScsrgeam" is undefined

3 errors detected in the compilation of "/mnt/hdd4T/lza_home/py_projects/tsne-cuda/src/util/math_utils.cu".
CMake Error at tsnecuda_generated_math_utils.cu.o.cmake:276 (message):
  Error generating file
  /mnt/hdd4T/lza_home/py_projects/tsne-cuda/build/CMakeFiles/tsnecuda.dir/src/util/./tsnecuda_generated_math_utils.cu.o

CMakeFiles/tsnecuda.dir/build.make:565: recipe for target 'CMakeFiles/tsnecuda.dir/src/util/tsnecuda_generated_math_utils.cu.o' failed
make[2]: *** [CMakeFiles/tsnecuda.dir/src/util/tsnecuda_generated_math_utils.cu.o] Error 1
CMakeFiles/Makefile2:247: recipe for target 'CMakeFiles/tsnecuda.dir/all' failed
make[1]: *** [CMakeFiles/tsnecuda.dir/all] Error 2
Makefile:155: recipe for target 'all' failed
make: *** [all] Error 2

I also tried to build from the dev branch, which throws a different error:

-- The CUDA compiler identification is NVIDIA 11.0.194
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-11.0/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda-11.0/include (found version "11.0.194") 
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
CMake Error at CMakeLists.txt:139 (find_package):
  By not providing "Findgflags.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "gflags", but
  CMake did not find one.

  Could not find a package configuration file provided by "gflags" with any
  of the following names:

    gflagsConfig.cmake
    gflags-config.cmake

  Add the installation prefix of "gflags" to CMAKE_PREFIX_PATH or set
  "gflags_DIR" to a directory containing one of the above files.  If "gflags"
  provides a separate development package or SDK, be sure it has been
  installed.

-- Configuring incomplete, errors occurred!
See also "/mnt/hdd4T/lza_home/py_projects/tsne-cuda/build/CMakeFiles/CMakeOutput.log".
See also "/mnt/hdd4T/lza_home/py_projects/tsne-cuda/build/CMakeFiles/CMakeError.log".

DavidMChan commented 3 years ago

Yes, in CUDA 11.0, we had to rewrite several pieces of code to deal with the fact that cusparseScsr2csc, cusparseXcsrgeamNnz, and cusparseScsrgeam were removed. Thus, for cuda 11.0, you have to use the dev/ branch for the time being (until I can get everything released - if anybody would be willing to help out getting this code to work consistently on anaconda, I'd really appreciate it, since I'm having a lot of trouble getting anaconda and cmake to play nicely with CUDA).

The issue with GFlags is the same as in #99 - and I honestly don't know what's causing that, since we have a Findgflags file in our CMake config directory. Does specifying the installation prefix for gflags directly work?
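
To find a prefix worth passing, one option is to search for the two config file names that the CMake error lists. The sketch below does that over a list of directories; find_gflags_config_dirs is a hypothetical helper name, and the demo runs against a throwaway directory rather than a real gflags install.

```python
import tempfile
from pathlib import Path

# File names taken from the CMake error message.
CONFIG_NAMES = ("gflagsConfig.cmake", "gflags-config.cmake")


def find_gflags_config_dirs(prefixes):
    """Return directories under the given prefixes that hold a gflags config file."""
    found = set()
    for prefix in prefixes:
        root = Path(prefix)
        if not root.is_dir():
            continue
        for name in CONFIG_NAMES:
            for hit in root.rglob(name):
                found.add(str(hit.parent))
    return sorted(found)


# Demo on a temporary tree that mimics an install layout.
with tempfile.TemporaryDirectory() as tmp:
    cfg_dir = Path(tmp) / "lib" / "cmake" / "gflags"
    cfg_dir.mkdir(parents=True)
    (cfg_dir / "gflags-config.cmake").touch()
    demo_hits = find_gflags_config_dirs([tmp])

print(demo_hits)
```

A directory found this way can be handed to CMake as gflags_DIR, as the error text suggests. An empty result may simply mean gflags is not installed.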

DavidMChan commented 3 years ago

After testing, this should be fixed in 3.0.0. Feel free to reopen if you're still having issues.

LiUzHiAn commented 3 years ago

The gflags error occurred because gflags was not installed on my machine. After I installed it, the error went away.


@DavidMChan,@kernfel

Thank you soooooooo much for your help. I eventually installed faiss and tsnecuda successfully.

Let me write up a detailed summary for those who want to build tsnecuda with CUDA 11.0, based on kernfel's solution. CMake newer than 3.15 is recommended. There are two steps:

  1. Download the faiss-1.6.5 source code and build it from source, following steps 1-3 here. After the faiss build finishes, DON'T forget to install the generated Python package.
  2. Clone this repo, check out the dev branch, and build from source as described here. Also install the generated tsnecuda Python package.
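
A quick way to confirm that both Python packages from the two steps ended up in the active environment is to ask the import system whether it can locate them. This is only a sketch: missing_modules is a hypothetical helper name, and it checks that the modules can be found, not that they load or run on the GPU.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that the current Python environment cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names of the two packages installed in the steps above.
print(missing_modules(["faiss", "tsnecuda"]))
```

An empty list means both were found; running tsnecuda.test() is the more thorough check.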

LiUzHiAn commented 3 years ago

For the record, I installed tsnecuda on two machines: the first has an RTX 2060 GPU, CUDA 10.0, and Python 3.6; the second has an RTX 3090 GPU, CUDA 11.0, and Python 3.6. I tested the performance via tsnecuda.test(), and the result was not what I expected.

The 3090 is much slower (~20x) than the 2060. Here is the output on my 2060 GPU:

>>> tsnecuda.test()
Initializing cuda handles... done.
KNN Computation... done.
Computing Pij matrix... done.
Initializing low dim points... done.
Initializing CUDA memory... done.
[Step 0] Avg. Gradient Norm: 5.63507e-05
[Step 10] Avg. Gradient Norm: 2.63374e-06
[Step 20] Avg. Gradient Norm: 1.35403e-07
[Step 30] Avg. Gradient Norm: 6.75821e-09
[Step 40] Avg. Gradient Norm: 3.32646e-10
[Step 50] Avg. Gradient Norm: 1.76006e-11
[Step 60] Avg. Gradient Norm: 3.43814e-11
[Step 70] Avg. Gradient Norm: 9.26503e-12
[Step 80] Avg. Gradient Norm: 1.28685e-11
[Step 90] Avg. Gradient Norm: 2.21792e-10
[Step 100] Avg. Gradient Norm: 8.70341e-12
[Step 110] Avg. Gradient Norm: 2.06272e-10
[Step 120] Avg. Gradient Norm: 2.0891e-11
[Step 130] Avg. Gradient Norm: 1.84418e-11
[Step 140] Avg. Gradient Norm: 3.2101e-11
[Step 150] Avg. Gradient Norm: 2.1261e-11
[Step 160] Avg. Gradient Norm: 2.1119e-11
[Step 170] Avg. Gradient Norm: 2.31917e-11
[Step 180] Avg. Gradient Norm: 3.68962e-11
[Step 190] Avg. Gradient Norm: 1.20611e-11
[Step 200] Avg. Gradient Norm: 1.93969e-11
[Step 210] Avg. Gradient Norm: 2.00171e-10
[Step 220] Avg. Gradient Norm: 4.73396e-11
[Step 230] Avg. Gradient Norm: 7.42355e-12
[Step 240] Avg. Gradient Norm: 5.25284e-11
[Step 250] Avg. Gradient Norm: 1.23045e-10
[Step 260] Avg. Gradient Norm: 7.37647e-12
[Step 270] Avg. Gradient Norm: 2.08948e-11
[Step 280] Avg. Gradient Norm: 2.85766e-11
[Step 290] Avg. Gradient Norm: 2.77974e-10
[Step 300] Avg. Gradient Norm: 5.04055e-09
[Step 310] Avg. Gradient Norm: 1.57394e-07
[Step 320] Avg. Gradient Norm: 7.9907e-06
[Step 330] Avg. Gradient Norm: 0.000629361
[Step 340] Avg. Gradient Norm: 0.0377374
[Step 350] Avg. Gradient Norm: 0.0864466
[Step 360] Avg. Gradient Norm: 0.0287763
[Step 370] Avg. Gradient Norm: 0.0137643
[Step 380] Avg. Gradient Norm: 0.0101956
[Step 390] Avg. Gradient Norm: 0.0087624
[Step 400] Avg. Gradient Norm: 0.00821406
[Step 410] Avg. Gradient Norm: 0.00798948
[Step 420] Avg. Gradient Norm: 0.00782703
[Step 430] Avg. Gradient Norm: 0.00759733
[Step 440] Avg. Gradient Norm: 0.00723665
[Step 450] Avg. Gradient Norm: 0.00683833
[Step 460] Avg. Gradient Norm: 0.00651148
[Step 470] Avg. Gradient Norm: 0.00610025
[Step 480] Avg. Gradient Norm: 0.00571612
[Step 490] Avg. Gradient Norm: 0.00551033
[Step 500] Avg. Gradient Norm: 0.00537704
[Step 510] Avg. Gradient Norm: 0.00524043
[Step 520] Avg. Gradient Norm: 0.00504568
[Step 530] Avg. Gradient Norm: 0.00492819
[Step 540] Avg. Gradient Norm: 0.00478753
[Step 550] Avg. Gradient Norm: 0.00470573
[Step 560] Avg. Gradient Norm: 0.00459653
[Step 570] Avg. Gradient Norm: 0.00436502
[Step 580] Avg. Gradient Norm: 0.00405163
[Step 590] Avg. Gradient Norm: 0.00387847
[Step 600] Avg. Gradient Norm: 0.00363504
[Step 610] Avg. Gradient Norm: 0.00345075
[Step 620] Avg. Gradient Norm: 0.00329351
[Step 630] Avg. Gradient Norm: 0.00311661
[Step 640] Avg. Gradient Norm: 0.00300835
[Step 650] Avg. Gradient Norm: 0.00292016
[Step 660] Avg. Gradient Norm: 0.00294263
[Step 670] Avg. Gradient Norm: 0.00279009
[Step 680] Avg. Gradient Norm: 0.00259829
[Step 690] Avg. Gradient Norm: 0.00243013
[Step 700] Avg. Gradient Norm: 0.00230396
[Step 710] Avg. Gradient Norm: 0.00233775
[Step 720] Avg. Gradient Norm: 0.00243892
[Step 730] Avg. Gradient Norm: 0.00235893
[Step 740] Avg. Gradient Norm: 0.00226121
[Step 750] Avg. Gradient Norm: 0.00221478
[Step 760] Avg. Gradient Norm: 0.00214333
[Step 770] Avg. Gradient Norm: 0.00206614
[Step 780] Avg. Gradient Norm: 0.00189938
[Step 790] Avg. Gradient Norm: 0.00182071
[Step 800] Avg. Gradient Norm: 0.00183494
[Step 810] Avg. Gradient Norm: 0.00193397
[Step 820] Avg. Gradient Norm: 0.00196122
[Step 830] Avg. Gradient Norm: 0.00184061
[Step 840] Avg. Gradient Norm: 0.00170407
[Step 850] Avg. Gradient Norm: 0.00157969
[Step 860] Avg. Gradient Norm: 0.00138117
[Step 870] Avg. Gradient Norm: 0.00128773
[Step 880] Avg. Gradient Norm: 0.00123935
[Step 890] Avg. Gradient Norm: 0.00125743
[Step 900] Avg. Gradient Norm: 0.00112275
[Step 910] Avg. Gradient Norm: 0.00101219
[Step 920] Avg. Gradient Norm: 0.00107188
[Step 930] Avg. Gradient Norm: 0.00108749
[Step 940] Avg. Gradient Norm: 0.0011048
[Step 950] Avg. Gradient Norm: 0.00110982
[Step 960] Avg. Gradient Norm: 0.0010239
[Step 970] Avg. Gradient Norm: 0.00101843
[Step 980] Avg. Gradient Norm: 0.00103544
[Step 990] Avg. Gradient Norm: 0.00103231
_time_initialization: 0.0004s
_time_knn: 0.114729s
_time_symmetry: 0.035447s
_time_init_low_dim: 0.000495s
_time_init_fft: 0.001357s
_time_compute_charges: 0.002109s
_time_precompute_2d: 0.139967s
_time_nbodyfft: 0.172617s
_time_norm: 0.027826s
_time_attr: 0.07049s
_time_apply_forces: 0.064584s
_time_other: 0.005371s
total_time: 0.635392s
>>> 

And on my 3090 GPU:

>>> tsnecuda.test()
Initializing cuda handles... done.
KNN Computation... done.
Computing Pij matrix... 
done.
Initializing low dim points... done.
Initializing CUDA memory... done.
[Step 0] Avg. Gradient Norm: 0.00316841
[Step 10] Avg. Gradient Norm: 0.00020235
[Step 20] Avg. Gradient Norm: 1.17639e-05
[Step 30] Avg. Gradient Norm: 6.08587e-07
[Step 40] Avg. Gradient Norm: 3.33705e-08
[Step 50] Avg. Gradient Norm: 2.13425e-09
[Step 60] Avg. Gradient Norm: 1.352e-09
[Step 70] Avg. Gradient Norm: 7.24622e-09
[Step 80] Avg. Gradient Norm: 3.61282e-09
[Step 90] Avg. Gradient Norm: 8.77663e-10
[Step 100] Avg. Gradient Norm: 8.04638e-10
[Step 110] Avg. Gradient Norm: 8.40385e-10
[Step 120] Avg. Gradient Norm: 1.27527e-09
[Step 130] Avg. Gradient Norm: 4.91896e-09
[Step 140] Avg. Gradient Norm: 8.725e-10
[Step 150] Avg. Gradient Norm: 1.09649e-08
[Step 160] Avg. Gradient Norm: 1.72673e-08
[Step 170] Avg. Gradient Norm: 4.03674e-09
[Step 180] Avg. Gradient Norm: 1.41993e-09
[Step 190] Avg. Gradient Norm: 3.52255e-09
[Step 200] Avg. Gradient Norm: 8.46993e-09
[Step 210] Avg. Gradient Norm: 6.05243e-09
[Step 220] Avg. Gradient Norm: 5.35294e-09
[Step 230] Avg. Gradient Norm: 4.24595e-09
[Step 240] Avg. Gradient Norm: 2.89514e-09
[Step 250] Avg. Gradient Norm: 9.4303e-09
[Step 260] Avg. Gradient Norm: 4.8251e-09
[Step 270] Avg. Gradient Norm: 3.36828e-09
[Step 280] Avg. Gradient Norm: 1.96735e-08
[Step 290] Avg. Gradient Norm: 2.80626e-07
[Step 300] Avg. Gradient Norm: 6.88397e-06
[Step 310] Avg. Gradient Norm: 0.000276592
[Step 320] Avg. Gradient Norm: 0.0173417
[Step 330] Avg. Gradient Norm: 1.10052
[Step 340] Avg. Gradient Norm: 8.72256
[Step 350] Avg. Gradient Norm: 3.15913
[Step 360] Avg. Gradient Norm: 1.69793
[Step 370] Avg. Gradient Norm: 3.2298
[Step 380] Avg. Gradient Norm: 9.03506
[Step 390] Avg. Gradient Norm: 4.04239
[Step 400] Avg. Gradient Norm: 2.67969
[Step 410] Avg. Gradient Norm: 2.43034
[Step 420] Avg. Gradient Norm: 2.35848
[Step 430] Avg. Gradient Norm: 2.36371
[Step 440] Avg. Gradient Norm: 2.40016
[Step 450] Avg. Gradient Norm: 2.45785
[Step 460] Avg. Gradient Norm: 2.52751
[Step 470] Avg. Gradient Norm: 2.61023
[Step 480] Avg. Gradient Norm: 2.71442
[Step 490] Avg. Gradient Norm: 2.78857
[Step 500] Avg. Gradient Norm: 2.84582
[Step 510] Avg. Gradient Norm: 2.88569
[Step 520] Avg. Gradient Norm: 2.94037
[Step 530] Avg. Gradient Norm: 3.00796
[Step 540] Avg. Gradient Norm: 3.0736
[Step 550] Avg. Gradient Norm: 3.10501
[Step 560] Avg. Gradient Norm: 3.13196
[Step 570] Avg. Gradient Norm: 3.13702
[Step 580] Avg. Gradient Norm: 3.16519
[Step 590] Avg. Gradient Norm: 3.19187
[Step 600] Avg. Gradient Norm: 3.21807
[Step 610] Avg. Gradient Norm: 3.24802
[Step 620] Avg. Gradient Norm: 3.26183
[Step 630] Avg. Gradient Norm: 3.26732
[Step 640] Avg. Gradient Norm: 3.26586
[Step 650] Avg. Gradient Norm: 3.2619
[Step 660] Avg. Gradient Norm: 3.25324
[Step 670] Avg. Gradient Norm: 3.25705
[Step 680] Avg. Gradient Norm: 3.27121
[Step 690] Avg. Gradient Norm: 3.28956
[Step 700] Avg. Gradient Norm: 3.2974
[Step 710] Avg. Gradient Norm: 3.30126
[Step 720] Avg. Gradient Norm: 3.31402
[Step 730] Avg. Gradient Norm: 3.3251
[Step 740] Avg. Gradient Norm: 3.32682
[Step 750] Avg. Gradient Norm: 3.33112
[Step 760] Avg. Gradient Norm: 3.34129
[Step 770] Avg. Gradient Norm: 3.35972
[Step 780] Avg. Gradient Norm: 3.37386
[Step 790] Avg. Gradient Norm: 3.38687
[Step 800] Avg. Gradient Norm: 3.39692
[Step 810] Avg. Gradient Norm: 3.3988
[Step 820] Avg. Gradient Norm: 3.40635
[Step 830] Avg. Gradient Norm: 3.41817
[Step 840] Avg. Gradient Norm: 3.43308
[Step 850] Avg. Gradient Norm: 3.44227
[Step 860] Avg. Gradient Norm: 3.44587
[Step 870] Avg. Gradient Norm: 3.45133
[Step 880] Avg. Gradient Norm: 3.4623
[Step 890] Avg. Gradient Norm: 3.47085
[Step 900] Avg. Gradient Norm: 3.48198
[Step 910] Avg. Gradient Norm: 3.48978
[Step 920] Avg. Gradient Norm: 3.49834
[Step 930] Avg. Gradient Norm: 3.51401
[Step 940] Avg. Gradient Norm: 3.52248
[Step 950] Avg. Gradient Norm: 3.52983
[Step 960] Avg. Gradient Norm: 3.53699
[Step 970] Avg. Gradient Norm: 3.52919
[Step 980] Avg. Gradient Norm: 3.51682
[Step 990] Avg. Gradient Norm: 3.49306
_time_initialization: 0.002144s
_time_knn: 0.411477s
_time_symmetry: 0.619355s
_time_init_low_dim: 0.000649s
_time_init_fft: 0.00359s
_time_compute_charges: 0.002199s
_time_precompute_2d: 1.98696s
_time_nbodyfft: 5.07265s
_time_norm: 0.565026s
_time_attr: 1.47092s
_time_apply_forces: 1.42147s
_time_other: 0.904033s
total_time: 12.4605s
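
To see where the time goes, the per-phase slowdown can be computed from the two timing summaries. The sketch below uses a subset of the numbers printed above; parse_timings is a hypothetical helper name.

```python
def parse_timings(text):
    """Parse lines like '_time_knn: 0.114729s' into a {name: seconds} dict."""
    timings = {}
    for line in text.strip().splitlines():
        name, value = line.split(":")
        timings[name.strip()] = float(value.strip().rstrip("s"))
    return timings


# Values copied from the two tsnecuda.test() outputs above.
RTX_2060 = """
_time_knn: 0.114729s
_time_symmetry: 0.035447s
_time_precompute_2d: 0.139967s
_time_nbodyfft: 0.172617s
total_time: 0.635392s
"""

RTX_3090 = """
_time_knn: 0.411477s
_time_symmetry: 0.619355s
_time_precompute_2d: 1.98696s
_time_nbodyfft: 5.07265s
total_time: 12.4605s
"""

fast = parse_timings(RTX_2060)
slow = parse_timings(RTX_3090)

# Ratio > 1 means the 3090 run took longer for that phase.
slowdown = {name: slow[name] / fast[name] for name in fast}
for name, ratio in sorted(slowdown.items(), key=lambda item: -item[1]):
    print(f"{name}: {ratio:.1f}x")
```

Among these phases, the N-body FFT step shows the largest ratio and the KNN step the smallest.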

Let me open a new issue (#100) to discuss.