fujiisoup / py3nj

Wigner's 3J, 6J, 9J symbols for python
https://py3nj.readthedocs.io/
Apache License 2.0

Vectorization for matrix input for w3j #15

Closed SergeiDBykov closed 3 years ago

SergeiDBykov commented 3 years ago

Hi! I need to find the following matrix:

K[l, l'] = Σ_p B_p (2p + 1) |3j(l, l', p; 0, 0, 0)|²

where the prefactor B_p (2p + 1) in front of the Wigner 3j symbol is known.

Of course I could create such a matrix in a double loop over l and l', both of which run over np.arange(0, some_number), but this is extremely slow (on my laptop it took almost 2 hours for some_number = 512). Note that the matrix is symmetric due to the square of the 3j symbol, and the indices inside the 3j are interchangeable.

I recently noticed that the py3nj package supports vectorization and checked that it is much faster than iterating over indices in the 1d case.

But I can't tell whether your vectorization supports matrices. Do you have any idea how to vectorize this, i.e. create the matrix K[l, l'] in an efficient way (probably also taking advantage of the symmetry and computing only the upper-triangular part)?

This is how I did the loops (the sum over j plays the role of p in the formula above):

import numpy as np
from py3nj import wigner3j

def progressbar(iterable, **kwargs):
    # stand-in for the progress-bar helper used here; any progress bar (or a plain range) works
    return iterable

lmax = 512
Bj = np.ones(lmax)  # some array with 512 elements
K = np.zeros((lmax, lmax))
js = np.arange(len(Bj))

for l1 in progressbar(range(lmax)):
    for l2 in range(l1+1):
        array = np.array(
            [
                (2 * j + 1) * Bj[j] *
                np.abs(wigner3j(l1 * 2, l2 * 2, j * 2, 0, 0, 0)) ** 2
                for j in js
            ]
        )

        elem = np.sum(array)
        K[l2, l1] = elem
        K[l1, l2] = elem

As I said, it's very inefficient.

Thank you!

fujiisoup commented 3 years ago

Hi @SergeiDBykov

Thank you for raising an issue. How about the following?

import numpy as np
import py3nj

n = 16

B = np.random.randn(n)
two_p = np.arange(0, 2 * n, 2)  # doubled quantum numbers (py3nj takes two_l1, two_l2, ...)
K = np.sum(
  (two_p + 1) * B * py3nj.wigner3j(
    two_p[:, np.newaxis, np.newaxis], two_p, two_p[:, np.newaxis], 
    0, 0, 0)**2, 
  axis=-1)

It works for small n. However, I have trouble with large n, maybe because it requires a huge amount of memory.
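
For a rough sense of scale (this counts only the broadcast intermediate, not any temporaries py3nj allocates internally, so it is only a lower bound):

import numpy as np

n = 512
# shape of the broadcast argument grid: (n, 1, 1) x (n,) x (n, 1) -> (n, n, n)
shape = np.broadcast_shapes((n, 1, 1), (n,), (n, 1))  # needs NumPy >= 1.20
print(shape)                                          # (512, 512, 512)
print(shape[0] * shape[1] * shape[2] * 8 / 1e9, "GB per float64 temporary")  # ~1.07 GB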

SergeiDBykov commented 3 years ago

Dear @fujiisoup, thanks, it does work! A little report:

# %% vectorized wigner 3j (https://github.com/fujiisoup/py3nj/issues/15)
import numpy as np
from py3nj import wigner3j

def matrix_element(l1, l2, Bj):
    js = np.arange(len(Bj))
    array = np.array([(2 * j + 1) * Bj[j] *
                      (wigner3j(l1 * 2, l2 * 2, j * 2, 0, 0, 0)) ** 2
                      for j in js
                      ]
                     )

    elem = np.sum(array)
    return elem

def matrix(Bp):
    n = len(Bp)
    two_p = np.arange(0, 2 * n, 2)
    K = np.sum(
        (two_p + 1) * Bp * wigner3j(
            two_p[:, np.newaxis, np.newaxis], two_p, two_p[:, np.newaxis],
            0, 0, 0)**2,
        axis=-1)
    return K

lmax = 16
Bj = np.random.rand(lmax)  # some array

K = matrix(Bj)

for l1 in range(lmax):
    for l2 in range(lmax):
        print(
            f"K[l1, l2] = {K[l1, l2]}, elemnt = {matrix_element(l1, l2, Bj)}")
        assert K[l1, l2] == matrix_element(l1, l2, Bj), 'Fail'

My MacBook with an i7 and 16 GB RAM:

%timeit matrix(Bj)

>>> 951 µs ± 46.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%time K = matrix(np.random.rand(128))

>>> CPU times: user 13.5 s, sys: 6.51 s, total: 20.1 s
Wall time: 21.9 s

If you want, I may try to measure memory usage (but don't know how, so let me know if you are interested).

Thanks.

SergeiDBykov commented 3 years ago

Update: Indeed, at lmax = 256 the system can no longer allocate memory for the arrays (zsh: killed ipython). I guess it's understandable, because w3j takes a 3d array and then sums it. I also think that even the symmetry of the matrix may not help: the arrays are too large, and a factor-of-2 reduction in the number of independent elements is not a solution. Something needs to be done to reduce the memory usage; if I figure something out, I will report it here.

Maybe try the same trick with new dimensions, but vectorizing over ROWS? I.e. some function that returns K[l1, l2] for fixed l1 with l2 running over the full range. It would be basically the same as @fujiisoup's proposal, but with a 2d array instead of a 3d one. One could then iterate over rows to fill the matrix, which could also be parallelised.

Thanks.

SergeiDBykov commented 3 years ago

Update 2: due to the large memory usage, the '3d' solution (https://github.com/fujiisoup/py3nj/issues/15#issuecomment-802827858) is hard to use in practice.

I tried vectorizing a row of the K matrix (the function matrix_row(l1, Bp)):


import numpy as np
from py3nj import wigner3j

def matrix_element(l1, l2, Bj):
    js = np.arange(len(Bj))
    array = np.array([(2 * j + 1) * Bj[j] *
                      (wigner3j(l1 * 2, l2 * 2, j * 2, 0, 0, 0)) ** 2
                      for j in js
                      ]
                     )

    elem = np.sum(array)
    return elem

def matrix(Bp):
    n = len(Bp)
    two_p = np.arange(0, 2 * n, 2)
    K = np.sum(
        (two_p + 1) * Bp * wigner3j(
            two_p[:, np.newaxis, np.newaxis], two_p, two_p[:, np.newaxis],
            0, 0, 0)**2,
        axis=-1)
    return K

lmax = 16
Bj = np.random.rand(lmax)  # some array

K = matrix(Bj)

for l1 in range(lmax):
    for l2 in range(lmax):
        print(
            f"K[l1, l2] = {K[l1, l2]}, elemnt = {matrix_element(l1, l2, Bj)}")
        assert K[l1, l2] == matrix_element(l1, l2, Bj), 'Fail'

def matrix_row(l1, Bp):
    # row l1, runs second index l2
    n = len(Bp)
    two_p = np.arange(0, 2 * n, 2)

    Krow = np.sum(
        (two_p + 1) * Bp * wigner3j(
            2*l1, two_p, two_p[:, np.newaxis],
            0, 0, 0)**2,
        axis=-1)
    return Krow

for l1 in range(lmax):
    assert np.all(K[l1] == matrix_row(l1, Bj))

On my machine:

%timeit matrix_row(45, np.random.rand(256))

>>> 1.88 s ± 94.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

i.e. approximately 8 minutes for the full matrix (ignoring its symmetry).

@fujiisoup, can you please check whether my function is correct?

According to my system monitor, memory usage for the '3d' solution reaches a few tens of GB, while calculating rows uses significantly less memory.
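
If a more precise number would help, here is a minimal (untested) sketch of measuring the peak with the third-party memory_profiler package (pip install memory_profiler); the matrix function is the one defined above:

from memory_profiler import memory_usage

# sample the process memory while matrix() runs and report the peak, in MiB
samples = memory_usage((matrix, (np.random.rand(64),)), interval=0.1)
print(f"peak memory: {max(samples):.1f} MiB")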

fujiisoup commented 3 years ago

Hi @SergeiDBykov

Thank you for your update. Given how py3nj is structured internally, a column-based function should run faster:

def matrix_col(l2, Bp):
    # column l2, runs first index l1
    n = len(Bp)
    two_p = np.arange(0, 2 * n, 2)

    Kcol = np.sum(
        (two_p + 1) * Bp * wigner3j(
            two_p[:, np.newaxis], two_p, 2 * l2,
            0, 0, 0)**2,
        axis=-1)
    return Kcol

In [10]: %timeit matrix_row(45, np.random.rand(256))
3.09 s ± 132 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [11]: %timeit matrix_col(45, np.random.rand(256))
350 ms ± 47.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

SergeiDBykov commented 3 years ago

Hey @fujiisoup, thank you for the hint, I did not know that.

It is indeed faster than iterating rows.

Checking the correctness of the vectorization:

import numpy as np
from py3nj import wigner3j

def progressbar(iterable, **kwargs):
    # stand-in for the progress-bar helper used below; any progress bar works here
    return iterable

def matrix_element(l1, l2, Bp):
    js = np.arange(len(Bp))
    array = np.array([(2 * j + 1) * Bp[j] *
                      (wigner3j(l1 * 2, l2 * 2, j * 2, 0, 0, 0)) ** 2
                      for j in js
                      ]
                     )
    elem = np.sum(array)
    return elem

def matrix_per_column(Bp):

    def matrix_col(l2, Bp):
        # column l2, runs first index l1
        n = len(Bp)
        two_p = np.arange(0, 2 * n, 2)

        Kcol = np.sum(
            (two_p + 1) * Bp * wigner3j(
                two_p[:, np.newaxis], two_p, 2 * l2,
                0, 0, 0)**2,
            axis=-1)
        return Kcol
    n = len(Bp)
    l2s = np.arange(n)

    K = np.zeros((n, n))

    for l2 in progressbar(l2s, prefix='iterating columns'):
        col = matrix_col(l2, Bp)
        K[:, l2] = col

    return K

lmax = 64
Bp = np.random.rand(lmax)

K = matrix_per_column(Bp)

for l1 in progressbar(range(lmax)):
    for l2 in range(lmax):
        assert np.abs(K[l1, l2] - matrix_element(l1, l2, Bp)) < 1e-10

Thanks, it is much faster than direct iteration.
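
A possible further saving, as an untested sketch: since K is symmetric, each column only needs the rows with l1 <= l2, which can then be mirrored. At best this halves the work; whether it actually pays off in practice depends on py3nj's internals.

def matrix_per_column_sym(Bp):
    # hypothetical variant: compute only rows l1 <= l2 of each column, then mirror
    n = len(Bp)
    two_p = np.arange(0, 2 * n, 2)
    K = np.zeros((n, n))
    for l2 in range(n):
        two_l1 = two_p[:l2 + 1, np.newaxis]  # restrict to the upper triangle
        col = np.sum(
            (two_p + 1) * Bp * wigner3j(two_l1, two_p, 2 * l2, 0, 0, 0) ** 2,
            axis=-1)
        K[:l2 + 1, l2] = col
        K[l2, :l2 + 1] = col
    return K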