MathEXLab / PySPOD

A Python package for spectral proper orthogonal decomposition (SPOD).
https://mathexlab.github.io/PySPOD/
MIT License

Weights shape error #45

Closed: FrankFrank9 closed this issue 3 weeks ago

FrankFrank9 commented 5 months ago

Hello,

When I run SPOD in parallel with a custom weighting matrix (the area of the elements), I get the following error, but everything is fine when I run in serial mode. Do you have any idea what could cause this?

    raise ValueError(
ValueError: parameter ``weights`` must be cast into 1d array with dimension equal to flattened spatial dimension of data.
dalcinl commented 5 months ago

What's the shape of your weights? Can you try passing weights.reshape(-1) instead?
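For reference, NumPy's reshape(-1) flattens an array of any shape into 1D in row-major (C) order, so per-point, per-variable weights of shape (npts, nvars) become a vector of length npts * nvars, with all variables of point 0 first, then point 1, and so on. A minimal sketch with small, made-up sizes:

```python
import numpy as np

# Hypothetical small sizes standing in for the real mesh
npts, nvars = 4, 5
weights = np.arange(npts * nvars, dtype=float).reshape(npts, nvars)

# Flatten to 1D, row-major: weights[0, :], then weights[1, :], ...
flat = weights.reshape(-1)
print(weights.shape, "->", flat.shape)  # (4, 5) -> (20,)
```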

FrankFrank9 commented 5 months ago

Thank you, now I'm able to run with 2-3 ranks, but I get the same error when I scale up to more ranks. For reference, my weights shape is:

(1394730,)

and my time series data shape is

(200, 278946, 5)

The error arises from this piece of code:

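import numpy as np

def distribute_dimension(data, max_axis, comm):
    """
    Distribute desired spatial dimension, splitting partitions
    by value // comm.size, with remainder = value % comm.size
    """
    ## distribute largest spatial dimension based on data
    if comm is not None:
        size = comm.size
        rank = comm.rank
        shape = data.shape
        # start with a full slice on every axis
        index = [np.s_[:]] * len(shape)
        N = shape[max_axis]
        # _blockdist is a private PySPOD helper (not shown here): it returns the
        # number of elements n owned by this rank and the offset s where they start
        n, s = _blockdist(N, size, rank)
        # keep only this rank's block along max_axis
        index[max_axis] = np.s_[s:s+n]
        index = tuple(index)
        data = data[index]
        comm.Barrier()
    else:
        data = data
    return data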
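_blockdist is not shown in the thread, so the following is a hypothetical stand-in named blockdist, assuming the remainder N % size is spread one extra element each over the lowest ranks, as the docstring above suggests. It runs serially without MPI and prints how many of the 278946 spatial points each of 10 ranks would own along the split axis:

```python
def blockdist(N, size, rank):
    """Hypothetical stand-in for PySPOD's _blockdist (assumed behavior):
    return (count, start) for `rank`, where the first N % size ranks
    each take one extra element."""
    q, r = divmod(N, size)
    n = q + (1 if rank < r else 0)
    s = rank * q + min(rank, r)
    return n, s

def local_slices(N, size):
    """(start, stop) of the contiguous block each rank would own."""
    out = []
    for rank in range(size):
        n, s = blockdist(N, size, rank)
        out.append((s, s + n))
    return out

# Spatial axis of the data in this issue: 278946 points over 10 ranks
slices = local_slices(278946, 10)
print([stop - start for start, stop in slices])
# [27895, 27895, 27895, 27895, 27895, 27895, 27894, 27894, 27894, 27894]
```

Because the blocks differ by at most one element, any other array that must line up with the data (such as the weights) has to be split with the same N along the same axis.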

Best

dalcinl commented 5 months ago

> and my time series data shape is
>
> (200, 278946, 5)

So do you have 200 time samples, each comprising 278946 spatial points with 5 variables per point?

I think the weights correspond just to spatial points and not to variables, so you should provide 278946 weights, not 278946 * 5 = 1394730. @mrogowski, can you confirm?

mrogowski commented 5 months ago

We should support one weight per spatial point per variable. Looking quickly at the code, I think we may have a bug. We tested the single-variable branch heavily in parallel, but not so much data with multiple variables. @FrankFrank9, what is the format of your data? Could you come up with a simple reproducer?
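When the weight is a per-point quantity such as element area, one way to get one weight per spatial point per variable is to repeat the area vector across the variable axis. This is a sketch with hypothetical small sizes and random values (the issue has 200 snapshots, 278946 points, and 5 variables); it only shows the shape relation between the data and the weights, not how PySPOD consumes them:

```python
import numpy as np

rng = np.random.default_rng(0)
nt, npts, nvars = 3, 6, 5                    # hypothetical sizes
X = rng.standard_normal((nt, npts, nvars))   # (time, space, variables), as in the issue
area = rng.random(npts)                      # one weight per spatial point, e.g. element area

# Repeat each point's area for every variable: shape (npts, nvars)
weights = np.repeat(area[:, None], nvars, axis=1)

# The weights match the spatial-plus-variable part of the data shape;
# flattened, this is npts * nvars entries (278946 * 5 = 1394730 in the issue)
print(weights.shape, weights.size)  # (6, 5) 30
```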

FrankFrank9 commented 5 months ago

Unfortunately, I can't put together an easy reproducer. I guess anything with those shapes should trigger it. It is an error in redistributing the data. Let me know.

mrogowski commented 5 months ago

Can you try to run with this change in PySPOD?

FrankFrank9 commented 5 months ago

I get the same error:

ValueError: cannot reshape array of size 139473 into shape (139475,1)

During handling of the above exception, another exception occurred:
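A possible reading of the numbers in this error, offered as an inference from the shapes rather than a confirmed diagnosis: 139473 is exactly 1394730 / 10, i.e. the flat weight vector split evenly over 10 ranks, while 139475 is 27895 * 5, the first rank's block of the 278946 spatial points times 5 variables. The sketch checks this with a hypothetical blockdist that follows the rule in the distribute_dimension docstring (assumed: the first N % size ranks take one extra element). With 2 or 3 ranks the two splits agree because 278946 divides evenly, which would fit the earlier report that small rank counts worked:

```python
def blockdist(N, size, rank):
    """Hypothetical stand-in for PySPOD's _blockdist (assumed behavior):
    (count, start) for `rank`; the first N % size ranks get one extra."""
    q, r = divmod(N, size)
    return q + (1 if rank < r else 0), rank * q + min(rank, r)

npts, nvars = 278946, 5  # shapes reported in this issue

def mismatched_ranks(size):
    """Ranks whose flat weight block differs from (spatial block) * nvars."""
    bad = []
    for rank in range(size):
        n_flat, _ = blockdist(npts * nvars, size, rank)  # split the 1394730 flat weights
        n_pts, _ = blockdist(npts, size, rank)           # split the 278946 spatial points
        if n_flat != n_pts * nvars:
            bad.append((rank, n_flat, n_pts * nvars))
    return bad

for size in (2, 3, 10):
    print(size, mismatched_ranks(size)[:1])
# 2 []
# 3 []
# 10 [(0, 139473, 139475)]
```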
dalcinl commented 5 months ago

> Unfortunately, I can't put together an easy reproducer.

Not even using random data with shapes that match your data?

mrogowski commented 5 months ago

I generated random data:

data matrix X (200, 278946, 5)
weights (278946, 1, 5)

and tried with 7, 8, 9, 10, 11, and 12 processes. All of them seem to have worked. Any reproducer would help us assist you.

FrankFrank9 commented 5 months ago

> I generated random data:
>
> data matrix X (200, 278946, 5)
> weights (278946, 1, 5)
>
> and tried with 7, 8, 9, 10, 11, and 12 processes. All of them seem to have worked. Any reproducer would help us assist you.

Now it works: the weights need the second axis as well; mine were just (npts, nvars). Thanks for looking into this!

dalcinl commented 5 months ago

Oh, but then that means we can do better and add the missing axis ourselves, right, Marcin?
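A sketch of what adding the missing axis could look like, written as a hypothetical helper as_point_var_weights (not part of PySPOD). It accepts weights shaped (npts,), (npts, nvars), flat (npts * nvars,), or already (npts, 1, nvars), and returns the (npts, 1, nvars) layout used in mrogowski's (278946, 1, 5) reproducer. Treating a flat vector as point-major is an assumption that matches NumPy's default reshape order:

```python
import numpy as np

def as_point_var_weights(weights, npts, nvars):
    """Hypothetical helper: coerce weights to shape (npts, 1, nvars)."""
    w = np.asarray(weights, dtype=float)
    if w.shape == (npts, 1, nvars):
        return w                                   # already in the target layout
    if w.shape == (npts,):
        # one weight per point: reuse it for every variable
        w = np.repeat(w[:, None], nvars, axis=1)
    elif w.shape == (npts * nvars,):
        # flat vector, assumed point-major (row-major) order
        w = w.reshape(npts, nvars)
    if w.shape != (npts, nvars):
        raise ValueError(f"cannot interpret weights of shape {np.shape(weights)}")
    return w[:, np.newaxis, :]                     # insert the missing middle axis

w = as_point_var_weights(np.ones(4 * 5), npts=4, nvars=5)
print(w.shape)  # (4, 1, 5)
```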

mrogowski commented 5 months ago

> Now it works: the weights need the second axis as well; mine were just (npts, nvars). Thanks for looking into this!

Good to hear! As I said before, most of our runs so far have been with single-variable 2D data, so you may run into issues with 1D and/or multivariable data. Let us know and we'll try to fix them.

mrogowski commented 5 months ago

> Oh, but then that means we can do better and add the missing axis ourselves, right, Marcin?

I'll try to reproduce the issue that @FrankFrank9 ran into and fix it. I used (278946, 1, 5) because that's what I got from utils_weights.geo_trapz_2D. It just so happens that the shape was the problem.

FrankFrank9 commented 5 months ago

> Now it works: the weights need the second axis as well; mine were just (npts, nvars). Thanks for looking into this!
>
> Good to hear! As I said before, most of our runs so far have been with single-variable 2D data, so you may run into issues with 1D and/or multivariable data. Let us know and we'll try to fix them.

Thanks a lot! If I find any other issues, I'll post them here.

Best

mrogowski commented 5 months ago

> I'll try to reproduce the issue that @FrankFrank9 ran into and fix it.

I couldn't: it worked for me with (278946, 5) weights as well.

FrankFrank9 commented 5 months ago

At this point I don't know. The version I was using when I got the error came from

pip install pyspod

Is it the same version?

mrogowski commented 5 months ago

pip install pyspod installs the latest published version, which does not contain this fix. You'd need to run

pip install git+https://github.com/MathEXLab/PySPOD@refs/pull/48/head

or manually clone the repo from the PR and pip install it.
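To tell which build is installed, the version number alone may not help, since a PR branch can carry the same version string as the release. A sketch using the standard-library importlib.metadata: recent versions of pip record a direct_url.json file (PEP 610) for installs from a git URL or local path, and omit it for installs from an index such as PyPI. The distribution name pyspod is taken from the pip command in this thread:

```python
import json
from importlib.metadata import PackageNotFoundError, distribution

def pyspod_install_source(name="pyspod"):
    """Return (version, source) for an installed distribution, or None.

    source is the URL pip recorded in direct_url.json (git or local installs),
    or "index" when that file is absent (likely a PyPI install).
    """
    try:
        dist = distribution(name)
    except PackageNotFoundError:
        return None
    raw = dist.read_text("direct_url.json")  # None if pip did not record one
    source = json.loads(raw).get("url", "unknown") if raw else "index"
    return dist.version, source

print(pyspod_install_source())
```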