Quansight-Labs / numpy.net

A port of NumPy to .Net
BSD 3-Clause "New" or "Revised" License

Array limit #4

Closed QadiymStewart closed 4 years ago

QadiymStewart commented 4 years ago

Is there a cap on array size? System.Exception: '(NpyExc_ValueError) NpyArray_NewFromDescr: array is too big.'

KevinBaselinesw commented 4 years ago

Yes, it is a .NET limitation. I think the maximum is 2 GB. If you try to allocate an array larger than that, .NET will throw an exception.

I think the original authors put in a check to catch the error before .NET throws it.

That is a very big array. What are you trying to do?

QadiymStewart commented 4 years ago

I need to generate images from 3000 px x 3000 px up to 12000 px.

KevinBaselinesw commented 4 years ago

12000 * 12000 * sizeof(double) == 1,152,000,000

That shouldn’t fail on allocation.

Can you verify how big of an array you are asking for?
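The arithmetic above can be checked with a small helper. This is only a sketch: `ArraySizeCheck` and `BytesNeeded` are hypothetical names that are not part of numpy.net, and the limit constant assumes the roughly 2 GB per-object cap mentioned earlier in this thread (approximated here as `int.MaxValue` bytes).

```csharp
using System;

public static class ArraySizeCheck
{
    // Assumption: approximates the ~2 GB per-object limit discussed in this thread.
    public const long DefaultObjectLimitBytes = int.MaxValue;

    // Bytes needed for a dense width x height array with the given element size.
    // checked() makes an arithmetic overflow throw instead of wrapping silently.
    public static long BytesNeeded(long width, long height, int bytesPerElement)
        => checked(width * height * bytesPerElement);

    public static void Main()
    {
        long bytes = BytesNeeded(12000, 12000, sizeof(double));
        Console.WriteLine($"bytes needed: {bytes}");
        Console.WriteLine($"fits under the assumed limit: {bytes < DefaultObjectLimitBytes}");
    }
}
```

Logging the same calculation for every array the program requests is one way to find which allocation actually trips the "array is too big" check.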

QadiymStewart commented 4 years ago

The max it has run is 2800 x 2800. I'm not sure of the array size; I'll check. Repo below: https://twinnaz.visualstudio.com/_git/CSHARPCPPN

KevinBaselinesw commented 4 years ago

I am going to bed. I am very tired. I will take a look tomorrow if you don’t have it solved.

QadiymStewart commented 4 years ago

KK, thanks. I'll see what I can do tonight.

KevinBaselinesw commented 4 years ago

When I set up a test for 12000:

        GenerateImageNumpyDotNet(ImageName, 12000, 12000, batchSize: 1, netSize: 12, hSize: 32, scalingfactor: 5, edgeDesign: false, layers: 4, seed: 1937515841);

I end up at the line below (hid_vec_scaled) with the following input values:

Num_Points = 144,000,000 (12000 * 12000)

Batch_Size = 1

HSize = 32

That asks the system to create an array sized 1, 1, 32 * 144,000,000, or 4,608,000,000 elements, and that exceeds the .NET 2 GB limit.

Is there some way to reduce the HSize?

        ndarray hid_vec_scaled = np.reshape(hid_vec, new shape(BatchSize, 1, HSize)) * np.ones((Num_Points, 1), dtype: np.Float32) * Scaling;

Theoretically, there is a way to allow larger arrays, but that appears to max out at 4,294,967,295 elements, which is still too small for your needs.

https://docs.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/runtime/gcallowverylargeobjects-element
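To make the numbers above concrete, here is a minimal sketch of the element-count arithmetic. `BroadcastSizeCheck` and `ElementCount` are hypothetical names for illustration, not numpy.net APIs. The cap constant is the 4,294,967,295 figure quoted above, and the byte count assumes 4-byte Float32 elements as in the `np.ones(..., dtype: np.Float32)` call.

```csharp
using System;

public static class BroadcastSizeCheck
{
    // Element cap quoted in this thread for the gcAllowVeryLargeObjects setting.
    public const long MaxElementsWithLargeObjects = uint.MaxValue; // 4,294,967,295

    // Total elements in a (batchSize, numPoints, hSize) result.
    public static long ElementCount(long batchSize, long numPoints, long hSize)
        => checked(batchSize * numPoints * hSize);

    public static void Main()
    {
        long numPoints = 12000L * 12000L;                 // 144,000,000
        long elements = ElementCount(1, numPoints, 32);   // values from the test run above
        long bytes = checked(elements * sizeof(float));   // assumes Float32 elements

        Console.WriteLine($"elements: {elements}");
        Console.WriteLine($"bytes as Float32: {bytes}");
        Console.WriteLine($"within element cap: {elements <= MaxElementsWithLargeObjects}");
    }
}
```

Because the element count grows with width * height * HSize, halving the image side cuts the request by a factor of four, while halving HSize only cuts it by two.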

QadiymStewart commented 4 years ago

Thanks, I ended up capping it at 2000 px by 2000 px, then upscaling the image with a different image processing library.

QadiymStewart commented 4 years ago

Is there anything else I can do to squeeze some performance out of your library? Going live in a week. :)

KevinBaselinesw commented 4 years ago

By performance, do you mean making it process faster?

Is there a particular operation that you think is causing it to be slow? Maybe there are some optimizations I can make in that code path.

In my attempts to speed it up, I have optimized the most common paths to the point where we seem to be at least comparable in speed to the original Python/C-based code. But maybe there is a code path I missed?

Are there any operations that you are doing repeatedly to get the same result? If so, can you do them once and reuse the results?
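As a rough illustration of "do them once and reuse the results", the sketch below caches an evenly spaced range keyed by its inputs. Everything here is an assumption for illustration: `GridCache` and `GetRange` are hypothetical names, a plain `double[]` stands in for an `ndarray`, and whether the real code recomputes anything with identical inputs is not established in this thread.

```csharp
using System;
using System.Collections.Generic;

public static class GridCache
{
    // Results keyed by the inputs that fully determine them.
    private static readonly Dictionary<(int, double), double[]> Cache =
        new Dictionary<(int, double), double[]>();

    // Counts real computations so the reuse can be observed.
    public static int ComputeCount { get; private set; }

    // Evenly spaced values over [-scaling, scaling]; a stand-in for an np.linspace call.
    public static double[] GetRange(int count, double scaling)
    {
        var key = (count, scaling);
        if (Cache.TryGetValue(key, out var cached))
        {
            return cached; // reuse the earlier result
        }

        ComputeCount++;
        var values = new double[count];
        double step = count > 1 ? (2 * scaling) / (count - 1) : 0;
        for (int i = 0; i < count; i++)
        {
            values[i] = -scaling + i * step;
        }

        Cache[key] = values;
        return values;
    }
}
```

Callers share the cached array, so they must treat it as read-only. The dictionary is not thread-safe, which matters if the cache is combined with parallel code.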

In my testing with size=2000, it takes about 90 seconds to generate an image. That does seem like a lot. Have you tested this against a real python/numpy solution? How fast is that?

I have a plan/hope to increase the performance by using parallel operations in certain locations. That requires me to figure out an algorithm that will allow me to run code that is really built to be single threaded. I am close but something is not quite right with it. For sure I won’t be able to have it done in a week. I would guess that would probably at best double the speed.

QadiymStewart commented 4 years ago

That's fine. I'll explore the code a bit more to see if there's anything else I can fine-tune. Thanks for your help once again.

KevinBaselinesw commented 4 years ago

It seems like these two bits of code could be done in parallel as the variables don’t seem dependent on each other?

    // Build the network
    Art_Net = FullyConnected(h_vec_unwrapped, NetSize) +
        FullyConnected(x_dat_unwrapped, NetSize, false) +
        FullyConnected(y_dat_unwrapped, NetSize, true) +
        FullyConnected(r_dat_unwrapped, NetSize, false);

    public List<ndarray> CreateGrid(int width = 32, int height = 32, float scaling = 1.0f)
    {
        Num_Points = width * height;
        double ret_step = 0;

        ndarray x_range = np.linspace(-1 * scaling, scaling, ref ret_step, width);
        ndarray y_range = np.linspace(-1 * scaling, scaling, ref ret_step, height);

        ndarray x_mat = np.matmul(np.ones(new shape(height, 1)), x_range.reshape(1, width));
        ndarray y_mat = np.matmul(y_range.reshape(height, 1), np.ones(new shape(1, width)));
        ndarray r_mat = np.sqrt((x_mat * x_mat) + (y_mat * y_mat));

        x_mat = np.tile(x_mat.flatten(), BatchSize).reshape(BatchSize, Num_Points, 1);
        y_mat = np.tile(y_mat.flatten(), BatchSize).reshape(BatchSize, Num_Points, 1);
        r_mat = np.tile(r_mat.flatten(), BatchSize).reshape(BatchSize, Num_Points, 1);

        return new List<ndarray>
        {
            x_mat,
            y_mat,
            r_mat
        };
    }
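A minimal sketch of that idea, using only the standard `System.Threading.Tasks` API. `ParallelBuild`, `Step`, and `SumInParallel` are hypothetical names, and plain `double[]` arrays stand in for `ndarray`. It also assumes the steps share no mutable state; whether numpy.net operations are safe to call from several threads at once is not confirmed in this thread and would need checking first.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelBuild
{
    // Hypothetical stand-in for one independent, CPU-bound step
    // (for example a single FullyConnected call).
    public static double[] Step(double[] input, double weight)
    {
        var output = new double[input.Length];
        for (int i = 0; i < input.Length; i++)
        {
            output[i] = input[i] * weight;
        }
        return output;
    }

    // Runs every step on the thread pool, waits for all of them,
    // then sums the partial results element by element.
    public static double[] SumInParallel(double[][] inputs, double[] weights)
    {
        var tasks = new Task<double[]>[inputs.Length];
        for (int t = 0; t < inputs.Length; t++)
        {
            int index = t; // copy so each lambda captures its own value
            tasks[index] = Task.Run(() => Step(inputs[index], weights[index]));
        }

        Task.WaitAll(tasks);

        var total = new double[inputs[0].Length];
        foreach (var task in tasks)
        {
            double[] part = task.Result;
            for (int i = 0; i < total.Length; i++)
            {
                total[i] += part[i];
            }
        }
        return total;
    }
}
```

The final summation stays on the calling thread, so only the independent steps overlap. Any speedup depends on how much of the total time those steps account for.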

QadiymStewart commented 4 years ago

Going to benchmark and see where most of the time is being spent.
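One lightweight way to do that, sketched with the standard `System.Diagnostics.Stopwatch` class. `StageTimer` and `Time` are hypothetical helper names, not part of numpy.net, and a dedicated profiler would give finer detail than this.

```csharp
using System;
using System.Diagnostics;

public static class StageTimer
{
    // Runs one stage, prints how long it took, and passes its result through.
    public static T Time<T>(string label, Func<T> stage)
    {
        var stopwatch = Stopwatch.StartNew();
        T result = stage();
        stopwatch.Stop();
        Console.WriteLine($"{label}: {stopwatch.ElapsedMilliseconds} ms");
        return result;
    }
}
```

Wrapping each stage in turn, for example `var grid = StageTimer.Time("CreateGrid", () => CreateGrid(2000, 2000));`, should show which call dominates the roughly 90 seconds reported for size 2000.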