GlobalArrays / ga4py

Global Arrays Python bindings and distributed NumPy module GAiN

Limit on number of global arrays that can be created #3

Open cjroberts opened 5 years ago

cjroberts commented 5 years ago

Hello, we would like to use ga4py for parallelising the cf-python library for analysis of geoscientific data (https://cfpython.bitbucket.io/). Empirically, we found that there is a hard limit of 32768 on the number of global arrays that can be created. For some of our use cases we would like to create on the order of millions of arrays simultaneously. Please could you tell us why this limit exists and whether it would be possible to overcome it?

bjpalmer commented 5 years ago

Charles,

GA maintains an internal list of statically allocated array descriptors that are used to store metadata on global arrays. This list is replicated on every MPI process, so 32768 was chosen as a compromise value: we thought this would be big enough to cover most use cases but not so large that it would take up a significant fraction of available memory. If you want to increase the number of allowable GAs, you can do this fairly easily by setting the MAX_ARRAYS value in global/src/gaconfig.h to whatever value you like.

cjroberts commented 5 years ago

Thank you for your quick reply.

Is there a way to estimate how much memory the internal list will require given the MAX_ARRAYS value?

bjpalmer commented 5 years ago

For Global Arrays configured with --enable-i8, the size of the array descriptor is around 820 bytes (it should be smaller than this for --enable-i4). The total memory for the internal list will therefore be around 820 * MAX_ARRAYS bytes. You can get a more accurate value by adding a statement that prints out the value of MAX_ARRAYS * sizeof(global_array_t) in the pnga_initialize function in global/src/base.c if you need a precise number.

bjpalmer commented 5 years ago

Sorry, the previous statement originally came through a bit garbled (an asterisk is interpreted as a formatting character). The estimate is 820 times MAX_ARRAYS bytes.
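The arithmetic above can be sketched quickly in Python. The 820-byte figure is the approximate descriptor size quoted for --enable-i8 builds; the exact value depends on the build configuration.

```python
# Rough per-process memory cost of GA's replicated descriptor table.
# 820 bytes is the approximate descriptor size for --enable-i8 builds;
# the exact value depends on the build configuration.
BYTES_PER_DESCRIPTOR = 820

def descriptor_table_bytes(max_arrays, per_descriptor=BYTES_PER_DESCRIPTOR):
    """Estimated bytes of replicated array metadata on each MPI process."""
    return max_arrays * per_descriptor

# Default limit vs. a limit large enough for millions of arrays.
for n in (32768, 2**21):
    mib = descriptor_table_bytes(n) / 2**20
    print(f"MAX_ARRAYS={n}: ~{mib:.0f} MiB per process")
```

With the default limit this works out to roughly 26 MiB per process, which is why raising MAX_ARRAYS into the millions is only attractive on high-memory nodes.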

cjroberts commented 5 years ago

Thank you, that makes sense. We might be able to improve our memory management strategy as well, but we would like to support users in increasing the maximum number of arrays, as this might be acceptable on some high-memory nodes. Could I also ask whether it is possible to access the value of MAX_ARRAYS from the Python layer, so that if a user has changed it we can detect this automatically and keep track of when the number of created arrays approaches the limit?
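Until such a query exists, the bookkeeping could live on the Python side. A minimal sketch, assuming the user passes in the limit by hand (ga4py does not currently expose MAX_ARRAYS, and the class name here is purely illustrative):

```python
class ArrayBudget:
    """Track global-array handles against a user-supplied compile-time limit.

    ga4py does not expose MAX_ARRAYS, so the limit must be passed in to
    match whatever was compiled into GA (32768 by default).
    """
    def __init__(self, max_arrays=32768, warn_fraction=0.9):
        self.max_arrays = max_arrays
        self.warn_at = int(max_arrays * warn_fraction)
        self.in_use = 0

    def acquire(self):
        """Call just before creating a global array.

        Returns True when usage is nearing the limit, so the caller can
        warn or switch strategies before creation starts failing.
        """
        if self.in_use >= self.max_arrays:
            raise RuntimeError(f"global array limit ({self.max_arrays}) reached")
        self.in_use += 1
        return self.in_use >= self.warn_at

    def release(self):
        """Call just after destroying a global array."""
        self.in_use -= 1
```

This only catches arrays created through the wrapped path, of course; arrays created directly through ga4py would bypass the count.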

bjpalmer commented 5 years ago

At the moment there is no way to do this, although I suppose it could be added.

cjroberts commented 5 years ago

Please would it be possible to add this to your list of feature requests?

bjpalmer commented 5 years ago

So you are looking for a function that is roughly

int GA_Max_allowable_global_arrays()?

cjroberts commented 5 years ago

Yes, that's right, something we can access from ga4py.

cjroberts commented 5 years ago

Hello, we just wanted to let you know that we are pursuing another approach at the moment: using a single massive, 1D, 8-bit integer global array to store large numbers of smaller arrays of different shapes and data types. So we might not need this function after all, although we are not sure yet which approach we will ultimately take. Thank you for your time and for the information on how Global Arrays works.
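The pooling idea described above might look roughly like the following NumPy sketch: one flat int8 buffer, a registry mapping names to (offset, shape, dtype), and views reconstructed on demand. The names and the simple bump-allocator layout are illustrative assumptions, not the actual cf-python design, and the buffer here is a plain NumPy array standing in for the single global array.

```python
import numpy as np

class ArrayPool:
    """One large 1-D int8 buffer holding many logical arrays.

    Only one (global) array ever needs to exist; each logical array is a
    typed, reshaped view into a slice of the buffer.
    """
    def __init__(self, nbytes):
        self.buf = np.zeros(nbytes, dtype=np.int8)
        self.registry = {}   # name -> (byte offset, shape, dtype)
        self.top = 0         # next free byte (bump allocator)

    def allocate(self, name, shape, dtype):
        dt = np.dtype(dtype)
        # Round the offset up to the item size so the view is well-formed.
        off = -(-self.top // dt.itemsize) * dt.itemsize
        nbytes = int(np.prod(shape)) * dt.itemsize
        if off + nbytes > self.buf.size:
            raise MemoryError("pool exhausted")
        self.registry[name] = (off, tuple(shape), dt)
        self.top = off + nbytes

    def view(self, name):
        off, shape, dt = self.registry[name]
        nbytes = int(np.prod(shape)) * dt.itemsize
        return self.buf[off:off + nbytes].view(dt).reshape(shape)

pool = ArrayPool(1 << 20)
pool.allocate("temperature", (4, 3), np.float64)
pool.view("temperature")[:] = 1.5
```

In the distributed setting the slicing and dtype reinterpretation would happen on data fetched from the single global array rather than on a local buffer, but the metadata bookkeeping is the same.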