brianthomas opened this issue 10 years ago
I think that the maximum number of dimensions in HDF5 is 32, defined by H5S_MAX_RANK (I have edited this number in the use case).
"640K ought to be enough for anybody." -bill gates
I supply the quote to ask whether building limits into the data format is wise. What appears to be a good limit today may look inadequate in the future.
A fixed number simplifies some coding in C and similar languages, and having it as a compile-time parameter makes it easy to change. Are there any datasets in astronomy that even come close to 32, though?
The radio astronomers are probably the ones pushing hardest on the number of dimensions needed.
For the SKA data products the number of elements per dimension will be very large, but the dimensions will typically be 2 spatial axes (on the order of 100 Mpixel), 1 polarization axis (4 values), and a frequency axis (with up to 256k channels). An optional RFI axis could be added, where one plane holds the actual measurements and other planes hold RFI and similar maps, but I cannot envision anything beyond 32 for a data product. A velocity axis is typically a second representation of the frequency axis, as is a wavelength axis.
For raw data I can imagine more dimensions, including baseline, but I find it difficult to go beyond 32 dimensions.
Juande Santander-Vela, System Engineer (Science Data Processor/Telescope Manager), Square Kilometre Array/SKA Organisation, Jodrell Bank Observatory, Lower Withington, Macclesfield SK11 9DL, United Kingdom
It looks like the maximum number of dimensions is defined in the HDF5 header file H5Spublic.h. The standard library limits dataspace objects to a maximum rank of 32, but it should be possible to raise that value and recompile the library if necessary. I think this is a good approach, and I agree that it is unlikely that a larger value will be needed.
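A minimal sketch of how that limit surfaces to user code, assuming the h5py bindings and a stock HDF5 build (the file and dataset names here are hypothetical):

```python
import h5py

# H5S_MAX_RANK is 32 in a stock HDF5 build (see H5Spublic.h); a library
# recompiled with a larger value would accept higher ranks.
HDF5_MAX_RANK = 32

with h5py.File("limits_demo.h5", "w") as f:  # hypothetical file name
    # A rank-32 dataset is accepted by a stock build...
    f.create_dataset("ok", shape=(1,) * HDF5_MAX_RANK, dtype="f4")

    # ...but asking for one more dimension should be rejected, either by
    # h5py's own checks or by the HDF5 library itself.
    try:
        f.create_dataset("too_many", shape=(1,) * (HDF5_MAX_RANK + 1), dtype="f4")
    except (ValueError, RuntimeError) as exc:
        print("rank limit enforced:", exc)
```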
I believe it would be foolish to bake in any absolute upper limit, though it might make sense to define a minimum number of dimensions that software readers must be able to support. Even for very large numbers of dimensions, readers should probably at least be able to return slices along a subset of those dimensions; after all, it's still just bytes.
I believe Numpy has a baked-in limit of 32 axes for ndarrays (NPY_MAXDIMS), but that can be changed at compile time if needed. So data with more than 32 dimensions may not be readable into a typical Numpy array, and software should be able to detect that.
What I'm trying to say is that not all data needs to be readable by all readers (at least in extreme cases), as long as it is clear where the limitations lie and it remains possible to find a way to read the data in those files in the preferred format.
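The detection step described above could look roughly like the sketch below. It assumes h5py as the reader, a NumPy 1.x build where NPY_MAXDIMS is 32, and hypothetical file and dataset names; it also assumes the reader can report a rank above NumPy's limit at all, which today would require an HDF5 library recompiled with a larger H5S_MAX_RANK:

```python
import h5py

# NumPy's compile-time dimension limit (NPY_MAXDIMS); 32 in NumPy 1.x builds.
# Hard-coded here because NumPy does not expose the value directly.
NUMPY_MAX_RANK = 32

def read_or_slice(path, name, fixed_axes=None):
    """Read a dataset whole if NumPy can represent it; otherwise pin the
    excess axes at index 0 and return a lower-rank slice."""
    with h5py.File(path, "r") as f:
        dset = f[name]
        rank = len(dset.shape)
        if rank <= NUMPY_MAX_RANK:
            return dset[...]
        # Too many axes for a single ndarray: it is still just bytes, so
        # read a slice along a subset of the dimensions instead.
        pinned = set(fixed_axes if fixed_axes is not None else
                     range(rank - NUMPY_MAX_RANK))
        index = tuple(0 if ax in pinned else slice(None) for ax in range(rank))
        return dset[index]
```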
Yes, I agree with Erik; this is what I was alluding to earlier. Further, I'd expect that different data models will probably have different limits. I'm not sure what the minimum for all images in the format should be, but it's at least 3 axes.
OK, so we should characterize the typical dimensions (spatial, polarization, frequency/wavelength, time, intensity), which can be orthogonal, perhaps hope for the referees to suggest a few more, and then use the typical dimension limits of modern formats to show that there is plenty of headroom and that more can be achieved if needed.
@juandesant I think that would be a good starting point for justifying any derived requirement(s).
We need to determine a maximum, if any, on the number of allowed dimensions in a data cube.