PrometheusPi closed this issue 6 years ago.
cc @steindev and @alex-koe
@PrometheusPi the attributes with a `_` prefix (such as `_size`) are not openPMD and are only in libSplash files for legacy reasons. Please use the corresponding openPMD attributes instead.
For example, `_size` does not exist in openPMD; just use `.shape` of the data set in Python (its entries are plain `int`s!).
Decisions for unsigned vs. signed are usually not made because of memory constraints but because of the value range: a size, for example, can never be negative.
Indeed, mixed uint/int arithmetic is often weird in Python (NumPy), since it automatically tries to cast up to a "more precise" type in mixed-type math. In any case, you can always cast the values you use as array indices:
```python
data_slice = f[...][:, int(Ny)//2, :]
```
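A self-contained sketch of this cast-on-access idea, assuming a small NumPy array as a stand-in for the data set `f[...]` and a `uint64` value for `Ny` (both made up for illustration):

```python
import numpy as np

# Stand-in for the data set read from f[...]; the shape (Nx, Ny, Nz) is made up.
data = np.arange(4 * 6 * 3).reshape(4, 6, 3)

# Pretend Ny came from a size attribute stored as uint64.
Ny = np.uint64(6)

# Cast to a plain Python int first, so the floor division stays an int
# regardless of NumPy's promotion rules and is always a valid index.
mid = int(Ny) // 2
data_slice = data[:, mid, :]

print(type(mid), mid, data_slice.shape)
```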
Possibilities for handling the NumPy casts with Python 2.7 and 3.4 (NumPy 1.x promotion behavior):
```python
import numpy as np

np.uint(3)//2
# 1.0
# correct integer division, just upcast to float (which NumPy indices do not accept)

np.uint(3)/2
# 1.5
# true float division (which NumPy indices do not accept)

np.floor_divide(np.uint(3), 2, dtype=np.int64)
# 1
# proper NumPy mixed-integer math

N = f[...].attrs['_size'].astype(np.int64)
Ny = N[1]
data_slice = f[...][:, Ny//2, :]
# convert on read
```
I usually prefer `int(Ny)//2`, or the last method if necessary. This is still "just" a NumPy-specific thing and not really a question of the actually stored data attribute.
Anyway, for your specific question: use `.shape` of the NumPy ndarray you read:
```python
Ny = f[...].shape[1]
data_slice = f[...][:, Ny//2, :]
```
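A minimal runnable version of this, with a NumPy array standing in for the data read from `f[...]` (the shape is made up). It shows that `.shape` already yields plain Python ints, so no cast is needed; if `f` is an `h5py` file, the dataset object itself also exposes `.shape` without loading the data.

```python
import numpy as np

# Stand-in for the ndarray read from f[...] (made-up shape).
data = np.zeros((4, 6, 3), dtype=np.float32)

# .shape is a tuple of plain Python ints.
Ny = data.shape[1]
data_slice = data[:, Ny // 2, :]

print(type(Ny), Ny // 2, data_slice.shape)
```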
Further investigation showed that only the `np.uint64` data type is affected. I opened an issue in the NumPy repo.
This seems to be a Python (NumPy) issue, so I will close this issue here.
Update:
In NumPy, casting to float is intentional at this point. Since a uint has a slightly higher range than the int of the same width, a cast to that int might lead to errors. Thus `uintX // intX` will always return `int(2X)`, i.e. an int of twice the width (e.g. `uint32 // int32` gives `int64`). However, since `uint64` is the largest integer type available, a cast to float is favored there to avoid possible errors.
Thus, NumPy will not change this behavior.
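The promotion rule described in the update can be checked directly with `np.result_type`; a minimal sketch, assuming only that NumPy is installed (the loop and the example arrays are illustrative). Arrays rather than Python scalars are used for the division, so the result does not depend on how a given NumPy version treats scalar operands.

```python
import numpy as np

# For each unsigned width, ask NumPy which dtype mixed uint/int math promotes to.
for bits in (8, 16, 32, 64):
    u = np.dtype(f"uint{bits}")
    i = np.dtype(f"int{bits}")
    print(u, "//", i, "->", np.result_type(u, i))

# Only the 64-bit case falls back to float64, since no int128 exists.
a = np.array([7], dtype=np.uint64)
b = np.array([2], dtype=np.int64)
print((a // b).dtype, a // b)
```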
The optimal solution is thus:
```python
Ny // np.uint(2)
```
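A small runnable check of this suggestion, with a NumPy array standing in for the data set and `Ny` assumed to be a `uint64` size value (both made up). It writes `np.uint64(2)` instead of `np.uint(2)` to make the width explicit, since the width of `np.uint` depends on the platform.

```python
import numpy as np

data = np.zeros((4, 6, 3))  # stand-in for the data set (made-up shape)
Ny = np.uint64(6)           # e.g. a size attribute stored as uint64

# uint64 // uint64 stays uint64, so there is no promotion to float64.
mid = Ny // np.uint64(2)
data_slice = data[:, mid, :]  # NumPy integer scalars are valid indices

print(mid.dtype, mid, data_slice.shape)
```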
Thx for asking upstream!
Actually, the decision to go to float for a slightly larger range (2x) is paid for with less precision, due to the limited mantissa of floats (53 bits for float64). This can only be counteracted by going to really large floats, which is costly in memory and speed when starting from a (u)int64
...
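The precision cost can be made concrete: `2**53 + 1` is the smallest positive integer that float64 cannot represent exactly, so a `uint64 // int64` division, which NumPy carries out in float64, silently changes it. A minimal sketch with made-up values:

```python
import numpy as np

big = 2**53 + 1  # smallest positive integer float64 cannot represent exactly

a = np.array([big], dtype=np.uint64)
b = np.array([1], dtype=np.int64)

# Mixed uint64/int64 division is computed in float64 and loses the last bit.
q = a // b
print(q.dtype, int(q[0]), big)

# The same division with both operands unsigned stays exact.
q_exact = a // np.array([1], dtype=np.uint64)
print(q_exact.dtype, int(q_exact[0]) == big)
```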
Various attributes are given as `uint64` type. This might be slightly more memory efficient, but it causes a lot of trouble when using the common openPMD data analysis tools: e.g. when loading `_size` to perform data slicing via integer division, the slicing will fail, because an integer division of a `uint` by an `int` returns a float (NumPy performs it as a floating-point division). I would vote for changing these types to ints. Any other suggestions on how to solve this?
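One reader-side workaround, independent of how the attribute is stored, is to convert size-like attributes to plain Python ints right after loading. The helper `to_python_ints` below is hypothetical (not part of openPMD, libSplash or h5py), and the attribute values are made up:

```python
import numpy as np


def to_python_ints(size_attr):
    """Convert an (unsigned) integer attribute array to a tuple of Python ints.

    Hypothetical helper for illustration; not part of openPMD or h5py.
    """
    return tuple(int(n) for n in np.asarray(size_attr).ravel())


# Example: a legacy size attribute stored as uint64 (values are made up).
size_attr = np.array([4, 6, 3], dtype=np.uint64)
Nx, Ny, Nz = to_python_ints(size_attr)

# Plain ints make the integer division safe for indexing.
data = np.zeros((Nx, Ny, Nz))
data_slice = data[:, Ny // 2, :]
print(Ny // 2, data_slice.shape)
```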