sitsang closed this issue 7 years ago
This is not entirely unexpected, because the pandas conversion goes through a numpy array:
>>> x = q('3#0Ni')
>>> np.array(x)
array([-2147483648, -2147483648, -2147483648])
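The -2147483648 values above are not arbitrary: q represents the integer null 0Ni with the int32 sentinel -2**31, and a plain numpy conversion keeps that sentinel as an ordinary value. A minimal sketch of the same result without pyq, using a numpy array as a stand-in for q('3#0Ni'):

```python
import numpy as np

# q's 0Ni null is stored as the int32 sentinel -2147483648 (i.e. -2**31).
# A plain int array has no notion of "missing", so the sentinel leaks through.
x = np.full(3, -2**31, dtype=np.int32)  # stand-in for np.array(q('3#0Ni'))
print(x)  # [-2147483648 -2147483648 -2147483648]
```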
I don't know how pandas deals with missing values, but pyq has some support for numpy.ma:
>>> np.ma.array(x)
masked_array(data = [-- -- --],
mask = [ True True True],
fill_value = 999999)
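The masked-array output above can be reproduced by hand, which shows what pyq's numpy.ma support is doing: the data still holds the int32 sentinel, and a parallel boolean mask marks every element as missing. A sketch, with the data built manually rather than coming from pyq:

```python
import numpy as np

# Hand-built equivalent of np.ma.array(q('3#0Ni')): the values are the
# int32 null sentinel, and the mask flags all three elements as missing.
data = np.full(3, -2**31, dtype=np.int32)
m = np.ma.array(data, mask=[True, True, True])
print(m)        # [-- -- --]
print(m.mask)   # [ True  True  True]
```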
0N values are also treated specially when K vectors are converted to Python lists:
>>> list(x)
[None, None, None]
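The list form is friendlier to pandas than the raw numpy conversion: fed a list of Nones, pandas keeps them as an object column and recognizes them as missing. A small sketch of what feeding list(x) into pandas would look like (pyq itself is not assumed here):

```python
import pandas as pd

# A column built from [None, None, None] -- the list(x) form above.
# pandas stores the Nones in an object column and treats them as missing.
col = pd.Series([None, None, None])
print(col.dtype)         # object
print(col.isna().all())  # True
```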
I don't think there is much we can do to improve this. Support for missing values is flaky in numpy, and I am not sure pandas improves much in this area. We can probably document this behavior better, but that is true of most pyq features: the documentation could certainly use some improvement.
I see. Is it mandatory to create a pandas DataFrame through numpy?
Shouldn't it be possible to generate the DataFrame through the list function rather than the array function?
Is it mandatory to create a pandas DataFrame through numpy?
Since a pandas DataFrame keeps its data in a numpy.ndarray, going through numpy is the most direct way to convert from pyq to pandas. Note that in many cases the pyq-to-numpy conversion can be achieved without any copying. See https://pyq.enlnt.com/slides/#/5.
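The zero-copy point can be illustrated without pyq installed: numpy's asarray returns the same underlying buffer whenever the input already exposes a compatible one, which is the same mechanism the slides describe for K vectors. A sketch of the analogous numpy-only behavior:

```python
import numpy as np

# np.asarray is a no-op (zero-copy) when the input already exposes a
# compatible buffer -- analogous to what pyq-to-numpy conversion can do.
a = np.arange(10000, dtype=np.int64)
b = np.asarray(a)
print(np.shares_memory(a, b))  # True: same buffer, nothing was copied
```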
Shouldn't it be possible to generate the DataFrame through the list function rather than the array function?
It is possible, but it will be much slower:
In [12]: x = q.til(10000)
In [13]: %timeit pandas.DataFrame({'a': np.asarray(x)})
10000 loops, best of 3: 193 µs per loop
In [14]: %timeit pandas.DataFrame({'a': list(x)})
100 loops, best of 3: 2.13 ms per loop
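The gap in those timings can be reproduced without pyq, using a numpy array as a stand-in for q.til(10000): the array path hands pandas a ready-made buffer, while the list path forces pandas to re-infer and re-pack 10000 Python-level scalars. A rough, hedged reproduction:

```python
import timeit
import numpy as np
import pandas as pd

# Stand-in for q.til(10000); the exact numbers will differ by machine,
# but the list construction is consistently much slower than the array one.
x = np.arange(10000)

t_array = timeit.timeit(lambda: pd.DataFrame({'a': np.asarray(x)}), number=100)
t_list = timeit.timeit(lambda: pd.DataFrame({'a': list(x)}), number=100)
print(t_list > t_array)  # True: the list path pays per-element overhead
```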
I found that we get different results when constructing a pandas DataFrame from a dictionary or an array:
A null integer should be mapped to the None type. This is fine when we import to a DataFrame as an array:
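The snippet that followed is not shown, but the contrast being described can be sketched hypothetically: a null that reaches pandas as a Python None is recognized as missing, while one that arrives as the raw int32 sentinel is treated as an ordinary (very negative) integer. A minimal reconstruction under that assumption:

```python
import numpy as np
import pandas as pd

# Hypothetical illustration of the reported difference.
# Null arriving as None: pandas converts it to a proper missing value.
as_list = pd.DataFrame({'a': [None, 1, 2]})
# Null arriving as the raw int32 sentinel: just a large negative number.
as_array = pd.DataFrame({'a': np.array([-2**31, 1, 2], dtype=np.int32)})

print(bool(as_list['a'].isna().iloc[0]))  # True: recognized as missing
print(int(as_array['a'].iloc[0]))         # -2147483648: not treated as null
```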