askap-vast / vast-pipeline

This repository holds the code of the Radio Transient detection pipeline for the VAST project.
https://vast-survey.org/vast-pipeline/
MIT License

Measurements arrow file timezone in datetime column causes vaex error #570

Closed. ajstewart closed this issue 3 years ago.

ajstewart commented 3 years ago

The timezone-aware datetime column in the measurements arrow file causes vaex to raise an error whenever you interact with that column, or even just display the dataframe info.

I've tried to clarify this with the vaex developers. As vaex currently stands, it does not support timezone information attached to a datetime column.
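One possible workaround is to drop the timezone before the measurements are written to arrow. This is a minimal sketch under assumptions. It assumes the measurements are held in a pandas DataFrame with a UTC-aware `time` column. The helper `strip_timezones` and the sample data are hypothetical and not part of vast-pipeline. `Series.dt.tz_convert(None)` converts the values to UTC and then removes the timezone, so the stored instants do not change.

```python
import pandas as pd


def strip_timezones(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of ``df`` with every timezone-aware datetime column
    converted to UTC and made timezone-naive.

    Hypothetical helper: the column names and where the pipeline would
    call this are assumptions, not vast-pipeline code.
    """
    out = df.copy()
    for col in out.columns:
        if isinstance(out[col].dtype, pd.DatetimeTZDtype):
            # tz_convert(None) converts to UTC, then drops the tz info,
            # so the underlying instants are unchanged.
            out[col] = out[col].dt.tz_convert(None)
    return out


# Hypothetical measurements table with a UTC-aware 'time' column.
measurements = pd.DataFrame(
    {
        "id": [1, 2],
        "time": pd.to_datetime(
            ["2019-08-27T18:12:16", "2020-01-11T05:27:24"], utc=True
        ),
    }
)

clean = strip_timezones(measurements)
print(clean["time"].dt.tz)  # None: the column is now timezone-naive
```

The trade-off is that the column no longer records that its values are UTC, so that convention would have to be documented alongside the arrow file.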

>>> my_run.measurements.info()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-f9adcaec576a> in <module>
----> 1 my_run.measurements.info()

~/anaconda3/envs/vast-tools-dev/lib/python3.8/site-packages/vaex/dataframe.py in info(self, description)
   3377         from IPython import display
   3378         self._output_css()
-> 3379         display.display(display.HTML(self._info(description=description)))
   3380 
   3381     def _info(self, description=True):

~/anaconda3/envs/vast-tools-dev/lib/python3.8/site-packages/vaex/dataframe.py in _info(self, description)
   3397             virtual = name in self.virtual_columns
   3398             if not virtual:
-> 3399                 dtype = str(self.data_type(name)) if self.data_type(name) != str else 'str'
   3400             else:
   3401                 dtype = "</i>virtual column</i>"

~/anaconda3/envs/vast-tools-dev/lib/python3.8/site-packages/vaex/datatype.py in __repr__(self)
     62             if internal.byteorder == ">":
     63                 internal = internal.newbyteorder()
---> 64         if self.is_datetime:
     65             internal = self.numpy
     66 

~/anaconda3/envs/vast-tools-dev/lib/python3.8/site-packages/vaex/datatype.py in is_datetime(self)
    193         if self.is_string:
    194             return False
--> 195         return vaex.array_types.to_numpy_type(self.internal).kind in 'M'
    196 
    197     @property

~/anaconda3/envs/vast-tools-dev/lib/python3.8/site-packages/vaex/array_types.py in to_numpy_type(data_type, strict)
    249         return data_type
    250     else:
--> 251         return numpy_dtype_from_arrow_type(data_type, strict=strict)
    252 
    253 

~/anaconda3/envs/vast-tools-dev/lib/python3.8/site-packages/vaex/array_types.py in numpy_dtype_from_arrow_type(arrow_type, strict)
    259 def numpy_dtype_from_arrow_type(arrow_type, strict=True):
    260     data = pa.array([], type=arrow_type)
--> 261     return numpy_dtype(data, strict=strict)
    262 
    263 

~/anaconda3/envs/vast-tools-dev/lib/python3.8/site-packages/vaex/array_types.py in numpy_dtype(x, strict)
    196         # dtype = DataType(arrow_type)
    197         dtype = arrow_type.to_pandas_dtype()
--> 198         dtype = np.dtype(dtype)  # turn into instance
    199         if strict:
    200             return dtype

TypeError: Cannot interpret 'datetime64[ns, UTC]' as a data type
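The last frame shows the root cause. vaex turns the arrow type into a pandas dtype with `arrow_type.to_pandas_dtype()` and then calls `np.dtype()` on the result. NumPy has no timezone-aware datetime64 type, so that call fails. The sketch below reproduces only that final step, using pandas and NumPy without vaex or pyarrow. The helper `numpy_dtype_or_none` is hypothetical.

```python
import numpy as np
import pandas as pd

# A timezone-aware series has a pandas DatetimeTZDtype;
# tz_convert(None) gives the equivalent timezone-naive series.
aware = pd.Series(pd.to_datetime(["2020-01-11T05:27:24"], utc=True))
naive = aware.dt.tz_convert(None)


def numpy_dtype_or_none(dtype):
    """Mimic ``np.dtype(dtype)`` from the traceback, returning None
    instead of raising when NumPy cannot interpret the dtype."""
    try:
        return np.dtype(dtype)
    except TypeError:
        return None


print(numpy_dtype_or_none(aware.dtype))       # None: rejected by NumPy
print(numpy_dtype_or_none(naive.dtype).kind)  # M
```

The naive dtype has kind `'M'`, which is what vaex's `is_datetime` check in the traceback looks for.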