blaze / datashape

Language defining a data description protocol
BSD 2-Clause "Simplified" License

TypeError: DataShape time is not NumPy-compatible #219

Open drabastomek opened 8 years ago

drabastomek commented 8 years ago

Hi,

I'm having a problem when interfacing with MongoDB. I can do this:

In [104]: odo(traffic, 'mongodb://localhost:27017/packt::traffic')

Out[104]: Collection(Database(MongoClient('localhost', 27017), 'packt'), 'traffic')

but when I try to read it back

In [105]: traffic_mongo = bl.Data('mongodb://localhost:27017/packt::traffic')

traffic_mongo.head()

I get the error in the subject line. The traffic object is a blaze.Data() object and contains two date fields.

kwmsmith commented 8 years ago

Hello,

What is the result of traffic_mongo.dshape? What are the contents of the original traffic object before it's wrapped in blaze.Data()? I'm asking so that I can reproduce the problem.

drabastomek commented 8 years ago

Hey,

Thanks for getting back to me.

I downloaded this dataset: https://catalog.data.gov/dataset/traffic-violations-56dda

This is what I get back from .dshape:

dshape("""404536 * { Accident: string, Agency: string, Alcohol: string, 'Arrest Type': string, Article: string, Belts: string, Charge: string, Color: string, 'Commercial License': string, 'Commercial Vehicle': string, 'Contributed To Accident': string, 'DL State': string, 'Date Of Stop': datetime, Description: string, 'Driver City': string, 'Driver State': string, Fatal: string, Gender: string, Geolocation: string, HAZMAT: string, Latitude: float64, Location: string, Longitude: float64, Make: string, Model: string, 'Personal Injury': string, 'Property Damage': string, Race: string, State: string, SubAgency: string, 'Time Of Stop': datetime, VehicleType: string, 'Violation Type': string, 'Work Zone': string, Year: float64 }""")

And now it worked when I ran traffic_mongo.head()... :-o I'm baffled, as yesterday I ran exactly the same script and kept getting that error... Weird. I think we can close this one now... Apologies.

gnuhub commented 7 years ago

print(blaze.__version__)
0.11.3+6.g31060532

"day_time":"19:29:14.000"

day_time: time,

DataShape time is not NumPy-compatible

gnuhub commented 7 years ago

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/stallman/anaconda/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
    309             method = get_real_method(obj, self.print_method)
    310             if method is not None:
--> 311                 return method()
    312             return None
    313         else:

/Users/stallman/anaconda/lib/python3.6/site-packages/blaze-0.11.3+6.g31060532-py3.6.egg/blaze/interactive.py in _warning_repr_html(self)
    450     else:
    451         warnings.warn(_warning_msg, DeprecationWarning, stacklevel=2)
--> 452         return to_html(self)
    453 
    454 

/Users/stallman/anaconda/lib/python3.6/site-packages/multipledispatch/dispatcher.py in __call__(self, *args, **kwargs)
    162             self._cache[types] = func
    163         try:
--> 164             return func(*args, **kwargs)
    165 
    166         except MDNotImplementedError:

/Users/stallman/anaconda/lib/python3.6/site-packages/blaze-0.11.3+6.g31060532-py3.6.egg/blaze/interactive.py in to_html(expr)
    387     if not expr._resources() or ndim(expr) != 1:
    388         return to_html(expr_repr(expr))
--> 389     return to_html(concrete_head(expr))
    390 
    391 

/Users/stallman/anaconda/lib/python3.6/site-packages/blaze-0.11.3+6.g31060532-py3.6.egg/blaze/interactive.py in concrete_head(expr, n)
    228         return odo(head, object)
    229     elif isrecord(expr.dshape.measure):
--> 230         return odo(head, DataFrame)
    231 
    232     df = odo(head, DataFrame)

/Users/stallman/anaconda/lib/python3.6/site-packages/odo/odo.py in odo(source, target, **kwargs)
     89     odo.append.append      - Add things onto existing things
     90     """
---> 91     return into(target, source, **kwargs)

/Users/stallman/anaconda/lib/python3.6/site-packages/multipledispatch/dispatcher.py in __call__(self, *args, **kwargs)
    162             self._cache[types] = func
    163         try:
--> 164             return func(*args, **kwargs)
    165 
    166         except MDNotImplementedError:

/Users/stallman/anaconda/lib/python3.6/site-packages/blaze-0.11.3+6.g31060532-py3.6.egg/blaze/interactive.py in into(a, b, **kwargs)
    404     result = compute(b, return_type='native', **kwargs)
    405     kwargs['dshape'] = b.dshape
--> 406     return into(a, result, **kwargs)
    407 
    408 

/Users/stallman/anaconda/lib/python3.6/site-packages/multipledispatch/dispatcher.py in __call__(self, *args, **kwargs)
    162             self._cache[types] = func
    163         try:
--> 164             return func(*args, **kwargs)
    165 
    166         except MDNotImplementedError:

/Users/stallman/anaconda/lib/python3.6/site-packages/odo/into.py in wrapped(*args, **kwargs)
     41             raise TypeError('dshape argument is not an instance of DataShape')
     42         kwargs['dshape'] = dshape
---> 43         return f(*args, **kwargs)
     44     return wrapped
     45 

/Users/stallman/anaconda/lib/python3.6/site-packages/odo/into.py in into_type(a, b, dshape, **kwargs)
     51         if dshape is None:
     52             dshape = discover(b)
---> 53     return convert(a, b, dshape=dshape, **kwargs)
     54 
     55 

/Users/stallman/anaconda/lib/python3.6/site-packages/odo/core.py in __call__(self, *args, **kwargs)
     42 
     43     def __call__(self, *args, **kwargs):
---> 44         return _transform(self.graph, *args, **kwargs)
     45 
     46 

/Users/stallman/anaconda/lib/python3.6/site-packages/odo/core.py in _transform(graph, target, source, excluded_edges, ooc_types, **kwargs)
     58     try:
     59         for (A, B, f) in pth:
---> 60             x = f(x, excluded_edges=excluded_edges, **kwargs)
     61         return x
     62     except NotImplementedError as e:

/Users/stallman/anaconda/lib/python3.6/site-packages/odo/convert.py in list_to_numpy(seq, dshape, **kwargs)
    168             not isscalar(dshape)):
    169         seq = list(map(tuple, seq))
--> 170     return np.array(seq, dtype=dshape_to_numpy(dshape))
    171 
    172 

/Users/stallman/anaconda/lib/python3.6/site-packages/odo/numpy_dtype.py in dshape_to_numpy(ds)
     83         return np.dtype([
     84             (str(name), unit_to_dtype(typ))
---> 85             for name, typ in zip(ds.names, ds.types)
     86         ])
     87     if isinstance(ds, Tuple):

/Users/stallman/anaconda/lib/python3.6/site-packages/odo/numpy_dtype.py in <listcomp>(.0)
     83         return np.dtype([
     84             (str(name), unit_to_dtype(typ))
---> 85             for name, typ in zip(ds.names, ds.types)
     86         ])
     87     if isinstance(ds, Tuple):

/Users/stallman/anaconda/lib/python3.6/site-packages/odo/numpy_dtype.py in unit_to_dtype(ds)
     48     if ds == string:
     49         return np.dtype('O')
---> 50     return to_numpy_dtype(ds)
     51 
     52 

/Users/stallman/anaconda/lib/python3.6/site-packages/datashape/coretypes.py in to_numpy_dtype(ds)
   1277     """ Throw away the shape information and just return the
   1278     measure as NumPy dtype instance."""
-> 1279     return to_numpy(ds.measure)[1]
   1280 
   1281 

/Users/stallman/anaconda/lib/python3.6/site-packages/datashape/coretypes.py in to_numpy(ds)
   1310         msr = ds
   1311 
-> 1312     return tuple(shape), msr.to_numpy_dtype()
   1313 
   1314 

/Users/stallman/anaconda/lib/python3.6/site-packages/datashape/coretypes.py in to_numpy_dtype(self)
    171 
    172     def to_numpy_dtype(self):
--> 173         raise TypeError('DataShape %s is not NumPy-compatible' % self)
    174 
    175 

TypeError: DataShape time is not NumPy-compatible

Out[7]:
<'Collection' data; _name='_1', dshape='14041053 * {  _id: int64,  amount: int64,  avg_per...'>

gnuhub commented 7 years ago

https://github.com/blaze/datashape/blob/master/datashape/coretypes.py#L224

majidaldo commented 6 years ago

NumPy doesn't have a good type for describing a time of day on its own (with no date). That's why they didn't implement the conversion. Try pulling the field out as a string. That said, this is really a problem odo has to deal with.
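
The string workaround suggested above can be sketched with the standard library alone. This is only an illustration, not something blaze or odo does for you: the helper names (`time_to_string`, `time_to_timedelta`, `stringify_times`) and the sample record are hypothetical, and the field value is the `day_time` example quoted earlier in the thread. The offset-from-midnight variant is included because NumPy does have a duration dtype (`timedelta64`), even though it has no time-of-day dtype.

```python
from datetime import datetime, time, timedelta


def time_to_string(value):
    """Render a datetime.time as an ISO 8601 string such as '19:29:14'."""
    return value.isoformat()


def time_to_timedelta(value):
    """Express a time of day as an offset from midnight."""
    return timedelta(hours=value.hour, minutes=value.minute,
                     seconds=value.second, microseconds=value.microsecond)


def stringify_times(record):
    """Return a copy of a dict record with datetime.time values as strings."""
    return {key: time_to_string(val) if isinstance(val, time) else val
            for key, val in record.items()}


# Hypothetical record; "day_time" mirrors the value quoted in the thread.
parsed = datetime.strptime("19:29:14.000", "%H:%M:%S.%f").time()
row = {"amount": 3, "day_time": parsed}
clean = stringify_times(row)
```

Records cleaned this way contain only strings and numbers, so the time column can be described as `string` rather than `time`.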

TianFengshou commented 6 years ago

This problem still seems to be unsolved. How can I read time-type data as strings?
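
One possible approach, offered as an untested sketch rather than a confirmed fix: rewrite the discovered dshape so that bare `time` fields are declared as `string`, then hand the patched dshape back when opening the collection. The helper `time_fields_as_strings` and the sample dshape text (including the `created` field) are hypothetical. Whether blaze honours the override for a MongoDB resource is an assumption, so that call appears only in a comment.

```python
import re


def time_fields_as_strings(dshape_text):
    """Replace every bare `time` field type in a dshape string with `string`."""
    # `:\s*time\b` only matches a type that starts right after the colon and
    # is exactly `time`, so `datetime` and `timedelta` fields are left alone.
    # Parametrised forms of `time` are not handled by this sketch.
    return re.sub(r":\s*time\b", ": string", dshape_text)


# Hypothetical dshape text, loosely modelled on the fields shown in the thread.
original = "var * {_id: int64, amount: int64, day_time: time, created: datetime}"
patched = time_fields_as_strings(original)

# Assumption (not verified here): pass the patched dshape when loading, e.g.
#   bl.Data('mongodb://localhost:27017/packt::traffic', dshape=patched)
```

With the field declared as `string`, the conversion to a DataFrame should no longer need a NumPy dtype for `time`, which is where the traceback above fails.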