Open femtotrader opened 8 years ago
You can extend odo's conversion graph by dispatching on convert, for example:
import datapackage
import pandas as pd
from odo import convert

@convert.register(pd.DataFrame, datapackage.DataPackage)
def datapackage_to_dataframe(pkg, **kwargs):
    # function that takes a datapackage and returns a dataframe
    ...
This will then allow you to make this conversion by using odo.
To make datapackage seem more "native", you might also want to create dispatchers for append and discover.
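For anyone following along, here is a toy model of what "extending the conversion graph" means. It is only a sketch under my own assumptions: the `edges` dict, the `register` decorator, and the string node names are hypothetical stand-ins, and the breadth-first search is a simplification rather than odo's actual implementation, which dispatches on real types. The idea it illustrates is that registering one edge into a node the graph already knows should, in principle, make everything downstream of that node reachable.

```python
from collections import deque

# Toy conversion graph: edges[source][target] = conversion function.
# Hypothetical structure for illustration, not odo's real internals.
edges = {}


def register(target, source):
    """Register a conversion function as an edge from source to target."""
    def decorator(func):
        edges.setdefault(source, {})[target] = func
        return func
    return decorator


def convert(target, value, source):
    """Find a chain of conversions from source to target and apply it."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            for func in path:
                value = func(value)
            return value
        for nxt, func in edges.get(node, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [func]))
    raise NotImplementedError("no path from %s to %s" % (source, target))


@register("iterator", "package")
def package_to_iterator(pkg):
    return iter(pkg["rows"])


@register("list", "iterator")
def iterator_to_list(it):
    return list(it)


# Only package -> iterator was added for the new type; the
# iterator -> list edge is reused to reach "list".
result = convert("list", {"rows": [1, 2, 3]}, "package")
print(result)  # [1, 2, 3]
```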
@femtotrader Glad you asked! There's some nice documentation on how to do this here. Let us know if you have any questions.
In [33]: datapkg.data
Out[33]: <itertools.chain at 0x104a2a898>
datapkg.data is an itertools.chain.
What do you think is the best way to integrate it with odo?
Do you really think that doing something like pd.DataFrame(list(pkg.data)) is a good idea?
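To make the concern concrete, here is a small standard-library sketch of how an itertools.chain behaves. The `rows` object is a hypothetical stand-in for datapkg.data with made-up values, and `chunks` is a helper I am introducing for illustration only. It shows that list() materializes every row at once and that the chain is empty afterwards, and that reading in fixed-size blocks is one possible alternative.

```python
from itertools import chain, islice

# Hypothetical stand-in for datapkg.data: rows chained from two sources.
rows = chain(
    [{"Country Code": "AFG", "CPI": 89.17}],
    [{"Country Code": "ALB", "CPI": 97.64}],
)

first_pass = list(rows)   # pulls every row into memory at once
second_pass = list(rows)  # the chain is a one-shot iterator, now exhausted

print(len(first_pass))   # 2
print(len(second_pass))  # 0


def chunks(iterable, size):
    """Yield lists of at most `size` items without loading everything."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


blocks = list(chunks(range(5), 2))
print(blocks)  # [[0, 1], [2, 3], [4]]
```

So list(pkg.data) is fine for small packages, but the iterator has to be recreated for any second pass.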
You could just return the data attribute, like this:
from collections.abc import Iterator
from datapackage import DataPackage
from odo import convert

@convert.register(Iterator, DataPackage)
def datapackage_to_iterator(datapkg, **kwargs):
    return datapkg.data
I tried this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import datapackage
# Note trailing slash is important for data.okfn.org
datapkg = datapackage.DataPackage('http://data.okfn.org/data/cpi/')
assert datapkg.title == "Annual Consumer Price Index (CPI)"
assert datapkg.description == "Annual Consumer Price Index (CPI) for most countries in the world. Reference year is 2005."
cpi_sum = sum([row['CPI'] for row in datapkg.data])
assert cpi_sum == 405442.60078415077
from collections.abc import Iterator
from odo import convert
@convert.register(Iterator, datapackage.DataPackage)
def datapackage_to_iterator(datapkg, **kwargs):
    return datapkg.data
from odo import odo
for row in odo(datapkg, Iterator):
    print(row)
import pandas as pd
df = odo(datapkg, pd.DataFrame)
print(df)
I thought that I only needed to register an iterator as the source, but that doesn't seem to be enough to build a DataFrame (or a CSV file or a JSON file).
odo(datapkg, pd.DataFrame)
raises
TypeError: list indices must be integers, not str
Any ideas?
I wonder whether it's really necessary to do:
import pandas as pd
@convert.register(pd.DataFrame, datapackage.DataPackage)
def datapackage_to_dataframe(datapkg, **kwargs):
    return pd.DataFrame(list(datapkg.data))
df = odo(datapkg, pd.DataFrame)
print(df)
Can you provide the full stack of that type error?
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "example.py", line 40, in <module>
df = odo(datapkg, pd.DataFrame)
File "//anaconda/lib/python3.4/site-packages/odo/odo.py", line 90, in odo
return into(target, source, **kwargs)
File "//anaconda/lib/python3.4/site-packages/multipledispatch/dispatcher.py", line 164, in __call__
return func(*args, **kwargs)
File "//anaconda/lib/python3.4/site-packages/odo/into.py", line 25, in into_type
return convert(a, b, dshape=dshape, **kwargs)
File "//anaconda/lib/python3.4/site-packages/odo/core.py", line 30, in __call__
return _transform(self.graph, *args, **kwargs)
File "//anaconda/lib/python3.4/site-packages/odo/core.py", line 46, in _transform
x = f(x, excluded_edges=excluded_edges, **kwargs)
File "//anaconda/lib/python3.4/site-packages/odo/convert.py", line 215, in iterator_to_DataFrame_chunks
df = convert(pd.DataFrame, first, **kwargs)
File "//anaconda/lib/python3.4/site-packages/odo/core.py", line 30, in __call__
return _transform(self.graph, *args, **kwargs)
File "//anaconda/lib/python3.4/site-packages/odo/core.py", line 46, in _transform
x = f(x, excluded_edges=excluded_edges, **kwargs)
File "//anaconda/lib/python3.4/site-packages/odo/convert.py", line 166, in list_to_numpy
seq = list(records_to_tuples(dshape, seq))
File "//anaconda/lib/python3.4/site-packages/odo/utils.py", line 212, in records_to_tuples
return get(ds.measure.names, data)
File "//anaconda/lib/python3.4/site-packages/toolz/itertoolz.py", line 400, in get
return operator.itemgetter(*ind)(seq)
TypeError: list indices must be integers, not str
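For what it's worth, the last frame of that traceback can be reproduced with the standard library alone. This is only a sketch of the mechanism: `names`, `row`, and `chunk` are hypothetical values I made up from the data shown in this thread, and it does not establish why odo ended up passing a list at that point. operator.itemgetter with string keys works on a single dict row but raises TypeError when it is handed a list of rows.

```python
import operator

# Field names, standing in for ds.measure.names in the traceback.
names = ("Country Name", "CPI")

# One record, and a list holding that record (hypothetical values).
row = {"Country Name": "Afghanistan", "CPI": 89.1695876693231}
chunk = [row]

# On a single dict, itemgetter picks the named fields into a tuple.
values = operator.itemgetter(*names)(row)
print(values)

# On a list, the same call indexes the list with strings and fails.
error = None
try:
    operator.itemgetter(*names)(chunk)
except TypeError as exc:
    error = exc
print("TypeError:", error)
```

The exact wording of the message depends on the Python version, but the exception type matches the one above.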
I also tried
@convert.register(Iterator, datapackage.DataPackage)
def datapackage_to_iterator(datapkg, **kwargs):
    return datapkg.get_data(datapkg.resources[0])
datapkg.get_data(datapkg.resources[0]) returns a generator, but it also raises the same exception.
But I noticed that
In [29]: discover(datapkg.resources[0])
Out[29]:
dshape("""{
datapackage_uri: string,
format: string,
is_local: bool,
mediatype: string,
name: string,
schema: {
fields: 4 * {
description: ?string,
format: ?string,
name: string,
type: string
}
},
url: string
}""")
and
In [30]: discover(datapkg.data)
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-30-6707a7ba048d> in <module>()
----> 1 discover(datapkg.data)
//anaconda/lib/python3.4/site-packages/multipledispatch/dispatcher.py in __call__(self, *args, **kwargs)
162 self._cache[types] = func
163 try:
--> 164 return func(*args, **kwargs)
165
166 except MDNotImplementedError:
//anaconda/lib/python3.4/site-packages/datashape/discovery.py in discover(o, **kwargs)
52 return from_numpy(o.shape, o.dtype)
53 raise NotImplementedError("Don't know how to discover type %r" %
---> 54 type(o).__name__)
55
56
NotImplementedError: Don't know how to discover type 'chain'
I'm also looking for some help converting a JSON Table Schema to a DataShape:
In [140]: datapkg.resources[0]['schema']['fields']
Out[140]:
[{'name': 'Country Name', 'type': 'string'},
{'name': 'Country Code', 'type': 'string'},
{'format': 'yyyy', 'name': 'Year', 'type': 'date'},
{'description': 'CPI (where 2005=100)', 'name': 'CPI', 'type': 'number'}]
http://dataprotocols.org/data-packages/#schemas-property
http://dataprotocols.org/json-table-schema/
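As a starting point, here is a rough sketch of one way the fields list could be turned into a datashape string. Everything in it is an assumption on my part: the `JTS_TO_DATASHAPE` mapping and the `fields_to_datashape` helper are hypothetical, the type mapping is a guess that covers only a few JSON Table Schema types, and quoting the field names reflects my understanding of how datashape handles names containing spaces, which is worth checking against datashape itself. It only builds a string and does not call datashape.

```python
# Assumed mapping from JSON Table Schema types to datashape type names.
JTS_TO_DATASHAPE = {
    "string": "string",
    "number": "float64",
    "integer": "int64",
    "boolean": "bool",
    "date": "date",
    "datetime": "datetime",
}


def fields_to_datashape(fields):
    """Build a datashape string from a JSON Table Schema fields list."""
    parts = []
    for field in fields:
        # Quote names so that names containing spaces are kept intact.
        name = "'%s'" % field["name"]
        # Fall back to string for types missing from the assumed mapping.
        dtype = JTS_TO_DATASHAPE.get(field["type"], "string")
        parts.append("%s: %s" % (name, dtype))
    return "var * {%s}" % ", ".join(parts)


# The fields shown above for the CPI resource.
fields = [
    {"name": "Country Name", "type": "string"},
    {"name": "Country Code", "type": "string"},
    {"format": "yyyy", "name": "Year", "type": "date"},
    {"description": "CPI (where 2005=100)", "name": "CPI", "type": "number"},
]

result = fields_to_datashape(fields)
print(result)
```

The optional format and description keys are ignored here, and a field name containing a single quote would need escaping.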
@femtotrader can you show list(datapkg.data)[0]?
In [42]: list(datapkg.data)[0]
Out[42]:
{'CPI': 89.1695876693231,
'Country Code': 'AFG',
'Country Name': 'Afghanistan',
'Year': datetime.date(2004, 1, 1)}
In [43]: type(list(datapkg.data)[0])
Out[43]: dict
A new project to convert JSON Table Schema <--> Datashape is available here https://github.com/okfn/jts-datashape
Hello,
I'm looking for some help to make an object odo"izable" (able to be a source for odo).
raises
see https://github.com/trickvi/datapackage/issues/45
Is it possible to inherit from a parent class to provide an odo"izable" object?
Kind regards