Hrm, this works for me
In [1]: from into import into, discover, pd
In [2]: df = pd.DataFrame([['Alice', 100],
                           ['Bob', 200],
                           ['Charlie', 300]], columns=['name', 'balance'])
In [3]: into('hdfstore://foo.h5::accounts', df)
Out[3]: frame_table (typ->appendable,nrows->3,ncols->2,indexers->[index])
In [4]: into('sqlite:///foo.db::accounts', 'hdfstore://foo.h5::accounts')
Out[4]: Table('accounts', MetaData(bind=Engine(sqlite:///foo.db)), Column('name', Text(), table=<accounts>), Column('balance', BigInteger(), table=<accounts>, nullable=False), schema=None)
Possibly it's running into an error within the conversion network, trying to reroute, and then running out of available paths.
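For intuition only, here is a minimal sketch of that rerouting behaviour, using a toy graph with made-up node names rather than into's real conversion network: find a shortest path, exclude any edge that fails, and retry until no path is left.

import networkx as nx

# Toy conversion graph; into's real graph has many more nodes and edges.
graph = nx.DiGraph()
graph.add_edge('HDFStore', 'DataFrame', cost=1.0)
graph.add_edge('DataFrame', 'Iterator', cost=1.0)
graph.add_edge('Iterator', 'SQLTable', cost=1.0)

excluded_edges = set()
while True:
    g = graph.copy()
    g.remove_edges_from(excluded_edges)
    try:
        pth = nx.shortest_path(g, source='HDFStore', target='SQLTable', weight='cost')
    except nx.NetworkXNoPath:
        print('ran out of available paths')   # what surfaces as NetworkXNoPath
        break
    try:
        print('trying path: %s' % (pth,))
        raise RuntimeError('simulated conversion failure on the first hop')
    except RuntimeError:
        excluded_edges.add((pth[0], pth[1]))  # drop the failing edge and reroute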
Can you try
blz.discover(blz.resource('hdfstore://bayarea.h5::/buildings'))
blz.into('sqlite:///bayarea.db::buildings', 'hdfstore://bayarea.h5::/buildings', raise_on_errors=True)
blz.discover(blz.resource('hdfstore://bayarea.h5::/buildings'))
dshape("""1936259 * {
parcel_id: int32,
building_type: ?string,
residential_units: ?float32,
non_residential_sqft: ?float32,
building_sqft: ?float32,
stories: ?float32,
building_type_id: ?float32,
year_built: ?float32,
tenure: int32,
id: int32,
building: int32,
scenario: int32,
county: ?string,
lot_size: ?float32,
rent: ?float64,
rental: bool,
general_type: ?string,
unit_sqft: ?float32,
unit_lot_size: ?float32,
x: ?float32,
y: ?float32,
_node_id0: int32,
_node_id1: int32,
_node_id2: int32,
_node_id: int32
}""")
blz.into('sqlite:///bayarea.db::buildings',
         'hdfstore://bayarea.h5::/buildings',
         raise_on_errors=True)
Gives the same error as without raise_on_errors:
---------------------------------------------------------------------------
NetworkXNoPath Traceback (most recent call last)
<ipython-input-11-3ca44359a7e0> in <module>()
2 'sqlite:///bayarea.db::buildings',
3 'hdfstore://bayarea.h5::/buildings',
----> 4 raise_on_errors=True)
/Users/jiffyclub/miniconda3/envs/blaze/lib/python2.7/site-packages/multipledispatch/dispatcher.pyc in __call__(self, *args, **kwargs)
161 self._cache[types] = func
162 try:
--> 163 return func(*args, **kwargs)
164
165 except MDNotImplementedError:
/Users/jiffyclub/miniconda3/envs/blaze/lib/python2.7/site-packages/into/into.pyc in into_string_string(a, b, **kwargs)
78 def into_string_string(a, b, **kwargs):
79 r = resource(b, **kwargs)
---> 80 return into(a, r, **kwargs)
81
82
/Users/jiffyclub/miniconda3/envs/blaze/lib/python2.7/site-packages/multipledispatch/dispatcher.pyc in __call__(self, *args, **kwargs)
161 self._cache[types] = func
162 try:
--> 163 return func(*args, **kwargs)
164
165 except MDNotImplementedError:
/Users/jiffyclub/miniconda3/envs/blaze/lib/python2.7/site-packages/into/into.pyc in into_string(uri, b, **kwargs)
72
73 a = resource(uri, dshape=resource_ds, expected_dshape=ds, **kwargs)
---> 74 return into(a, b, dshape=ds, **kwargs)
75
76
/Users/jiffyclub/miniconda3/envs/blaze/lib/python2.7/site-packages/multipledispatch/dispatcher.pyc in __call__(self, *args, **kwargs)
161 self._cache[types] = func
162 try:
--> 163 return func(*args, **kwargs)
164
165 except MDNotImplementedError:
/Users/jiffyclub/miniconda3/envs/blaze/lib/python2.7/site-packages/into/into.pyc in into_object(a, b, **kwargs)
58 if 'dshape' not in kwargs:
59 kwargs['dshape'] = discover(b)
---> 60 return append(a, b, **kwargs)
61
62
/Users/jiffyclub/miniconda3/envs/blaze/lib/python2.7/site-packages/multipledispatch/dispatcher.pyc in __call__(self, *args, **kwargs)
161 self._cache[types] = func
162 try:
--> 163 return func(*args, **kwargs)
164
165 except MDNotImplementedError:
/Users/jiffyclub/miniconda3/envs/blaze/lib/python2.7/site-packages/into/backends/sql.pyc in append_anything_to_sql_Table(t, o, **kwargs)
301 @append.register(sa.Table, object)
302 def append_anything_to_sql_Table(t, o, **kwargs):
--> 303 return append(t, convert(Iterator, o, **kwargs), **kwargs)
304
305
/Users/jiffyclub/miniconda3/envs/blaze/lib/python2.7/site-packages/into/core.pyc in __call__(self, *args, **kwargs)
27
28 def __call__(self, *args, **kwargs):
---> 29 return _transform(self.graph, *args, **kwargs)
30
31
/Users/jiffyclub/miniconda3/envs/blaze/lib/python2.7/site-packages/into/core.pyc in _transform(graph, target, source, excluded_edges, ooc_types, **kwargs)
40 pth = path(graph, type(source), target,
41 excluded_edges=excluded_edges,
---> 42 ooc_types=ooc_types)
43 try:
44 for (A, B, f) in pth:
/Users/jiffyclub/miniconda3/envs/blaze/lib/python2.7/site-packages/into/core.pyc in path(graph, source, target, excluded_edges, ooc_types)
77 graph = graph.subgraph([n for n in graph.nodes() if issubclass(n, oocs)])
78 with without_edges(graph, excluded_edges) as g:
---> 79 pth = nx.shortest_path(g, source=source, target=target, weight='cost')
80 result = [(source, target, graph.edge[source][target]['func'])
81 for source, target in zip(pth, pth[1:])]
/Users/jiffyclub/miniconda3/envs/blaze/lib/python2.7/site-packages/networkx/algorithms/shortest_paths/generic.pyc in shortest_path(G, source, target, weight)
136 paths=nx.bidirectional_shortest_path(G,source,target)
137 else:
--> 138 paths=nx.dijkstra_path(G,source,target,weight)
139
140 return paths
/Users/jiffyclub/miniconda3/envs/blaze/lib/python2.7/site-packages/networkx/algorithms/shortest_paths/weighted.pyc in dijkstra_path(G, source, target, weight)
80 except KeyError:
81 raise nx.NetworkXNoPath(
---> 82 "node %s not reachable from %s" % (source, target))
83
84
NetworkXNoPath: node <class 'pandas.io.pytables.FrameFixed'> not reachable from <class '_abcoll.Iterator'>
One thing to note, I guess, is that the HDFStore I'm working with was not made by into but with pandas.HDFStore directly.
All HDFStores are made with pandas. It's a pandas-only format; into just calls pandas to deal with it.
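To make that concrete, here is a rough sketch of what working with such a store through plain pandas looks like (illustrative, reusing the foo.h5 store from the example above; this is not into's own code):

import pandas as pd

# Open the store and pull the node back out as an in-memory DataFrame.
with pd.HDFStore('foo.h5') as store:
    df = store['accounts']   # the node into wrote earlier
print(df.head())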
So far nothing seems amiss; your datatypes are relatively tame. Any chance I could have a sample of the data? (Totally fine if this isn't possible.) Alternatively, can you reproduce the error on a publicly accessible dataset?
BTW, if you want to be friendlier with other ecosystems you might want to try pandas' format= keyword.
df.to_hdf(path, key, format='table')
This is both appendable and a more widely recognized format.
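For example (illustrative file and key names), a table-format node can be appended to later, which the default fixed format doesn't allow:

import pandas as pd

df = pd.DataFrame({'a': range(5)})

# format='table' writes an appendable PyTables table node.
df.to_hdf('data.h5', 'frame', format='table')
df.to_hdf('data.h5', 'frame', format='table', append=True)   # add the same 5 rows again

print(len(pd.read_hdf('data.h5', 'frame')))   # 10 rows after the append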
I can reproduce the error with something as simple as:
import pandas as pd
from into import into
df = pd.DataFrame({'a': range(5)})
with pd.HDFStore('test.h5') as store:
    store['test'] = df
into('sqlite:///:memory:::test', 'hdfstore://test.h5::/test')
Great, thanks!
OK, fixed in https://github.com/ContinuumIO/into/commit/69906d7e048c56611d97f1ae036279d20406d5fb
The issue was that the FrameFixed HDFStore format (what you get from default use of to_hdf) was marked as "possibly larger than memory" but only had one route out of it, namely through pd.DataFrame. When migrating between two possibly-larger-than-memory formats, into restricts itself to the subgraph of only possibly-larger-than-memory nodes. In this subgraph, FrameFixed is isolated.
I've removed the marking that FrameFixed is possibly out-of-memory. This is de facto true, as FrameFixed can only be created from an in-memory format and doesn't support partitioned/blocked access.
In [1]: import pandas as pd
In [2]: from into import into
In [3]:
In [3]: df = pd.DataFrame({'a': range(5)})
In [4]: with pd.HDFStore('test.h5') as store:
   ...:     store['test'] = df
   ...:
In [5]: into('sqlite:///:memory:::test', 'hdfstore://test.h5::/test')
Out[5]: Table('test', MetaData(bind=Engine(sqlite:///:memory:)), Column('a', BigInteger(), table=<test>, nullable=False), schema=None)
In [6]: into(list, _)
Out[6]: [(0,), (1,), (2,), (3,), (4,)]
Confirmed, thanks!
For the record, the original report was: there doesn't seem to be a way to get from an HDFStore to SQL directly, though you can go from an HDFStore to a DataFrame and from a DataFrame to SQL. The direct HDFStore-to-SQL call fails with the NetworkXNoPath error shown above, while each leg of the two-step route works fine.
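That two-step route looks something like this (illustrative paths, and it assumes the data fits in memory):

import pandas as pd
from into import into

# HDFStore -> in-memory DataFrame, then DataFrame -> SQL.
df = into(pd.DataFrame, 'hdfstore://bayarea.h5::/buildings')
into('sqlite:///bayarea.db::buildings', df)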