Open ghost opened 7 years ago
You can use convert.path(src, dst) to see the steps odo will take:
In [32]: from odo import convert
In [33]: convert.path(sa.Table, pd.DataFrame)
Out[33]:
[(sqlalchemy.sql.schema.Table,
collections.abc.Iterator,
<function odo.backends.sql.sql_to_iterator>),
(collections.abc.Iterator,
odo.chunks.chunks(pandas.DataFrame),
<function odo.convert.iterator_to_DataFrame_chunks>),
(odo.chunks.chunks(pandas.DataFrame),
pandas.core.frame.DataFrame,
<function odo.convert.chunks_dataframe_to_dataframe>)]
Hopefully this helps!
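For readers unfamiliar with the idea, here is a minimal sketch of a cost-weighted shortest-path search over a conversion graph, which is roughly the kind of lookup the output above reflects. It is not odo's implementation: the node names are plain strings, the edge costs are invented, and `EDGES` and `cheapest_path` are hypothetical names used only for illustration.

```python
import heapq
from itertools import count

# Hypothetical toy conversion graph. Keys are source types, values map each
# reachable target type to a made-up conversion cost.
EDGES = {
    "Table": {"Iterator": 1.0},
    "Iterator": {"chunks(DataFrame)": 1.0, "list": 1.0},
    "chunks(DataFrame)": {"DataFrame": 1.0},
    "list": {"DataFrame": 5.0},
    "DataFrame": {"list": 1.0},
}


def cheapest_path(edges, source, target):
    """Return the lowest-cost sequence of (src, dst) hops using Dijkstra's algorithm."""
    tie = count()  # tie-breaker so the heap never has to compare the path lists
    heap = [(0.0, next(tie), source, [source])]
    seen = set()
    while heap:
        cost, _, node, nodes = heapq.heappop(heap)
        if node == target:
            return list(zip(nodes, nodes[1:]))
        if node in seen:
            continue
        seen.add(node)
        for nxt, weight in edges.get(node, {}).items():
            if nxt not in seen:
                heapq.heappush(heap, (cost + weight, next(tie), nxt, nodes + [nxt]))
    raise LookupError("no path from %s to %s" % (source, target))


for hop in cheapest_path(EDGES, "Table", "DataFrame"):
    print(hop)
```

In this toy graph nothing points into "Table", so asking for a path from "DataFrame" to "Table" raises LookupError. A path lookup of this kind can only succeed when the target is a node that some conversion edge leads to.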
@llllllllll Thanks for responding; convert.path looks like a helpful tool!
In this particular case, I'm trying to load my dataframe into the remote sql database, so I would guess it's convert.path(pd.DataFrame, sqlalchemy.Table). But that gives
>>> import pandas as pd
>>> import sqlalchemy as sa
>>> from odo import convert
>>> convert.path(pd.DataFrame, sa.Table)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/home/user/Documents/project/venv/lib/python3.5/site-packages/networkx/algorithms/shortest_paths/weighted.py in dijkstra_path(G, source, target, weight)
79 try:
---> 80 return path[target]
81 except KeyError:
KeyError: <class 'sqlalchemy.sql.schema.Table'>
During handling of the above exception, another exception occurred:
NetworkXNoPath Traceback (most recent call last)
<ipython-input-31-3da53a9187b8> in <module>()
----> 1 convert.path(pd.DataFrame, sa.Table)
/home/user/Documents/project/venv/lib/python3.5/site-packages/odo/core.py in path(self, *args, **kwargs)
39
40 def path(self, *args, **kwargs):
---> 41 return path(self.graph, *args, **kwargs)
42
43 def __call__(self, *args, **kwargs):
/home/user/Documents/project/venv/lib/python3.5/site-packages/odo/core.py in path(graph, source, target, excluded_edges, ooc_types)
90 if issubclass(n, oocs)])
91 with without_edges(graph, excluded_edges) as g:
---> 92 pth = nx.shortest_path(g, source=source, target=target, weight='cost')
93 result = [(src, tgt, graph.edge[src][tgt]['func'])
94 for src, tgt in zip(pth, pth[1:])]
/home/user/Documents/project/venv/lib/python3.5/site-packages/networkx/algorithms/shortest_paths/generic.py in shortest_path(G, source, target, weight)
136 paths=nx.bidirectional_shortest_path(G,source,target)
137 else:
--> 138 paths=nx.dijkstra_path(G,source,target,weight)
139
140 return paths
/home/user/Documents/project/venv/lib/python3.5/site-packages/networkx/algorithms/shortest_paths/weighted.py in dijkstra_path(G, source, target, weight)
81 except KeyError:
82 raise nx.NetworkXNoPath(
---> 83 "node %s not reachable from %s" % (source, target))
84
85
NetworkXNoPath: node <class 'pandas.core.frame.DataFrame'> not reachable from <class 'sqlalchemy.sql.schema.Table'>
Loading a dataframe into a table uses append, which is not itself a network dispatcher. Instead, it is a regular multiply dispatched function that converts the input into an iterator and then appends the iterator to the table.
There is no "dry-run" for append but I agree that this would be a useful feature.
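To make the distinction concrete, here is a rough sketch of the "dispatch on the input type, turn it into an iterator, then append" pattern described above. It is not odo's code. It uses `functools.singledispatch` from the standard library as a stand-in for multiple dispatch, a plain list stands in for the table, and `to_rows` and `append` are hypothetical names.

```python
from functools import singledispatch


@singledispatch
def to_rows(source):
    """Convert a source into an iterator of row tuples (hypothetical helper)."""
    raise NotImplementedError("no conversion registered for %r" % type(source))


@to_rows.register(list)
def _(source):
    # A list of row tuples is already row-oriented, so just iterate over it.
    return iter(source)


@to_rows.register(dict)
def _(source):
    # A column-oriented dict {"col": [values, ...]} is transposed into rows.
    return zip(*source.values())


def append(target, source):
    """Append every row of source to target; a list stands in for the table."""
    target.extend(to_rows(source))
    return target


table = [(1, "a")]
append(table, {"id": [2, 3], "name": ["b", "c"]})
print(table)
```

Because the function is chosen by type at call time rather than found by searching a graph, there is no path to inspect beforehand, which is consistent with the lack of a dry-run mentioned above.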
If I understand, the capability to send a local dataframe or csv to remote sql database quickly is forthcoming but not yet as fast as possible. I get the impression that sending a dataframe sends one row at a time and sending a csv isn't possible. Is that right? I'd love to use odo for this if I can. Thanks for your help!
I have a pandas.DataFrame and I want to send it to a remote sql database. I'm not sure if it's going to do something fast using \copy or INSERT ... VALUES or instead something slow using pandas.DataFrame.to_sql or sqlalchemy's executemany.
Is there a way I can find out what it's doing? If it's doing something slow, is there a way to hint at something faster?