blaze / odo

Data Migration for the Blaze Project
http://odo.readthedocs.org/
BSD 3-Clause "New" or "Revised" License

"Simulation mode" to show conversion paths in odo #265

Open · dan-coates opened this issue 9 years ago

dan-coates commented 9 years ago

When writing and debugging new backends, it would be useful to have an option I could pass to the odo function that would "simulate" a conversion by printing out (or returning as a list) the path of functions it would call from source to destination, without actually performing the conversion.

This could help discover issues without having to attempt to move data. It could also shed light on potentially awkward paths odo may be taking through the NetworkX graph to reach the target data type, pointing to places where more direct shortcuts could be added. If nothing else, it would pull back the curtain on some of the magic going on behind the scenes and make the process easier to understand.

I've wished this existed often enough lately that I may very well implement it myself, but I welcome any thoughts on how it should be structured, how it should handle append vs. convert (i.e., multipledispatch vs. NetworkX, and cases where both are used), whether the output should be printed, visualized, or returned as a Python structure, etc.

mrocklin commented 9 years ago

You might want to look at odo.convert.path (alias of odo.core.path, which holds the docstring) and odo.append.resolve, which together handle most of the internal path finding. I'm not sure of the best way to present this information to the user but agree with you that this would be valuable.

dan-coates commented 9 years ago

In the short term, I'm building this with print statements. In the long term, I don't think dumping a lot of information via print is good behavior for a widely used functional package like this. Instead, I'm thinking of building a Simulation object that the odo function would return in place of the data object it usually creates. It would contain both a str representation, useful if the user just wants printed information, and structured information about the append and convert paths, useful for automated testing or analysis of the paths odo takes.

As a first step, though, I'm using print statements just to get it working and to understand which elements of the path are useful and how they should be presented.
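A minimal sketch of the Simulation object described above might look like the following. Everything in it is hypothetical: the Simulation class, its fields, and the functions() helper are illustrative names, not part of odo. The only shape borrowed from the thread is that a convert path is a list of (source type, target type, function) hops. Structured data lives in fields for automated checks, and the printable view is built in __str__.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# One conversion hop: (source type, target type, function that would run).
Hop = Tuple[type, type, Callable]


@dataclass
class Simulation:
    """Hypothetical dry-run result, returned instead of converted data."""
    source: type
    target: type
    convert_path: List[Hop] = field(default_factory=list)
    append_func: Optional[Callable] = None  # what append would dispatch to

    def functions(self) -> List[Callable]:
        """Functions along the convert path, in the order they would run."""
        return [func for _, _, func in self.convert_path]

    def __str__(self) -> str:
        lines = [f"Simulated conversion {self.source.__name__} -> "
                 f"{self.target.__name__}"]
        for src, tgt, func in self.convert_path:
            lines.append(f"  convert: {src.__name__} -> {tgt.__name__} "
                         f"via {func.__name__}")
        if self.append_func is not None:
            lines.append(f"  append: via {self.append_func.__name__}")
        return "\n".join(lines)


def list_to_tuple(data, **kwargs):  # toy conversion hop for the example
    return tuple(data)


sim = Simulation(list, tuple, [(list, tuple, list_to_tuple)])
print(sim)
```

Tests can assert against sim.functions() or sim.append_func, while a user at the console just prints the object.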

dan-coates commented 9 years ago

First draft of simulation prints here. Will try to organize that better in an object with a prettier print representation.

dan-coates commented 8 years ago

I've gained more experience with test-driven development lately, and it has me wanting to pick this up again. odo seems a bit difficult to use in a TDD workflow because it does quite a bit of magic behind the scenes.

One typical testing pattern when developing a new backend is to write a function that performs a conversion or append and test it in isolation, outside of odo. The integration with odo can then be tested by registering that function, calling odo, and verifying that the newly registered function was called with the expected arguments, using a mock in place of the new function so you can inspect the call without actually running it.
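The pattern above can be sketched with unittest.mock from the standard library. REGISTRY and toy_convert are hypothetical stand-ins, not odo's API, and they resolve exactly one direct hop. That single lookup is what makes the mock-and-assert approach clean here.

```python
from unittest import mock

# Hypothetical stand-in for a conversion registry keyed by (source, target)
# types. Real odo stores its conversion functions as edges in a NetworkX graph.
REGISTRY = {}


def toy_convert(target, data, **kwargs):
    """Dispatch on (type(data), target) and call the registered function.

    Only a single direct hop is supported; there is no multi-hop search.
    """
    func = REGISTRY[(type(data), target)]
    return func(data, **kwargs)


def test_new_backend_is_dispatched():
    # Register a Mock in place of the real backend function so the call can
    # be inspected without running it; patch.dict restores REGISTRY afterwards.
    fake = mock.Mock(return_value="converted")
    with mock.patch.dict(REGISTRY, {(list, tuple): fake}):
        result = toy_convert(tuple, [1, 2], chunksize=10)
    fake.assert_called_once_with([1, 2], chunksize=10)
    assert result == "converted"
    assert (list, tuple) not in REGISTRY  # the registration did not leak


test_new_backend_is_dispatched()
```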

I don't think this approach will really work with odo, though, because odo may do a lot more behind the scenes. It may call resource or discover, and the path from your source to your target may pass through an arbitrary number of other hops, which you don't want actually called and shouldn't have to mock out.

I have some thoughts on possible solutions, which I'll save for another comment.

dan-coates commented 8 years ago

For convert calls, the solution could be relatively straightforward: every call to convert first builds a path through the NetworkX graph and then steps through it, calling each function in turn. The result returned by path is useful from a testing and debugging standpoint, since it gives the source type, target type, and function to be called for each conversion hop. That result could itself be returned to the user to inspect or interact with. We could also go a step further and include test utilities that let a user pass in a function and kwargs and assert that the function is present in the result and would be called with those kwargs, or call various inspect functions to show the module or code that would run, for debugging purposes. There could also be a test function that expresses more cleanly whether a path exists at all, rather than just letting NetworkX raise networkx.NetworkXNoPath.
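Here is a rough, self-contained sketch of the path-returning idea plus two of the proposed test utilities. It does not use odo or NetworkX: find_path, path_exists, assert_path_uses, and NoPath are hypothetical names, and a plain breadth-first search over a dict of edges stands in for odo's graph. It returns the same kind of (source, target, function) hops without calling any of the functions. Unlike odo, whose graph also attaches a cost to each edge, it simply picks the path with the fewest hops.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# (source type, target type, function that would perform the hop)
Hop = Tuple[type, type, Callable]
Edges = Dict[Tuple[type, type], Callable]


class NoPath(Exception):
    """Raised when no chain of registered conversions links two types."""


def find_path(edges: Edges, source: type, target: type) -> List[Hop]:
    """Fewest-hop path from source to target; no function is called."""
    neighbors: Dict[type, List[type]] = {}
    for src, tgt in edges:
        neighbors.setdefault(src, []).append(tgt)
    parent = {source: None}  # node -> previous node in the BFS tree
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in neighbors.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if target not in parent:
        raise NoPath(f"no path from {source.__name__} to {target.__name__}")
    nodes = [target]  # walk back from target to source, then reverse
    while parent[nodes[-1]] is not None:
        nodes.append(parent[nodes[-1]])
    nodes.reverse()
    return [(a, b, edges[(a, b)]) for a, b in zip(nodes, nodes[1:])]


def path_exists(edges: Edges, source: type, target: type) -> bool:
    """Answer 'is there any path?' without exception handling in the test."""
    try:
        find_path(edges, source, target)
    except NoPath:
        return False
    return True


def assert_path_uses(path: List[Hop], func: Callable) -> None:
    """Fail with a readable message if func is not one of the path's hops."""
    names = [f.__name__ for _, _, f in path]
    assert any(f is func for _, _, f in path), f"{func.__name__} not in {names}"


# Usage with two toy conversion functions.
def list_to_tuple(data, **kwargs):
    return tuple(data)


def tuple_to_set(data, **kwargs):
    return set(data)


EDGES = {(list, tuple): list_to_tuple, (tuple, set): tuple_to_set}
path = find_path(EDGES, list, set)
assert_path_uses(path, tuple_to_set)
```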

dan-coates commented 8 years ago

For append calls, it's a little more complicated. On the one hand, append's Dispatcher has a dispatch method that returns the function it will dispatch to, so it's straightforward to see what will run and to inspect or verify that call as I described for convert. Unfortunately, many append functions are something like "append_anything_to_x" and just call append again with an object returned from convert. I'm not entirely sure what can be done about that situation. You could try patching convert and append with dummy functions that return simulated results, but that doesn't seem safe, as the append function may run other things you don't want run in a simulation mode.
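The sketch below illustrates both halves of that point with a hypothetical two-argument dispatcher. PairDispatcher is not multipledispatch's Dispatcher, though its dispatch method plays the same role of returning a function without calling it. Its resolution order is a simplistic walk over each argument's MRO, not multipledispatch's real ambiguity handling.

```python
from itertools import product
from typing import Callable, Dict, Tuple


class PairDispatcher:
    """Hypothetical dispatcher on the types of two arguments."""

    def __init__(self, name: str):
        self.name = name
        self.funcs: Dict[Tuple[type, ...], Callable] = {}

    def register(self, *types):
        def decorator(func):
            self.funcs[types] = func
            return func
        return decorator

    def dispatch(self, *types) -> Callable:
        # Return (without calling) the first registered match, trying
        # type combinations from specific to general along each MRO.
        for candidate in product(*(t.__mro__ for t in types)):
            if candidate in self.funcs:
                return self.funcs[candidate]
        raise NotImplementedError(f"{self.name}: no match for {types}")

    def __call__(self, *args, **kwargs):
        return self.dispatch(*map(type, args))(*args, **kwargs)


append = PairDispatcher("append")


@append.register(list, list)
def append_list_to_list(target, source, **kwargs):
    target.extend(source)
    return target


@append.register(list, object)
def append_anything_to_list(target, source, **kwargs):
    # Typical pattern: convert first, then re-enter append. dispatch() only
    # reveals this outer function, not the nested append call it makes.
    return append(target, list(source), **kwargs)


print(append.dispatch(list, tuple).__name__)  # append_anything_to_list
```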

You could argue that getting back only the function registered with append is good enough. For someone unit testing a new backend, that's probably true: it may be sufficient to know that the function you registered gets called as expected, and the testing of that function's internals, including any embedded append or convert calls, can be left to the person developing that backend. But this doesn't seem ideal for debugging, or for situations where you need to confirm the path a conversion will take. This may be another opportunity for test utilities that parse out the embedded convert or append calls and perhaps continue down the path. It's also very possible there's a more creative solution I'm not thinking of.
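One way such a utility might parse out embedded calls without running anything is to inspect a function's bytecode with the standard library's dis module. embedded_calls is a hypothetical helper, and it only catches append and convert referenced as global names. Calls made through an attribute such as odo.append, or inside nested functions and lambdas, would be missed, so treat this as a starting point rather than a complete tracer.

```python
import dis
from typing import Callable, Iterable, List


def embedded_calls(func: Callable,
                   names: Iterable[str] = ("append", "convert")) -> List[str]:
    """Return, in bytecode order, each global from `names` that func loads.

    func is inspected but never called. Only names loaded as globals are
    seen: attribute access such as odo.append, or references inside nested
    functions and lambdas, is not reported.
    """
    wanted = set(names)
    return [ins.argval for ins in dis.get_instructions(func)
            if ins.opname in ("LOAD_GLOBAL", "LOAD_NAME")
            and ins.argval in wanted]


# Hypothetical "append_anything_to_x"-style function. append and convert do
# not need to exist here, because the function is only inspected.
def append_anything_to_list(target, source, **kwargs):
    return append(target, convert(list, source, **kwargs), **kwargs)


print(embedded_calls(append_anything_to_list))  # ['append', 'convert']
```

Continuing down the path would then mean resolving each reported name (for example via the dispatch method discussed above) and inspecting the function it returns in turn.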