mrocklin closed this 10 years ago
Mostly I'm just curious about this approach. Once discover works on basic types, blaze.data descriptors will rely on it after parsing a subset of their data. Discover will also be extended to work on numpy arrays and pandas DataFrames.
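To make "discover on basic types" concrete, here is a minimal sketch that maps plain Python values to datashape-style type strings such as "3 * int64". The function name discover_basic and its rules are hypothetical and are not datashape's actual implementation. The sketch only accepts homogeneous sequences.

```python
def discover_basic(value):
    """Hypothetical sketch: return a datashape-style type string for a basic Python value."""
    # Check bool before int: bool is a subclass of int in Python
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (list, tuple)):
        if not value:
            raise ValueError("cannot discover the type of an empty sequence")
        # Discover each element; this sketch only accepts homogeneous sequences
        elem_types = {discover_basic(v) for v in value}
        if len(elem_types) != 1:
            raise ValueError(f"mixed element types: {sorted(elem_types)}")
        return f"{len(value)} * {elem_types.pop()}"
    raise TypeError(f"no rule for {type(value).__name__}")


print(discover_basic([[1, 2], [3, 4]]))  # 2 * 2 * int64
```

Nested lists compose naturally because each level prepends its own length to the element type.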
This now supports datetimes with dateutil.parser.parse and numpy with datashape.from_numpy.
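A sketch of how a raw string field might be classified: try integer, then float, then datetime, and fall back to string. The PR uses dateutil.parser.parse for the datetime step; to keep this sketch dependency-free it swaps in the standard library's datetime.fromisoformat, which is much stricter (it rejects "Jan 1 2014", which dateutil accepts). The function name discover_string is hypothetical.

```python
from datetime import datetime


def discover_string(text):
    """Hypothetical sketch: infer a type name for a raw string field."""
    # Try the narrowest numeric types first
    for caster, name in ((int, "int64"), (float, "float64")):
        try:
            caster(text)
            return name
        except ValueError:
            pass
    try:
        # Stand-in for dateutil.parser.parse; only ISO 8601 strings are accepted here
        datetime.fromisoformat(text)
        return "datetime"
    except ValueError:
        return "string"


print(discover_string("2014-01-01"))  # datetime
```

The order matters: "42" would also parse as a float, so integers are checked first.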
I've removed the WIP label. A lot of the work now needs to happen in blaze.data.
Looks like dateutil isn't part of the standard library. I'd still like to go ahead and add it as a dependency for now. It is in the main Anaconda distribution, though, so it's somewhat standard.
Looks like there is an issue with dateutil: on conda it's named dateutil, while on PyPI it's named python-dateutil.
pandas uses dateutil for date parsing, so using it will at least match the behavior people are used to from pandas.
OK, I've pushed up the change so that python-dateutil is in requirements.txt. This means I'm preferring the PyPI package name over the conda one. We can't support both automatically from a single requirements.txt, so we'll need to either drop dateutil or specialize our build scripts.
@mwiebe should I use an assert_equals function? If so, where should I import it from? The tests have been nose/py.test agnostic so far; should we pick one?
Also, here is a page showing pytest magic. They must inspect assert statements and generate other code.
The assertion rewriting stuff looks great, another reason to switch to pytest. With that working, I would favour your preferred assert x == y syntax.
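For reference, a sketch of the plain assert x == y style in a test module. The helper parse_bool is hypothetical and exists only to have something to test. Plain asserts also run under nose, but pytest's assertion rewriting is what produces the detailed failure messages showing both operands. The exception check uses try/except rather than pytest.raises so the sketch stays runner-agnostic.

```python
def parse_bool(text):
    """Hypothetical helper under test: parse 'true'/'false', ignoring case and whitespace."""
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise ValueError(f"not a boolean: {text!r}")


def test_parse_bool():
    # Plain asserts: under pytest, a failure reports both sides of the comparison
    assert parse_bool("True") is True
    assert parse_bool(" false ") is False


def test_parse_bool_rejects_other_strings():
    try:
        parse_bool("yes")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Saved as a test_*.py file, pytest collects both test functions automatically.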
This is again ready for review. @mwiebe you're probably the best candidate. I put comments in a couple places to direct your attention. You might also want to try it out on a dataset or look at the results from kiva, now in the PR header.
+1 LGTM
@mwiebe I added your test and the bool/int/string relationship. This also exposed an error with my current system. Your test was good because the result was neither of the input types.
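Why a result that is neither input type makes a good test is easiest to see with a small type-unification sketch. The lattice below (bool widens to int64, which widens to float64, and every other mismatched pair falls back to string) is an assumption for illustration, not the rules this PR implements; the names unify and unify_all are hypothetical.

```python
from functools import reduce

# Hypothetical promotion chain: each type can be widened to any type to its right
NUMERIC_CHAIN = ["bool", "int64", "float64"]


def unify(a, b):
    """Return the narrowest type in this sketch's lattice that covers both a and b."""
    if a == b:
        return a
    if a in NUMERIC_CHAIN and b in NUMERIC_CHAIN:
        return max(a, b, key=NUMERIC_CHAIN.index)
    # Any other pair (e.g. int64 with datetime) falls back to string
    return "string"


def unify_all(types):
    """Fold unify over a non-empty sequence of type names."""
    return reduce(unify, types)


print(unify("int64", "datetime"))  # string
```

A test like unify("int64", "datetime") catches implementations that simply pick the "wider" of the two inputs, because the correct answer here is a third type.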
OK, merging this. I expect that we'll run into issues, but we won't know until we try.
Whoops. Forgot to add a note about trouble cases.
We can handle missing data in a variety of cases as well. Here is the result of calling discover on the kiva_tiny dataset living in blaze/samples/server/arrays/lenders.json.
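One common way to represent missing values in a discovered type is an option type, written with a leading question mark (for example ?int64) in datashape-style notation. The sketch below assumes a hypothetical convention where None, empty strings, "NA" and "null" count as missing, and it simplifies mixed-type columns to string; it does not reproduce the kiva_tiny result.

```python
# Hypothetical mapping from Python scalar types to type names
TYPE_NAMES = {bool: "bool", int: "int64", float: "float64", str: "string"}


def is_missing(value):
    """Hypothetical convention: None and a few sentinel strings count as missing."""
    return value is None or (isinstance(value, str) and value.strip() in ("", "NA", "null"))


def discover_column(values):
    """Sketch: discover a column's type, marking it optional if any value is missing."""
    present = [v for v in values if not is_missing(v)]
    names = {TYPE_NAMES[type(v)] for v in present}
    if len(names) == 1:
        base = names.pop()
    else:
        # No present values, or mixed kinds: fall back to string in this sketch
        base = "string"
    has_missing = len(present) < len(values)
    return ("?" if has_missing else "") + base


print(discover_column([1, None, 3]))  # ?int64
```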