blaze / datashape

Language defining a data description protocol
BSD 2-Clause "Simplified" License
183 stars 65 forks source link

promote(string, ?string) still returns ?object #214

Closed richafrank closed 8 years ago

richafrank commented 8 years ago

Hello,

After updating to datashape==0.5.2, I had hoped that https://github.com/blaze/datashape/pull/213 would fix my issue, but it didn't. I'd like

>>> promote(dshape("string"), dshape("{a: ?string}").subshape['a'])

to return Option(ty=ctype("string")) but it actually returns Option(ty=ctype("object")).

The special casing added by that PR doesn't get hit because dshape("string") != ctype("string"):

>>> dshape('{a: string}').subshape['a']
ctype("string")
>>> dshape('string')
dshape("string")
>>> dshape('{a: string}').subshape['a'] == dshape('string')
False

i.e. a Mono leaf node in the datashape tree does not compare equal to a DataShape node wrapping that leaf node.

Ideally, I'd like to be able to compare the lhs and rhs here as equal. In talking with @llllllllll, we saw a couple options:

  1. Change DataShape.__eq__ to compare only shape and measure instead of type.
  2. Remove the box when the tree is only a box around the Mono, possibly by short circuiting DataShape.__new__ and returning the single Mono parameter. This looks to require updating blaze as well, since there are checks for whether a DataShape is being passed around, and this change would mean dshape("string") would NOT return a DataShape instance.

Do you agree with the intent here? What's the right path?

kwmsmith commented 8 years ago

I'm able to generalize promote()'s logic around string types to handle your case here without changing datashape's equality semantics, PR forthcoming.

I'm in full agreement with you and @llllllllll that there are fundamental design flaws around datashape comparison tests, etc. I often have to work around this in Blaze's testsuite by adding extra datashape.DataShape(<thing that evaluates to option>) calls to appease datashape's flaky comparison semantics.

Our MO on datashape has been to keep it limping along until a larger redesign and refactor can take place. The DyND project has been refactoring and improving the DyND datashape implementation, and we want the blaze datashape implementation to leverage that work as much as possible. The DyND work is still underway, so we're giving it time to stabilize.

Once the DyND work is finished and I'm able to justify resources, we will take on the datashape redesign work.

kwmsmith commented 8 years ago

OK, looking into it a bit more, I've generalized Mono.__eq__ thusly:

    def __eq__(self, other):
        return (issubclass(type(other), Mono) and
                self.shape == other.shape and
                self.measure.info() == other.measure.info())

Among other things, this generalizes equality as one would expect so dshape('?string') == Option('string'). The current implementation of promote() now works for your case, and I've added your case (and a few others) to promote's tests.

Testing blaze and odo with this change, their testsuites all pass without issue.

richafrank commented 8 years ago

That makes sense to me. That's what I'd been thinking with the first suggestion, though for DataShape instead of Mono - I'll trust your better sense of how general the fix should be.

Regarding the other option, sounds reasonable to leave the extra box nodes in the tree, given that a bigger rewrite is pending.

richafrank commented 8 years ago

Fixed by https://github.com/blaze/datashape/pull/215