SciRuby / daru

Data Analysis in RUby
BSD 2-Clause "Simplified" License
1.03k stars 139 forks source link

Index#dup should copy reference to name too #477

Closed Yuki-Inoue closed 5 years ago

Yuki-Inoue commented 5 years ago

Currently, if idx.instance_of?(Daru::Index), then following statement does not hold:

idx.dup.name == idx.name
# => returns false!!

This is because name is not copied on Daru::Index#dup, which I think is not the expected behavior at least to the library user.

So, this PR fixes it.

Yuki-Inoue commented 5 years ago

ping @v0dro

Shekharrajak commented 5 years ago

I think == should check name of the index as well :

[28] pry(main)> dii = di.dup
=> #<Daru::Index(4): {speaker, mic, guitar, amp}>
[29] pry(main)> dii == di
=> true

This should return false, WDYT?

update

I think we should have #equal method which will compare only the values it contains with same data type (in this case - index array element). Similar method should be present in Vector , DataFrame, MultiIndex to just compare the elements (ignoring column name index name/value). What do you think ?

Shekharrajak commented 5 years ago

This PR looks good to fix the issue mentioned above. But it came up another question : Mentioned in above comment.

Shekharrajak commented 5 years ago

I will wait for 24 hours for any other comments. If fine then I will go ahead and merge.

Yuki-Inoue commented 5 years ago

@Shekharrajak Hi, thanks for the feedback.

The problem fixed by this PR and the concern brought by your comment is, IMHO, a separate concern, so I'd be glad if you go ahead and merge this.

The concern you brought up, although I, myself, haven't encountered such a case where I need to compare the Daru objects based on the content of its data, I can agree that such method will be useful.

About the naming of such method, I disagree with naming the method as #equal. The reason is that in Ruby, following methods are used for equality comparison, and implementing a new method named #equal is somewhat confusing.

equals?(other)   <-> python `is` correspondent
eql?(other)         -> used for objects with `hash` method, for hash equality check
==(other)            <-> python `==` correspondent
===(other)         -> used for `case` matching

If such a method is needed, I'd recommend using a name which does not conflict with the ruby's equality methods. For example, how about data_eq?(other) or content_eq?(other)?

Shekharrajak commented 5 years ago

@Yuki-Inoue , yes we need to discuss about my concern in different issue ticket (Ruby doc about ==, eq ,equal ) and it is different from this PR.

.dup should duplicate the all class attributes. So it should duplicate the name as well. It looks good Vector , DataFrame and MultiIndex already.

This PR is in.