SciRuby / daru

Data Analysis in RUby
BSD 2-Clause "Simplified" License
1.04k stars 139 forks source link

Fixes build error #306

Closed athityakumar closed 7 years ago

athityakumar commented 7 years ago

The eq function in the Daru::Vector#is_values has been designed to ignore the missing data like nil and Float::NAN to make where support empty data - hence, the nil variables in the test as per commit 8a4bb1cadc5d05600025f186d5c15aef74ebcdd1 is omitted by where, which means that the empty space is filled by another nil by default.

Rather, include? has been chosen to replace the eq function to fix the Travis CI build error.

Ping @zverok @lokeshh @v0dro

v0dro commented 7 years ago

@lokeshh please review this?

athityakumar commented 7 years ago

@lokeshh : PR is ready for review now. :smile:

athityakumar commented 7 years ago

@lokeshh : Can you please review this PR soon? Other PRs are having their builds fail without this fix.

lokeshh commented 7 years ago

I'm really sorry. I totally missed that notifications. Will review it soon.

lokeshh commented 7 years ago

I haven't been following the conversation so I'm unable to understand the situation here. Why changes in is_values are needed to make build pass? Is this function being used in other places? Also could you provide the link to the test which is failing. Thanks

athityakumar commented 7 years ago

@lokeshh : Sure. Also, now - every other PR being submitted is failing the Travis CI build due to this.

Cause of Build Error

In commit 5868487aa60a4f3361a67768409240a890fe8826 , is_values? function was added as follows.

def is_values(*values)
    Daru::Vector.new values.map { |v| eq(v) }.inject(:|)
end

And this works fine. However, in the next commit - Support for the where clause was merged. And this is the change done (in PR #302 and subsequently commit 8a4bb1cadc5d05600025f186d5c15aef74ebcdd1):

def apply_scalar_operator operator, data, other
          BoolArray.new data.map { |d| !!d.send(operator, other) unless MISSING_VALUES.include? d }
 end

So, now - the is_values function passes the block to eq function - which passes on to apply_scalar_operator. Now, this apply_scalar_operator ignores missing data and compares only for existing data. So, the missing data are ignored and passes back to the is_values function. But, as that was initialized as a Daru::Vector of size 5 (in the test), and a size 3 vector is given in the output of the test, the other two are stored as nil by default - to fill the empty slots.

Solution

To keep out the apply_scalar_operator block, include? has been chosen to replace the eq function to fix the Travis CI build error.

Yes - the apply_scalar_operator is used every time when comparison operators like lt, mteq, etc. are being used. However, this is the only time when the Daru::Vector has been done by mapping (to keep the size constant). So, there shouldn't be issues with other functions.

Links

Travis CI failing build : https://travis-ci.org/SciRuby/daru/builds/199990937

Sample case of failing build : https://travis-ci.org/SciRuby/daru/jobs/199990938#L1053

lokeshh commented 7 years ago

This line here:

          BoolArray.new data.map { |d| !!d.send(operator, other) unless MISSING_VALUES.include? d }

By MISSING_VALUES.include? d if you meant to check if d is Float::NaN then this is wrong. See below:

[4] pry(main)> Daru::MISSING_VALUES.include? Float::NAN
=> false

Instead user include_with_nan? private method for same purpose.

Also I think functions like eq should return false where they are returning nil. @v0dro What do you say?

On a whole I think following should be the solution:

!include_with_nan?(d) && !!d.send(operator, other) 
athityakumar commented 7 years ago

@lokeshh @v0dro : But, the !!d.send(operator, other) would result in error as mentioned in issue #246 , right? To keep out from applying comparisons on the missing data, I had kept it as -

          BoolArray.new data.map { |d| !!d.send(operator, other) unless MISSING_VALUES.include? d }
lokeshh commented 7 years ago

But, the !!d.send(operator, other) would result in error as mentioned in issue #246 , right?

No it won't get executed because !include_with_nan? would first return false.

athityakumar commented 7 years ago

@lokeshh : Got it, thanks. :+1:

Should the !include_with_nan? of Daru::Vector be used via requiring Daru::Vector, or should the function include_with_nan? be copy-pasted onto Daru::Core::Query as a private function?

Also, is there any specific reason why include_with_nan? should remain as a private function in Daru::Vector?

lokeshh commented 7 years ago

You can simply get away with !d.nil? && !!d.send(operator, other) as comparisons are defined for Float::NAN and nil is the only one value which is creating problems.

athityakumar commented 7 years ago

Oh, I had actually started off in my previous PR with nil? case, but changes were suggested to extend it to MISSING_VALUES here. So, just the .nil? check works?

athityakumar commented 7 years ago

@lokeshh @v0dro : Made the suggested changes. :smile:

Now, Daru::Vector#is_value works for single & multiple values, except for nil - ie,

dv = Daru::Vector.new [10, 11, 10, nil, nil]
dv.is_values 10,nil
 => #<Daru::Vector(5)>
     0 true
     1 false
     2 false
     3 false
     4 false 
lokeshh commented 7 years ago

Now, Daru::Vector#is_value works for single & multiple values, except for nil - ie,

That shouldn't happen. Why is that?

athityakumar commented 7 years ago

Because, nil is directly mapped to false with !d.nil?. But this was taken care of, while comparing with .include? rather than involving eq here as per the 2nd commit in this PR.

lokeshh commented 7 years ago

Okay, try this then d.respond_to?(operator) && !!d.send(operator, other).

athityakumar commented 7 years ago

Made the changes, and it works! :smile:

I thought that this wasn't the desired output when I read this. Seems like I had misinterpreted "returning nil" as "comparing nil".

Also I think functions like eq should return false where they are returning nil.

lokeshh commented 7 years ago

No problem. Now just waiting for @v0dro to confirm checking for Float::NaN is not required as I requested here.

athityakumar commented 7 years ago

@lokeshh @v0dro : Any updates on the above discussion thread?

zverok commented 7 years ago

Let's just merge. Broken master is not the best thing in the world.