clbustos / statsample

A suite for basic and advanced statistics on Ruby.
http://github.com/clbustos/statsample
BSD 3-Clause "New" or "Revised" License

Remove bad check in cronbach_alpha calculation #11

Closed jeremyevans closed 9 years ago

jeremyevans commented 12 years ago

There is no reason to return nil if a single vector has a 0 variance. For example, let's say you are giving a test, and every single taker gets the easiest question correct. The variance for that question vector is 0, but Cronbach's alpha can still be calculated correctly for the entire dataset.
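To illustrate the point, here is a minimal plain-Ruby sketch of the standard alpha formula, k/(k-1) * (1 - sum of item variances / variance of total scores). It does not use statsample's API; the helper names `sample_variance` and `cronbach_alpha` and the toy data are made up for this example.

```ruby
# Plain-Ruby sketch, not statsample's implementation.
# Unbiased sample variance of an array of numbers.
def sample_variance(values)
  mean = values.sum.to_f / values.size
  values.sum { |v| (v - mean)**2 } / (values.size - 1)
end

# items: array of item vectors, one score per test taker in each vector.
def cronbach_alpha(items)
  k = items.size
  item_variance_sum = items.sum { |item| sample_variance(item) }
  totals = items.transpose.map(&:sum)        # total score per test taker
  total_variance = sample_variance(totals)
  (k / (k - 1.0)) * (1 - item_variance_sum / total_variance)
end

items = [
  [1, 1, 1, 1, 1],  # everyone answered correctly: variance is 0
  [1, 0, 1, 1, 0],
  [1, 0, 1, 0, 0],
  [1, 1, 1, 0, 0]
]

alpha = cronbach_alpha(items)
puts alpha.round(3)  # a finite number, about 0.667 for this toy data
```

The zero-variance item adds 0 to the sum of item variances and a constant to every total score, so the formula stays well defined as long as the total scores themselves vary.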

clbustos commented 12 years ago

It is a design decision. You could calculate alpha using only the variables with variance > 0, but you should warn the user first. Because Statsample is an in-program library, not a REPL one, I decided to be strict here. Maybe a more general function should be created that is resilient to problems in the input.

jeremyevans commented 12 years ago

I guess I don't understand why having a vector with variance = 0 indicates any problem with the input. It seems to me to be a normal situation that the library can and should handle. Now, if all vectors have variance 0, I can see returning nil. Maybe change any? to all?
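A small sketch of the difference between the two guards. This is not the library's actual code; the hash `vectors` and the helper `sample_variance` are hypothetical names used only for illustration.

```ruby
# Hypothetical illustration of an any? guard versus an all? guard.
def sample_variance(values)
  mean = values.sum.to_f / values.size
  values.sum { |v| (v - mean)**2 } / (values.size - 1)
end

vectors = {
  "q1" => [1, 1, 1, 1],  # every taker got it right
  "q2" => [1, 0, 1, 0],
  "q3" => [0, 0, 1, 1]
}

variances = vectors.transform_values { |v| sample_variance(v) }

# An any? guard bails out as soon as one item is constant.
bail_with_any = variances.values.any?(&:zero?)

# An all? guard bails out only when no item varies at all.
bail_with_all = variances.values.all?(&:zero?)

puts "any? guard returns nil: #{bail_with_any}"
puts "all? guard returns nil: #{bail_with_all}"
```

With this data the any? guard would refuse to compute alpha, while the all? guard would let the calculation proceed.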

clbustos commented 12 years ago

From a psychometric perspective, having one or more items with 0 variance is very serious, because it implies a bad selection of items. The meaning of the index (lower bound of correlation for an equal-size tau-equivalent measurement) no longer applies directly, because the library omits one or more variables. So I should give a warning about it. Anyway, as R does, I can provide an option to relax the requirements, like na.rm on the mean function.

jeremyevans commented 12 years ago

I'd say that what you said is true for large datasets. In my case, I was calculating alpha from a small dataset (16 takers), and there were multiple questions that everyone got right. Since alpha can be calculated correctly even if some vectors have variance = 0, I don't see the reason to purposely refuse to calculate it. It should be up to the user to determine the meaning of the result; the library's responsibility is just to perform the calculation.

At the very least, if you are going to refuse to calculate alpha because of artificial restrictions, please raise an error with a descriptive message indicating why. Returning nil is bad as it doesn't indicate why the calculation was not done. When I first used the library and got nil, I thought I was doing something wrong, and it caused quite a bit of extra debugging time.

clbustos commented 12 years ago

OK, you've convinced me. I will add an option to raise an error (strict mode), but we should drop any vector with variance = 0.
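One way such an option might look, as a rough sketch only. The method `usable_items`, the `strict:` keyword, and the `ZeroVarianceError` class are hypothetical names for this example, not part of statsample.

```ruby
# Hypothetical sketch of a strict/lenient switch; not statsample's API.
class ZeroVarianceError < StandardError; end

# items: hash of item name => array of scores.
# Returns only the items whose scores vary. In strict mode, raises a
# descriptive error instead of silently dropping constant items.
def usable_items(items, strict: false)
  constant, varying = items.partition { |_name, values| values.uniq.size <= 1 }

  if strict && !constant.empty?
    names = constant.map(&:first).join(", ")
    raise ZeroVarianceError, "cannot compute alpha: zero variance in #{names}"
  end

  varying.to_h
end

items = { "q1" => [1, 1, 1], "q2" => [1, 0, 1] }

lenient = usable_items(items)          # drops "q1", keeps "q2"

begin
  usable_items(items, strict: true)
rescue ZeroVarianceError => e
  strict_message = e.message           # names the offending item
end
```

Either way the caller learns why an item was rejected, which addresses the debugging problem that a bare nil causes.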

agarie commented 9 years ago

Hi @justin808, thanks for the pull request! We're currently in the process of centralizing SciRuby's gems in the organization repositories. Can you reopen your PR on sciruby/statsample?

Thanks! I'll take a look at your PR as soon as I finish moving the other gems' issues there. :)

justin808 commented 9 years ago

Hi @agarie, how do I reopen my PR? Can you please give me a link?

agarie commented 9 years ago

Hey @justin808, I found a page in the documentation showing how to change which repository you send the PR to: https://help.github.com/articles/using-pull-requests/#changing-the-branch-range-and-destination-repository.

So, if you close these two, you can probably create new PRs pointing to SciRuby/statsample.

justin808 commented 9 years ago

https://github.com/SciRuby/statsample/pull/9

jeremyevans commented 9 years ago

This was fixed in the new upstream, so it can be closed now.