SciRuby / daru

Data Analysis in RUby
BSD 2-Clause "Simplified" License

The idea of integrating main types of distribution in gem #359

Closed: wowinter13 closed this issue 7 years ago

wowinter13 commented 7 years ago

In Python, we can easily create a vector of normally distributed values in one line using numpy:

import numpy as np
s = np.random.normal(mu, sigma, 10)

In Ruby, by contrast, we have to use the Distribution gem:

require 'daru'
require 'distribution'

rng = Distribution::Normal.rng(mu, sigma)
Daru::Vector.new(1000.times.map {rng.call})

The Distribution gem covers many distributions, so including it is a bit of overhead if we only want basic functionality. Because the normal (Gaussian) distribution is a basic building block for many statistics and ML tasks, I think it would be useful to add an interface for the most popular distribution (in fact, the normal distribution) and make it work something like this:

Daru::Vector.new(1000, normal[1, 10]) #your syntax can be here

Correct me if this issue is better suited to the Distribution gem than to Daru. Or maybe we can just make this syntax less complex.

I can implement it if the community gives this idea a thumbs up!

parthm commented 7 years ago

In the Python example, I see numpy as the equivalent of the distribution gem. Something like pandas would be the Daru equivalent.

Daru::Vector supports array or range arguments to new for easy creation. I would think that if the distribution gem provided an easy way (alias/shortcut) of creating the distribution as an array, it might significantly simplify vector creation. E.g. Daru::Vector.new(Distribution::Normal.rng(mu, sigma).take(1000)). I don't know if it does something like this already. This might make it easier to use in general, rather than in the specific context of Daru::Vector. It might be a good addition to the distribution gem.
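A rough sketch of what such a shortcut could look like in plain Ruby, so that .take works on a generator. The helper name sample_stream is hypothetical, and the rng below is only a stand-in for what Distribution::Normal.rng(mu, sigma) returns, assumed here to be a zero-argument callable:

```ruby
# Hypothetical helper (not part of Daru or the distribution gem):
# wraps any zero-argument callable in an infinite enumerator,
# so that Enumerable methods such as take(n) can be used on it.
def sample_stream(callable)
  Enumerator.new do |yielder|
    loop { yielder << callable.call }
  end
end

# Stand-in generator so the sketch runs without the distribution gem.
rng = -> { rand }

# take(n) stops after n values, so the infinite stream is safe here.
samples = sample_stream(rng).take(1000)
puts samples.size
```

The resulting array could then be passed to Daru::Vector.new, since it accepts arrays.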

Purely as a user of Daru, IMHO Daru should do DataFrame and Vector really really well and iron out any wrinkles before looking into tighter integration with other gems. Perhaps, the core team can comment.

zverok commented 7 years ago

TBH, I don't see any value in bundling. It is how proper Ruby gems work: they solve one task, and solve it well (or die trying). We are not building some "framework for everything" here. You can almost make your code simpler this way:

rng = Distribution::Normal.rng(mu, sigma)
Daru::Vector.new_with_size 1000, &rng

...except for the fact that Daru expects the block to accept 1 argument. Making this syntax work would be a reasonable improvement request ;) But it should probably go to the Distribution gem rather than Daru, because in my head, this should work too, without any Daru:

Array.new(100, &rng)

...but it does not.
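A minimal sketch of why this fails, assuming the generator is a zero-argument lambda (which is consistent with the behaviour described above). Array.new yields the index to its block, and a lambda passed with & keeps its strict arity check. The helper names arity_error_for and draw are made up for illustration:

```ruby
# Returns the error message if using the callable as Array.new's block
# raises ArgumentError, or nil if it works.
def arity_error_for(callable)
  Array.new(3, &callable) # the block is called with the index
  nil
rescue ArgumentError => e
  e.message
end

# Workaround: an explicit block ignores the index and calls the generator.
def draw(callable, count)
  Array.new(count) { callable.call }
end

# Stand-in for Distribution::Normal.rng(mu, sigma), assumed to be a
# zero-argument lambda; used so the sketch runs without the gem.
rng = -> { rand }

puts arity_error_for(rng).inspect                # an arity error message
puts arity_error_for(proc { rng.call }).inspect  # nil: procs ignore extra arguments
puts draw(rng, 100).size
```

Wrapping the lambda in a plain proc, as in the second call, is another way around the strict arity check.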

wowinter13 commented 7 years ago

Okay, thank you, guys! I will rethink how to fix &rng and the other methods in Distribution, and will open a PR with a solution in the near future.

arbox commented 7 years ago

Actually, if one needs only the normal distribution and sees the whole bunch of other implementations as overhead, why not extract the code and build a targeted normal_distribution gem? That's the solution that makes things simpler in my view of this world :)

I think that's the way to build interoperable software packages, and if required, we can always build a meta-gem that combines and installs everything.

v0dro commented 7 years ago

It doesn't make sense to bundle distribution with daru. Close this issue?

wowinter13 commented 7 years ago

Yea, close this issue.