Neuraxio / Neuraxle

The world's cleanest AutoML library ✨ - Do hyperparameter tuning with the right pipeline abstractions to write clean deep learning production pipelines. Let your pipeline steps have hyperparameter spaces. Design steps in your pipeline like components. Compatible with Scikit-Learn, TensorFlow, and most other libraries, frameworks and MLOps environments.
https://www.neuraxle.org/
Apache License 2.0

Feature: Add Binomial distributions to hyperparams/distributions.py #428

Closed: vincent-antaki closed this issue 1 year ago

vincent-antaki commented 3 years ago

I think it would be a neat addition to generate discrete-valued hyperparams. You could redefine Boolean (which should really be called Bernoulli, in my opinion) to be Binomial(1, p).

In the meantime, it is possible to use Choice with custom probabilities to model this distribution.
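For instance, a minimal sketch of that workaround, assuming Choice accepts the `probas` argument used in the examples further down this thread:

```python
from neuraxle.hyperparams.distributions import Choice

# Sketch: a Bernoulli(p) modeled as a weighted Choice over {False, True}.
p = 0.3
bernoulli_like = Choice(choice_list=[False, True], probas=[1.0 - p, p])
```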

Edit: Side question. Let's say that, for my random hyperparameter search, I'd really like to use learning_rate = 10**uniform(-2, -4). That doesn't seem to be an option within the current state of the framework; I'd have to create my own class ExpUniform (which maybe should already be included). Would there be a not-too-complex way to introduce operators on these distributions instead? Or, even if a complex option is on the table, are all attempts at doing this fundamentally overkill and computationally wasteful for the conceptual scope of the distributions in this file?

Eric2Hamel commented 3 years ago

As for Boolean versus Bernoulli, it depends on what is easier for the user. It would be possible in the code to just do `Bernoulli = Boolean` so that the two names are equal.

As for the Binomial distribution, you can easily do it using the Choice distribution. If not, you can create your Binomial distribution using Choice under the hood.

For example:

```python
def Binomial(number, p):
    choice_list = ...  # the discrete values a Binomial(number, p) can take
    probas = ...  # their probabilities, dependent on number and p
    return Choice(choice_list=choice_list, probas=probas)
```

Something similar to that.

Is there a reason why you want the Binomial? Do you want to have access to "number" or "p", or to other class attributes that are specific to Binomial?

guillaume-chevalier commented 3 years ago

We could probably even define a `Bernoulli` class and keep `Boolean` as a subclass of it (i.e. `class Boolean(Bernoulli)`), such that there are no breaking changes on the API, and the Bernoulli would be introduced.
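A minimal sketch of that idea, assuming Choice accepts a `probas` argument as in the other examples in this thread; the constructor signatures here are illustrative, not the actual Neuraxle API:

```python
from neuraxle.hyperparams.distributions import Choice

class Bernoulli(Choice):
    """Sketch: a Bernoulli(p) over {False, True}, built on Choice."""
    def __init__(self, proba: float = 0.5):
        self.proba = proba
        super().__init__(choice_list=[False, True], probas=[1.0 - proba, proba])

class Boolean(Bernoulli):
    """Sketch: Boolean kept as a subclass alias, so nothing breaks."""
    pass
```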

I support @Eric2Hamel's point that Boolean is ideal and clear for the regular user.

However, I doubt the quality of my suggestion above, as it may look like an excessive use of inheritance in the eyes of some people.

Eric2Hamel commented 3 years ago

Not sure it is worth having Boolean inherit from Bernoulli, since Boolean is Bernoulli (really the same thing), so the inheritance may not be worth it.

But I agree that class Binomial(Choice), or Boolean(Choice), or Bernoulli(Choice), might be worth it if the goal is to be able to access some specific attributes. That gives you access to those specific attributes while reusing the Choice logic.

Here is an example:

```python
import math

class Binomial(Choice):

    def __init__(self, number, proba):
        self.number = number
        self.proba = proba
        # choice_list: the support {0, 1, ..., number} (dependent on number)
        choice_list = list(range(number + 1))
        # probas: the binomial PMF C(n, k) * p**k * (1-p)**(n-k)
        # (dependent on number and proba)
        probas = [
            math.comb(number, k) * proba ** k * (1 - proba) ** (number - k)
            for k in choice_list
        ]
        super().__init__(choice_list=choice_list, probas=probas)
```

What do you think @guillaume-chevalier ?

vincent-antaki commented 3 years ago

@Eric2Hamel, I agree that just defining Bernoulli = Boolean would be sufficient. I mentioned that change because I initially Ctrl-F'd through that file searching for "Bernoulli" and did not find it. Anyway, different people, different intuitions; I see no problem with having both.

As for defining Binomial(n, p) as a Choice, I think it's a straightforward way to do it. It requires a bit more computation on initialisation, but it's faster than simulating the whole thing each time you sample. Wrapping scipy.stats.binom is another alternative we could consider. Anyway, I don't have strong feelings on it one way or the other.
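For instance, a sketch of that alternative, using scipy only to compute the probabilities (the Choice usage is as elsewhere in this thread):

```python
from scipy.stats import binom

# Sketch: build the Choice probabilities from scipy's binomial PMF.
n, p = 4, 0.3
choice_list = list(range(n + 1))
probas = binom.pmf(choice_list, n, p).tolist()  # PMF evaluated at 0..n
```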

In my specific case, I was only looking for a Bernoulli. However, I could imagine a case where we'd want to use a Binomial distribution instead of a RandInt in a random hyperparameter search.

Also, side question: let's say that, for my random hyperparameter search, I'd really like to use learning_rate = 10**uniform(-2, -4). That doesn't seem to be an option within the current state of the framework; I'd have to create my own class ExpUniform. Would there be a not-too-complex way to introduce operators on these distributions instead? Or, even if a complex option is on the table, are all attempts at doing this fundamentally overkill and computationally wasteful for the conceptual scope of the distributions in this file?

Eric2Hamel commented 3 years ago

What you are referring to with 10**uniform(-2, -4) is in fact a LogUniform distribution (with base 10). Neuraxle currently has a LogUniform, but it uses base 2, which means 2**uniform(a, b). I think you can easily convert the base-10 version to one using base 2, since 10 = 2**(log(10) / log(2)) = 2**3.32193. It will give something like 2**uniform(-2 * 3.32193, -4 * 3.32193). Just make sure it gives you the right numbers, just in case.
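A quick, illustrative check of that conversion in plain Python (not Neuraxle code):

```python
import math

# 10 == 2**log2(10), so 10**x == 2**(x * log2(10)).
factor = math.log2(10)  # ≈ 3.32193
a2, b2 = -2 * factor, -4 * factor  # ≈ -6.64386, -13.28771

# The endpoints of 2**uniform(a2, b2) match those of 10**uniform(-2, -4):
assert math.isclose(2 ** a2, 10 ** -2)
assert math.isclose(2 ** b2, 10 ** -4)
```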

@guillaume-chevalier, is it worth adding a base parameter to the LogUniform to easily do 10**uniform(a, b)?

vincent-antaki commented 3 years ago

Oh, fantastic! Once again, I couldn't find it because I wasn't searching for the right name. The current version of LogUniform does the job for me; I don't really need it to be in base 10. Cheers!

guillaume-chevalier commented 3 years ago

Nice convo here!

@Eric2Hamel

What do you think [about class Binomial(Choice)]?

That sounds good to me :smiley:


@vincent-antaki

Would there be a not too complex way to introduce operator on these distributions instead?

I opened an issue for it! I have had an idea for the solution for a while, using wrappers and methods in the base class; I only hesitate regarding whether all the math can be done with the cdf and pdf functions without too much hassle: https://github.com/Neuraxio/Neuraxle/issues/430

Also, using scipy distributions in those addition wrappers would be nice! Or perhaps using scipy may be the only way to properly do the pdf, cdf, std, and so forth? I hope the math is possible for these addition wrappers, exp wrappers, and so forth as I advance in #430.
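To illustrate the idea, a minimal hypothetical sketch of an exp wrapper; the class name and the scipy-like rvs()/pdf()/cdf() interface on the wrapped distribution are assumptions for the example, not the actual Neuraxle API:

```python
import math

class ExpWrapper:
    """Hypothetical sketch: if X ~ wrapped, this models Y = base**X,
    e.g. ExpWrapper(Uniform(-4, -2), base=10) for 10**uniform(-4, -2)."""

    def __init__(self, wrapped, base: float = 10.0):
        self.wrapped = wrapped
        self.base = base

    def rvs(self) -> float:
        # Sample X from the inner distribution, then transform it.
        return self.base ** self.wrapped.rvs()

    def pdf(self, y: float) -> float:
        # Change of variables: p_Y(y) = p_X(log_base(y)) / (y * ln(base)).
        if y <= 0.0:
            return 0.0
        return self.wrapped.pdf(math.log(y, self.base)) / (y * math.log(self.base))

    def cdf(self, y: float) -> float:
        # Monotone increasing transform (base > 1): P(Y <= y) = P(X <= log_base(y)).
        if y <= 0.0:
            return 0.0
        return self.wrapped.cdf(math.log(y, self.base))
```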


@vincent-antaki

The current version of logUniform does the job for me. I don't really need it to be in base 10.

I thought that the base didn't matter (it cancels out) as long as the distribution has the same begin and end numbers? See how it starts at 1 and ends at 4 (plot not shown). I believe that changing the base would have no effect other than changing the numerical stability of the sampling, which is why I've used base 2: floating-point numbers multiply and divide by 2 cheaply, by shifting the exponent without changing the actual number's mantissa.
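A hedged sanity check of that cancellation, in plain Python (illustrative only): with the same underlying uniform draw, sampling in base 2 and in base 10 over the same [1, 4] range yields the same value:

```python
import math
import random

lo, hi = 1.0, 4.0
u = random.random()  # one shared uniform draw in [0, 1)

# LogUniform sampling written out in two different bases:
sample_base2 = 2.0 ** (math.log2(lo) + u * (math.log2(hi) - math.log2(lo)))
sample_base10 = 10.0 ** (math.log10(lo) + u * (math.log10(hi) - math.log10(lo)))

assert math.isclose(sample_base2, sample_base10)  # the base cancels out
```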

Side note: see issue https://github.com/Neuraxio/Neuraxle/issues/429, which I've just created, related to the LogUniform and LogNormal distributions.

Eric2Hamel commented 3 years ago

@guillaume-chevalier: You're right, putting it in base 2 or base 10 will still give something uniform in log space. The only thing that matters is how you parametrize the LogUniform. If the user thinks in terms of 10**(a1) and 10**(b1), the values of a and b will be different from those of 2**(a2) and 2**(b2), so a1 != a2 and b1 != b2. But since we do

```python
self.log2_min_included = math.log2(min_included)
self.log2_max_included = math.log2(max_included)
```

and we use these values in the rvs, I agree with you that it changes nothing. The user does not directly input the values a1/b1 or a2/b2, but rather min = 2**(log2_min_included) and max = 2**(log2_max_included), where log2_min_included = a2 and log2_max_included = b2. Since we parametrize the LogUniform like that, it changes nothing, precisely because the distribution is uniform in log space. For the LogNormal it should be the same, because we input the log2_mean rather than the mean directly, and the same goes for the scale. So, with this parametrization, I think we are independent of the base, as you mention. But if you use scipy stats distributions, which are parametrized differently, you have to parametrize them accordingly.
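For example, scipy's log-uniform distribution takes its bounds in the original space rather than their logarithms (a minimal sketch with illustrative bounds):

```python
from scipy.stats import loguniform

# scipy.stats.loguniform(a, b) is parametrized by the raw bounds a and b,
# not by log2(a) and log2(b): this samples x in [1, 4] with pdf ∝ 1/x.
dist = loguniform(1, 4)
sample = dist.rvs()
```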

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs in the next 180 days. Thank you for your contributions.