Create random variables by default rather than distribution

This issue builds upon discussion from #243.

This is a suggestion to streamline and make more user friendly the user interface of CUQIpy by creating a random variable when the user writes code like:

from cuqi.distribution import Gaussian

x = Gaussian(0, 1) # or 
x = cuqi.distribution.Gaussian(0, 1)

Instead of creating an instance of Distribution, the proposal is to change the __new__ method of Distribution to actually return an instance of a RandomVariable. This can be done via the following change to Distribution:

    def __new__(cls, *args, **kwargs):
        """ Overload __new__ to return a random variable """
        dist = super().__new__(cls)
        dist.__init__(*args, **kwargs)
        return dist._as_random_variable()

Changes to user interface

High-level interface

By default random variables are created making the user interface follow the mathematical notation more closely:

$$ \begin{aligned} d &\sim \mathrm{Gamma}(1, 10^{-4})\ x &\sim \mathrm{Gaussian}(0,d^{-1})\ y &\sim \mathrm{Gaussian}(A(x), 1) \end{aligned} $$

in CUQIpy:

d = Gamma(1, 1e-4)
x = Gaussian(0, 1/d)
y = Gaussian(A(x), 1)

The random variables can then be passed as input to create a BayesianProblem or to JointDistribution.

Low-level interface

At a lower level the Random Variable contains the distribution inside of it. Certain "main properties" are exposed such as

x = Gaussian(0, 1)
x.sample()
x.logd()  # Only possible for simple RVs (not transformed)
x.dim
x.geometry

The underlying distribution can be accessed to get more information e.g.

x.distribution.mean # mean of Gaussian
x.distribution.cov # covariance of Gaussian

Certain methods exists for acting on random variables. These are:

Conditioning

y = Gaussian(0, x)
y.condition(x=10) # Returns now RV where x=10 is fixed inside underlying distribution

Evaluation

x = Gaussian(0, 1)
y = Gamma(1, 1e-4)
z = x+2*y
x(10) # Returns 10
y(10) # Returns 10
z(x=10, y=10) # Returns 30

Algebra

Algebra as shown in #243 works on RVs

Changes to underlying framework

The main change is to have distribution __new__ method return a RandomVariable. In addition we have to update many demos etc, that work directly with distributions to extract the distribution from the initialized random variable.

Changes to name

.name is currently used by distributions to keep track of the defined name of the distribution. This is used by JointDistribution, Samplers and in various other areas. This should be updated such that .name is part of RandomVariable and in turn a hidden ._name method to get name can be stored in distributions at initialization time if that makes it simpler for the code.

Changes to how conditioning looks for the user

We have to decide what to show the user when conditioning. Currently we have

x = Gaussian(0, lambda s: s)
x(x=10) # returns a likelihood function

Now (suggested)

x = Gaussian(0, lambda s: s)
x.condition(x=10) # Should return what? RV again?

One suggestion is to return a RandomVariable or maybe define/call it observed RandomVariable where the data and distribution is stored, perhaps via our current Likelihood class.

Final point is if we fully condition a random variable. Then we end up in the situtation similar to the EvaluatedDensity. Thus do we need a EvaluatedRandomVariable?

Since much of the functionality is implemented for distributions it would be good to leverage that and keep the changes to RandomVariable minimal and instead simply use the underlying distributions to create the abovementioned behavior*

Thank you for the thorough overview and suggestions based on our recent discussions.

Some questions to consider:

Would it be beneficial to have separate classes, e.g., simple rv and transformed rv, and how should they be related? If their behaviour is slightly different, eg, simple rv has the logd method, but transformed doesn't, that suggests separate classes to me, maybe simple subclassed from the more general transformed rv, adding the logd method?
Evaluation of expressions like z = x + 2*y, is stated to have z(10) return 10. What is the motivation for that? Seems to assume y is set to zero to get that. Would seem more intuitive to me to return another rv only depending on y, perhaps with the x node being in "evaluated rv, set to 10" mode somehow.
Accessing the underlying distribution for transformed variable like z that depends on two rvs doesn't seem possible? Even for w=2*x or w=x+1 we wouldn't immediately have the distribution of w even if we have it of x? So perhaps only simple rvs have distribution?
Conditioning in our original distribution sense and computing evaluating a transformed rv given the value of rvs it depends on still seem very similar to me. I would like to understand better whether there is absolutely a need for separate syntax. For example y = Gaussian(0, x) as in the example above. Can't we interpret x as an rv, so that y is somehow a transformed rv, though not through simple algebraic combinations on x. While the value of y does have an additional layer of randomness since it is a distribution, we still essentially by y(x=10) mean "give me y with x fixed at 10".

CUQI-DTU / CUQIpy