42-AI / bootcamp_machine-learning

Bootcamp to learn basics in Machine Learning

Examples missing a feature (`theta = n` but should be `theta = n + 1`) #102

Closed Antoine-lb closed 2 years ago

Antoine-lb commented 2 years ago

Hello 👋

The example uses len(theta) = n, but it should be len(theta) = n + 1: after adding an intercept column of 1s to x, the matrix has n + 1 columns and can no longer be multiplied by a theta of length n.

The current example:

x = np.array([
        [ -6,  -7,  -9],
        [ 13,  -2,  14],
        [ -7,  14,  -1],
        [ -8,  -4,   6],
        [ -5,  -9,   6],
        [  1,  -5,  11],
        [  9, -11,   8]])

theta1 = np.array([3, 0.5, -6]) # missing a value (feature) here

What it should look like:

x = np.array(...)

theta1 = np.array([0, 3, 0.5, -6]) # adding a 0 (or any value) as the first element, the intercept term, would solve the problem
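
To make the shape argument concrete, here is a minimal sketch using the x from the example. The names x_intercepted, theta_short and theta_full are only illustrative, and np.hstack is just one possible way to add the column of 1s; the bootcamp's own add_intercept may be written differently.

```python
import numpy as np

# Feature matrix from the example: m = 7 rows, n = 3 features.
x = np.array([
    [-6, -7, -9],
    [13, -2, 14],
    [-7, 14, -1],
    [-8, -4, 6],
    [-5, -9, 6],
    [1, -5, 11],
    [9, -11, 8],
])

# Prepend a column of 1s: the shape goes from (7, 3) to (7, 4).
x_intercepted = np.hstack((np.ones((x.shape[0], 1)), x))

theta_short = np.array([3, 0.5, -6])    # n values: incompatible with 4 columns
theta_full = np.array([0, 3, 0.5, -6])  # n + 1 values: compatible

# The product with the short theta fails with a shape mismatch.
try:
    x_intercepted @ theta_short
    short_ok = True
except ValueError:
    short_ok = False

# The product with the full theta yields one prediction per row.
y_hat = x_intercepted @ theta_full
print(x_intercepted.shape, short_ok, y_hat.shape)
```

With n + 1 values in theta, the first value multiplies the column of 1s and acts as the intercept.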

Note: the two examples have the same problem

Disclaimer: I'm new to this so maybe I got something wrong

Antoine-lb commented 2 years ago

I can do a pull request myself if you want to

madvid commented 2 years ago

No need. I have to update all the days anyway: multiple corrections have already been made in another git repo, so I just have to apply them on dev and open a merge request.

Antoine-lb commented 2 years ago

OK. I'm doing the bootcamp right now, so I have many errors written down on the side. Hit me up if you want some help.

madvid commented 2 years ago

If you are still willing to help, could you make a checklist in an issue with all the errors you encountered? Let me know if you are doing it; I will update the subjects today.

madvid commented 2 years ago

@Antoine-lb, do you have the list of errors at hand or not? (If not, no problem: if I missed anything, people will open issues.)

Antoine-lb commented 2 years ago

Sadly, I stopped writing them down after your initial comment. How long are you open to changes? We are at Agritech and many of us are doing the piscine, so I can take care of sending errors, or even open pull requests with fixes, if you want.

Antoine-lb commented 2 years ago

Not sure if this is an error, but I think you should check it:

Expected:

[21.0342574, 587.36875564]
[58.86823748, 2229.72297889]

What we got:

[-19.034257402000005, -586.668755635948]
[-57.868237476000004, -2230.1229788875144]

Many of us here got the same result. Maybe we are wrong, but we ended up moving on after failing to find the mistake.

Here is my code:

import numpy as np

def cost_(y, y_hat):
    """Computes the half mean squared error of two non-empty numpy.ndarray, without any for loop. The two arrays must have the same dimensions.
    Args:
      y: has to be an numpy.ndarray, a vector.
      y_hat: has to be an numpy.ndarray, a vector.
    Returns:
      The half mean squared error of the two vectors as a float.
      None if y or y_hat are empty numpy.ndarray.
      None if y and y_hat does not share the same dimensions.
    Raises:
      This function should not raise any Exceptions.
    """
    return (1 / (2 * y.shape[0])) * np.dot(y_hat - y, y_hat - y)

def add_intercept(x):
    """Adds a column of 1's to the non-empty numpy.ndarray x.
    Args:
      x: has to be an numpy.ndarray, a vector of dimension m * 1.
    Returns:
      X as a numpy.ndarray, a vector of dimension m * 2.
      None if x is not a numpy.ndarray.
      None if x is a empty numpy.ndarray.
    Raises:
      This function should not raise any Exception.
    """
    a1 = x
    if (len(x.shape) == 1):
        a1 = x.reshape(x.shape[0], 1)
    an_array = np.insert(a1, 0, [[1]], axis=1)
    return an_array

def predict_(x, theta):
    """Computes the vector of prediction y_hat from two non-empty numpy.ndarray.
    Args:
      x: has to be an numpy.ndarray, a vector of dimension m * 1.
      theta: has to be an numpy.ndarray, a vector of dimension 2 * 1.
    Returns:
      y_hat as a numpy.ndarray, a vector of dimension m * 1.
      None if x or theta are empty numpy.ndarray.
      None if x or theta dimensions are not appropriate.
    Raises:
      This function should not raise any Exceptions.
    """
    intercepted_x = add_intercept(x)
    return (np.dot(intercepted_x, theta))

def simple_gradient(x, y, theta):
    """Computes a gradient vector from three non-empty numpy.ndarray, without any for-loop. The three arrays must have compatible dimensions.
    Args:
      x: has to be an numpy.ndarray, a vector of dimension m * 1.
      y: has to be an numpy.ndarray, a vector of dimension m * 1.
      theta: has to be an numpy.ndarray, a 2 * 1 vector.
    Returns:
        The gradient as a numpy.ndarray, a vector of dimension 2 * 1.
        None if x, y, or theta are empty numpy.ndarray.
        None if x, y and theta do not have compatible dimensions.
    Raises:
        This function should not raise any Exception.
    """
    res = [0, 0]
    res[0] = 1 / len(x) * np.sum(predict_(x, theta) - y)
    res[1] = 1 / len(x) * np.sum((predict_(x, theta) - y) * x)
    return res
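
For comparison, here is a hedged sketch of the same gradient written in matrix form, X'^T (X' theta - y) / m, where X' is x with a column of 1s prepended. The name gradient_sketch and the toy data are made up for illustration and are not taken from the bootcamp subject. Flattening y and theta first avoids one possible pitfall: if x has shape (m, 1) while the predictions have shape (m,), an element-wise product broadcasts to (m, m) instead of (m,). This is not claimed to explain the difference with the expected values above.

```python
import numpy as np

def gradient_sketch(x, y, theta):
    """Hypothetical helper: gradient of the half mean squared error
    for a one-feature linear model, in matrix form."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)        # (m, 1)
    y = np.asarray(y, dtype=float).reshape(-1)           # (m,)
    theta = np.asarray(theta, dtype=float).reshape(-1)   # (2,)
    # Prepend the column of 1s so theta[0] acts as the intercept.
    x_prime = np.hstack((np.ones((x.shape[0], 1)), x))   # (m, 2)
    residual = x_prime @ theta - y                       # (m,)
    return x_prime.T @ residual / x.shape[0]             # (2,)

# Toy data (made up): y = 2 * x exactly.
x_toy = np.array([1.0, 2.0, 3.0])
y_toy = np.array([2.0, 4.0, 6.0])

grad_off = gradient_sketch(x_toy, y_toy, np.array([0.0, 1.0]))
grad_opt = gradient_sketch(x_toy, y_toy, np.array([0.0, 2.0]))
print(grad_off, grad_opt)
```

At the theta that fits the toy data exactly, both components of the gradient are zero, which is a quick sanity check for any implementation.
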

madvid commented 2 years ago

Just open an issue if you find other mistakes. If you wish to do a pull request, you have to create a new branch, correct the mistakes, regenerate the corresponding PDF, and then open the pull request on dev. So I think it is better if I do it.

madvid commented 2 years ago

For the exercise on simple gradient, the errors in the examples have already been corrected in the dev branch. The fix will be on master very soon.

madvid commented 2 years ago

For:

Module07/ex06 = declared variables and used variables are different in the examples (for example x is then called X2) so the example does not even compile

Well spotted, there was no issue reporting this error. Fortunately, Python is not a compiled language but an interpreted one ;)

madvid commented 2 years ago

Thank you for reporting these issues. They are fixed in the branch (correction_before_session_2021). The PDFs will be regenerated, then a merge request will follow on dev and next on main.