cs50 / problems

Checks for check50
145 stars 237 forks source link

issue 258 fix #266

Closed patrickthornton closed 3 months ago

patrickthornton commented 3 months ago

forces students to recognize that DISTINCT(name), were it included in the query, would be uncalled for, as two or more actors from 2004 movies could have the same name but be different people. i could see the logic behind not making this change, also; it's a bit of a minor change

niekcode commented 3 months ago

After the change, Check50 does not match the following specification anymore: If a person appeared in more than one movie in 2004, they should only appear in your results once. The following count in the How to Test section also needs to be adjusted: Executing 9.sql results in a table with 1 column and 19,325 rows

Potential fixes:

  1. Keep the changes, in the How to Test section change the rows to 22,746 and remove the specification mentioned above.
  2. Revert the changes, change the number of rows in the How to Test section to 19,261 and maybe add something along the lines of assume that every name belongs to a unique actor to the specifications to make it more clear to use DISTINCT even though different actors with the same name exist.
  3. Hint to the user that GROUP BY people.id could be used to only show actors based on their id once while also showing different actors with the same name, and check accordingly. I think this would be the most beneficial for the user and how it was originally intended but I couldn't come up with an implementation for this in the Check50 function, maybe someone else does.
curiouskiwi commented 3 months ago

It does still meet the spec. A person is not defined by their name alone. So in this updated test, we are confirming that if there are 2 people with the same name (ie, 2 people in the people table who happen to have the same name), that both will appear in the results. A common error is using DISTINCT(name) even though names are not distinct. This edit catches that error.

niekcode commented 3 months ago

Oh, okay, thanks for clarifying. I just found it odd that the test passed even when an actor starred in two movies and was printed twice in the results, which seems contrary to the specification above. That's why I came up with these solutions.

curiouskiwi commented 3 months ago

even when an actor starred in two movies and was printed twice in the results

In the database that check50 is using, Emma Watson is the name of two different people. Perhaps that's the confusion?

niekcode commented 3 months ago

SELECT name FROM people JOIN stars ON people.id = stars.person_id JOIN movies ON movies.id = stars.movie_id WHERE year = 2004 ORDER BY birth ASC;

What confused me is that this code passed the test even though it's printing the same actors multiple times if they starred in more than one movie. I was just trying to find a solution where this edge case would be caught by Check50. Sorry, I probably should have mentioned this earlier.

patrickthornton commented 3 months ago

nice catch! the good news for me is that it looks like this issue existed before this merge; the smaller version of the movies.db database used here doesn't catch the case where the same person is listed twice or more in 9.sql. the solution you give shouldn't be accepted, i believe, since the row count is so markedly different from the one we expect. i'll make the change!