
DS4PS / cpp-526-sum-2021

Course shell for CPP 526.

https://ds4ps.org/cpp-526-sum-2021/

MIT License


Lab 6 table duplicates #36

Open ctmccull opened 2 years ago

ctmccull commented 2 years ago

So for lab 6, I've been able to get everything working properly. The only issue is that when I make the final table, it doesn't give just one row per team per year. Is this to be expected? Everything is still in ascending order, but I was curious whether it is really supposed to match the preview, with only one row for each team and year, followed by a row for the next team and year.

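```r
TeamSalaries %>%
  filter(n >= 25) %>%
  group_by(teamID, yearID) %>%   # group_by() on its own does not collapse rows
  arrange(cost.per.win) %>%      # arrange() ignores the grouping by default
  select(yearID, teamID, lgID.x, Rank, G, W, n, team.budget, cost.per.win)
```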

lecy commented 2 years ago

A good sanity check is figuring out what size the data frame should be to see if you merged the data correctly.

How many teams do you have? And how many years of data? For example:

50 teams x 100 years of data = 5,000 rows

It's a little more complicated because many teams were only around for a short time, but it should give you a sense of magnitude. It looks like you have a lot more rows than that.
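One way to turn that estimate into a direct check is to compare the table's row count with the number of distinct team-year pairs it contains. This is a minimal base-R sketch, assuming the final table has `teamID` and `yearID` columns; the helper `expected_vs_actual()` and the toy `team_table` are hypothetical, not part of the lab:

```r
# Hypothetical helper: compare the number of distinct key combinations
# with the number of rows actually present.
expected_vs_actual <- function(df, keys = c("teamID", "yearID")) {
  expected <- nrow(unique(df[keys]))   # distinct team-year combinations
  actual   <- nrow(df)                 # rows in the table
  c(expected = expected, actual = actual)
}

# Made-up team-level table in which BOS 2000 appears twice.
team_table <- data.frame(
  teamID = c("ATL", "ATL", "BOS", "BOS"),
  yearID = c(2000, 2001, 2000, 2000),
  W      = c(95, 88, 85, 85)
)

counts <- expected_vs_actual(team_table)
print(counts)  # expected 3, actual 4
```

If `actual` is larger than `expected`, some team-years are repeated, which points back at the merge or grouping step.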

I suspect you did not merge correctly or did not group correctly while calculating team stats.
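On the grouping side, team stats should be computed per team *and* year so the result has one row per team-year. The sketch below assumes a player-level salary table with `teamID`, `yearID`, and `salary` columns; the data are invented, and it uses base R's `aggregate()` rather than whatever dplyr code the lab uses:

```r
# Made-up player-level salaries: one row per player per team-year.
salaries <- data.frame(
  teamID   = c("ATL", "ATL", "ATL", "BOS", "BOS"),
  yearID   = c(2000, 2000, 2001, 2000, 2000),
  playerID = c("p1", "p2", "p1", "p3", "p4"),
  salary   = c(100, 200, 120, 150, 160)
)

# Group on BOTH keys: total payroll and player count per team-year.
team_budget <- aggregate(salary ~ teamID + yearID, data = salaries, FUN = sum)
team_n      <- aggregate(salary ~ teamID + yearID, data = salaries, FUN = length)
names(team_budget)[names(team_budget) == "salary"] <- "team.budget"
names(team_n)[names(team_n) == "salary"] <- "n"

# Combine the two summaries on the same pair of keys.
team_stats <- merge(team_budget, team_n, by = c("teamID", "yearID"))
print(team_stats)  # 3 rows: ATL 2000, ATL 2001, BOS 2000
```

Grouping on `teamID` alone would instead pool every season into one row per team, which no longer lines up with a year-by-year table.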

dholford commented 2 years ago

@ctmccull, I had a similar problem. When I finally figured it out, the issue for me was forgetting to use c(), since I was using two merge keys. I think the directions explicitly show using c() for multiple IDs with merge() but not with join(), so I just assumed I didn't need it. I was wrong.
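To see why a missing c() inflates the table, here is a small base-R sketch with two made-up tables that each have one row per team-year. Matching on `teamID` alone pairs every season of a team with every other season of that team:

```r
# Made-up tables, each with one row per team-year.
teams <- data.frame(
  teamID = c("ATL", "ATL", "BOS", "BOS"),
  yearID = c(2000, 2001, 2000, 2001),
  W      = c(95, 88, 85, 82)
)
budgets <- data.frame(
  teamID      = c("ATL", "ATL", "BOS", "BOS"),
  yearID      = c(2000, 2001, 2000, 2001),
  team.budget = c(300, 120, 310, 290)
)

# One key only: each team's 2 rows match its 2 rows in the other table.
bad  <- merge(teams, budgets, by = "teamID")
# Both keys, passed together as a single character vector with c().
good <- merge(teams, budgets, by = c("teamID", "yearID"))

nrow(bad)   # 8 (2 x 2 matches for each of 2 teams)
nrow(good)  # 4 (one per team-year)
```

In the one-key version, the unmatched `yearID` columns also show up twice as `yearID.x` and `yearID.y`. The same rule applies to dplyr joins, for example `left_join(teams, budgets, by = c("teamID", "yearID"))`.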

Hopefully that helps!

kidistbetter105 commented 2 years ago

@ctmccull I have the same issue. I used the keys with merge() to merge the data, but I get an overwhelming error.

Sanaz-27 commented 2 years ago

Hi @ctmccull, I had the same issue and got frustrated trying to figure out what I did wrong :( Then, checking the merged data, I saw that the rows were grouped by many columns, but at the same time many other columns had unique values,


so using:

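```r
# unique() drops exact-duplicate rows, head(25) keeps the first 25,
# and pander() (from the pander package) prints the table
unique() %>% head(25) %>% pander()
```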

solved everything, and I got the same result as the table in the lab 👍
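Note that unique() only removes rows that are identical in every column, so it is worth confirming afterward that each team-year appears once. If the duplicates differ in any column, for instance `yearID.x` vs. `yearID.y` after a one-key merge, unique() keeps them and the fix belongs in the merge keys instead. A minimal base-R sketch on a made-up merged table:

```r
# Made-up merged table where one team-year was copied by a bad merge.
merged <- data.frame(
  teamID       = c("ATL", "ATL", "BOS", "BOS"),
  yearID       = c(2000, 2000, 2000, 2001),
  cost.per.win = c(3.2, 3.2, 3.6, 3.5)
)

# How many rows are exact copies of an earlier row?
n_exact_dupes <- sum(duplicated(merged))   # 1

# unique() drops those exact copies.
deduped <- unique(merged)                  # 3 rows

# Check that each team-year now appears only once; stops with an error otherwise.
stopifnot(!any(duplicated(deduped[c("teamID", "yearID")])))
```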