fivethirtyeight / data

Data and code behind the articles and graphics at FiveThirtyEight
https://data.fivethirtyeight.com/
Creative Commons Attribution 4.0 International

How to combine two sets of data with differences in merge-index strings? #297

Closed. Montana closed this issue 1 year ago

Montana commented 2 years ago

Hey folks,

My main question: how can I easily normalize the team names across the two datasets, without analyzing all the differences by hand and hardcoding "replace" operations on one of them?

Dataset 1 can be downloaded here: https://data.fivethirtyeight.com/#soccer-spi. Dataset 2 is not freely available, but a row looks like this:

Manchester United   Leicester   2018-08-10 22:00:00   0.2812   0.3275   0.3913   1.5137   1.73813

jayb commented 1 year ago

The best approach is generally to have a canonical name for each team, and a lookup table (probably stored in a .csv file) that maps alternate spellings to your canonical names. Then you use that table to link datasets.
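
A minimal sketch of that lookup-table idea, using only the Python standard library. Everything named here is an assumption for illustration: the `alias`/`canonical` column names, the spellings in the table, the toy rows, and the helper functions are all hypothetical, and in practice the table would be read from a real .csv file rather than an inline string. The `suggest_canonical` helper is an extra, not part of the suggestion above: it uses `difflib` to propose a likely canonical name for an unseen spelling, so new table entries can be reviewed rather than found by hand.

```python
import csv
import difflib
import io

# Hypothetical lookup table; in practice this would be a .csv file on disk.
LOOKUP_CSV = """alias,canonical
Man United,Manchester United
Leicester,Leicester City
"""


def load_lookup(fileobj):
    """Build a dict from lower-cased spelling to canonical team name."""
    lookup = {}
    for row in csv.DictReader(fileobj):
        canonical = row["canonical"].strip()
        lookup[row["alias"].strip().lower()] = canonical
        # A canonical name should also resolve to itself.
        lookup[canonical.lower()] = canonical
    return lookup


def canonical_name(name, lookup):
    """Return the canonical name; a KeyError flags a spelling missing from the table."""
    return lookup[name.strip().lower()]


def suggest_canonical(name, lookup):
    """Propose the closest canonical name for manual review, or None."""
    candidates = sorted(set(lookup.values()))
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=0.6)
    return matches[0] if matches else None


lookup = load_lookup(io.StringIO(LOOKUP_CSV))

# Toy stand-ins for the two datasets, keyed by differently spelled team names.
set1 = {"Manchester United": "row A1", "Leicester City": "row A2"}
set2 = {"Man United": "row B1", "Leicester": "row B2"}

# Normalize both key sets, then join on the shared canonical names.
norm1 = {canonical_name(team, lookup): row for team, row in set1.items()}
norm2 = {canonical_name(team, lookup): row for team, row in set2.items()}
merged = {team: (norm1[team], norm2[team]) for team in norm1.keys() & norm2.keys()}
print(merged)
```

Raising on unknown spellings, rather than passing them through silently, keeps mismatches visible: each `KeyError` points at a name that still needs a row in the table. Fuzzy suggestions are best treated as candidates to confirm, since similar-looking names can belong to different teams.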