JeffSackmann / tennis_wta

WTA Tennis Rankings, Results, and Stats

WTA match data : commit_a8c11eb_thru_26_nov_2018 introduced duplicated match data. #15

Closed bazzaar closed 4 years ago

bazzaar commented 5 years ago

Hi,

It appears the recent 'commit_a8c11eb_thru_26_nov_2018' update has introduced duplicated match data. All 'Main Draw' matches from the following tournament now occur twice in the file.

wta_matches_qual_itf_2018.csv
--------------------------------
WTA Qual ITF, 2018-W-WITF-AUS-10A-2018, Darwin $60K, 2018-09-24

This is not data that was present before the update; it is simply new data that has been added twice.

Initially I thought more tournaments were duplicated, but then realised 'match_num' isn't unique within a tournament; it needs to be combined with 'round' to be so [the qualifying draw has its own match numbering, as does the main draw].

In the case above, I think it's likely just a cut/paste error since the main draw is represented twice, whilst the qualifying draw is absent.

And therefore the flip side to this is that the 'Qualifying Draw' match data is missing.
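For anyone wanting to reproduce the check, here is a minimal sketch of the duplicate test described above. It assumes the CSV has columns named 'tourney_id', 'round' and 'match_num'; the function name `find_duplicate_matches`, the round labels and the sample rows are made up for illustration and are not taken from the actual file.

```python
import csv
import io
from collections import Counter


def find_duplicate_matches(rows, key_fields=("tourney_id", "round", "match_num")):
    """Return {key: count} for every key that occurs more than once.

    `rows` is an iterable of dicts (e.g. from csv.DictReader).
    The default key assumes a match is identified by tournament,
    round and match number together.
    """
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Illustrative sample only: these rows are invented, not real file contents.
SAMPLE_CSV = """tourney_id,round,match_num
2018-W-WITF-AUS-10A-2018,R32,1
2018-W-WITF-AUS-10A-2018,R32,1
2018-W-WITF-AUS-10A-2018,Q1,1
2018-W-WITF-AUS-10A-2018,R32,2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Keyed on (tourney_id, round, match_num): only the repeated row is flagged.
duplicates = find_duplicate_matches(rows)
for key, n in duplicates.items():
    print(key, "appears", n, "times")

# Keyed on (tourney_id, match_num) alone: the qualifying match is also
# counted, which is the false positive described above.
naive = find_duplicate_matches(rows, key_fields=("tourney_id", "match_num"))
```

To run it against the real file, replace the sample with `csv.DictReader(open("wta_matches_qual_itf_2018.csv", newline=""))`, adjusting the column names if they differ.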

Hope this helps, bazzaar

JeffSackmann commented 4 years ago

de-duped. I'll add the qualies at some point in the near future.