abhiamishra / ggshakeR

An analysis and visualization R package that works with publicly available soccer data
https://abhiamishra.github.io/ggshakeR/
MIT License

[Feature Request]: UEFA-compliance / autotype compliance #107

Open abhiamishra opened 2 years ago

abhiamishra commented 2 years ago

Suggest an idea or something that can be improved.

Currently, our functions are compatible with StatsBomb and Opta coordinates. UEFA-style coordinates also exist (news to me), and it would be nice to support them as well.

Even nicer would be detecting the coordinate type of the dataframe automatically. Food for thought!

harshkrishna17 commented 2 years ago

Auto-identifying the dataset type is an interesting idea, but we should probably fix the bugs in the current version and do a CRAN release first, so that we're free to experiment with the idea on dev.

harshkrishna17 commented 2 years ago

For now, we could just add a way to transform UEFA data to Opta within the guide itself?
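A minimal sketch of what such a guide snippet could look like, assuming UEFA data uses a 105 x 68 metre pitch (as in {ggsoccer}'s `pitch_international`), shares Opta's origin and axis directions, and already has the `x`, `y`, `finalX`, `finalY` columns our functions expect. The name `uefa_to_opta` is hypothetical; it only rescales each axis to Opta's 0-100 range.

```r
# Hypothetical helper (not part of ggshakeR): rescale UEFA/international
# coordinates (105 x 68 m) to Opta's 0-100 scale. Assumes both systems share
# the same origin and axis directions, so only the scale changes.
uefa_to_opta <- function(data) {
  data$x      <- data$x      * 100 / 105
  data$finalX <- data$finalX * 100 / 105
  data$y      <- data$y      * 100 / 68
  data$finalY <- data$finalY * 100 / 68
  data
}

# Toy passes on a 105 x 68 pitch
passes <- data.frame(x = c(0, 52.5, 105), y = c(0, 34, 68),
                     finalX = c(10.5, 105, 52.5), finalY = c(68, 34, 0))

uefa_to_opta(passes)  # x: 0, 50, 100; y: 0, 50, 100
```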

abhiamishra commented 2 years ago

Either that, or we add UEFA as a supported data type in all functions.

abhiamishra commented 2 years ago

Or we create a helper function that transforms datasets from UEFA to either Opta or StatsBomb.
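One way such a helper could be sketched, assuming a simple per-axis linear rescale between pitch sizes: Opta 100 x 100, StatsBomb 120 x 80, and international/UEFA 105 x 68 (the sizes used elsewhere in this thread). `pitch_dims` and `convert_coords` are hypothetical names. StatsBomb's y-axis is commonly described as running in the opposite direction to Opta's; this sketch does not flip it, so that would need handling separately.

```r
# Hypothetical lookup of pitch dimensions (length x width) per data type.
pitch_dims <- list(
  opta          = c(length = 100, width = 100),
  statsbomb     = c(length = 120, width = 80),
  international = c(length = 105, width = 68)  # UEFA, metres
)

# Hypothetical helper: linearly rescale x/finalX along the length and
# y/finalY along the width, for whichever of those columns are present.
# Assumes a shared origin and axis direction (no axis flipping).
convert_coords <- function(data, from, to) {
  from_dims <- pitch_dims[[from]]
  to_dims   <- pitch_dims[[to]]
  if (is.null(from_dims) || is.null(to_dims)) {
    stop("`from` and `to` must be one of: ",
         paste(names(pitch_dims), collapse = ", "))
  }

  for (col in intersect(c("x", "finalX"), names(data))) {
    data[[col]] <- data[[col]] * to_dims[["length"]] / from_dims[["length"]]
  }
  for (col in intersect(c("y", "finalY"), names(data))) {
    data[[col]] <- data[[col]] * to_dims[["width"]] / from_dims[["width"]]
  }
  data
}

# A pass ending at the far corner of a UEFA pitch, converted to StatsBomb size
convert_coords(data.frame(x = 105, y = 68), from = "international", to = "statsbomb")
```

Keeping the dimensions in one table means adding another provider later only needs one new entry rather than a new pair of conversion functions.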

harshkrishna17 commented 2 years ago

Would {ggsoccer} have this functionality? It has a pitch type, pitch_international, for UEFA coordinates, but I'm not sure whether its rescale function supports it.

harshkrishna17 commented 2 years ago

Here's a simple internal function we could create to auto-identify the dataset type. We can add checks on the final coordinates (finalX, finalY) as well. @abhiamishra @Ryo-N7

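```r
# Guess the coordinate system from the largest x and y values in the data
data_type_identify <- function(data) {
  max_x <- max(data$x, na.rm = TRUE)
  max_y <- max(data$y, na.rm = TRUE)

  if (max_x > 98 && max_x < 102 &&
      max_y > 98 && max_y < 102) {
    "opta"          # Opta: 0-100 on both axes
  } else if (max_x > 118 && max_x < 122 &&
             max_y > 78 && max_y < 82) {
    "statsbomb"     # StatsBomb: 120 x 80
  } else {
    NA_character_   # no known coordinate system matched
  }
}
```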
harshkrishna17 commented 2 years ago

Updated function, now also checking finalX/finalY and detecting international (UEFA) coordinates:

```r
library(dplyr)

data_type_identify <- function(data) {

  # Keep only the coordinate columns and drop incomplete rows
  data <- data %>%
    select(x, y, finalX, finalY) %>%
    na.omit()

  # TRUE if the column maximum lies strictly between lo and hi
  max_between <- function(v, lo, hi) max(v) > lo && max(v) < hi

  data_type <- NA_character_  # returned when no known system matches

  if (max_between(data$x, 98, 102) && max_between(data$y, 98, 102) &&
      max_between(data$finalX, 98, 102) && max_between(data$finalY, 98, 102)) {
    data_type <- "opta"           # 100 x 100
  } else if (max_between(data$x, 118, 122) && max_between(data$y, 78, 82) &&
             max_between(data$finalX, 118, 122) && max_between(data$finalY, 78, 82)) {
    data_type <- "statsbomb"      # 120 x 80
  } else if (max_between(data$x, 103, 107) && max_between(data$y, 66, 70) &&
             max_between(data$finalX, 103, 107) && max_between(data$finalY, 66, 70)) {
    data_type <- "international"  # UEFA, 105 x 68 metres
  }

  return(data_type)
}
```
rithwikrajendran commented 2 years ago

Can't we use the distinctive column names of each data frame type to identify the data type (UEFA, Opta, StatsBomb, etc.) and then return the matching pitch?

harshkrishna17 commented 2 years ago

Unfortunately that can't work: our functions already require the input dataset to use specific column names, so the column names are the same regardless of where the data came from.

robbiejdunne commented 2 years ago

[Attached screenshot]

I don't fully understand why the pass plot function scales the coordinates correctly but the pass network doesn't. The progressive passes plot was made with the same dataset as the pass network.

[Attached screenshot]

harshkrishna17 commented 2 years ago

Hey @robbiejdunne, could you be a little more specific about what the scaling problem is? I'd also recommend creating a new issue for this problem and continuing the conversation there.

robbiejdunne commented 2 years ago

This is part of the original issue with international/Opta/StatsBomb data. In this dataset, the pass network's dimensions only go up to 68, yet they seem fine for the progressive passes. Does that make sense?

harshkrishna17 commented 2 years ago

I don't think they're fine for the progressive passes either; rather, the pitch size in plot_pass makes the plot look right when it might not be.

Try running this on your dataset before running the passnet function.

```r
library(dplyr)

# Rescale 0-100 coordinates to a 105 x 68 pitch
data <- data %>%
  mutate(x = x * 105/100,
         finalX = finalX * 105/100,
         y = y * 68/100,
         finalY = finalY * 68/100)
```

This should fix it.

abhiamishra commented 2 years ago

I'm going to play around with the function Harsh made.

abhiamishra commented 2 years ago

The function seems to be working well.

I'm going to add a sample UEFA dataset and push the auto-type compliance. It will take some time because I have to update the tests as well.

abhiamishra commented 2 years ago

A quick fix that needs to be implemented: some functions only look at x and y, and in our guides the user selects only x and y. The detection function therefore needs to be able to determine the type from x and y alone.
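A minimal sketch of how detection could work from x and y alone: it uses finalX/finalY only when those columns exist and reuses the ranges from the function earlier in this thread. The name `identify_data_type_xy` is hypothetical. Unlike the thread's version, it takes the maximum over whichever start/end columns are present instead of checking each column separately, and, like the original, it assumes the data reaches close to the pitch boundary.

```r
# Hypothetical variant of data_type_identify() (not part of ggshakeR):
# finalX/finalY are optional, so data with only x and y can be classified.
identify_data_type_xy <- function(data) {
  x_cols <- intersect(c("x", "finalX"), names(data))
  y_cols <- intersect(c("y", "finalY"), names(data))

  # Largest observed value along each axis, ignoring missing values
  max_x <- max(unlist(data[x_cols]), na.rm = TRUE)
  max_y <- max(unlist(data[y_cols]), na.rm = TRUE)

  in_range <- function(v, lo, hi) v > lo && v < hi

  if (in_range(max_x, 98, 102) && in_range(max_y, 98, 102)) {
    "opta"           # 100 x 100
  } else if (in_range(max_x, 118, 122) && in_range(max_y, 78, 82)) {
    "statsbomb"      # 120 x 80
  } else if (in_range(max_x, 103, 107) && in_range(max_y, 66, 70)) {
    "international"  # UEFA, 105 x 68 metres
  } else {
    NA_character_    # no known coordinate system matched
  }
}

# Works with just x and y, as in the guides
identify_data_type_xy(data.frame(x = c(10, 100), y = c(5, 99.5)))
```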

abhiamishra commented 2 years ago

I'm going to start working on this this week and test a much improved version that accounts for the case I mentioned above.