Gold, Silver and Bronze colour for each timing + Dataset Transformation

rghanty commented 1 week ago

It would be nice to see your lap timings in Silver/Bronze and mine in Gold. Just saying.

SaiUbc commented 1 week ago

Hi @rghanty, That is actually cool! Thanks for your input 🙂

Here are a few things that I'm thinking of right now:

The record_race_results() function in RoommateChampionship.py should possibly include a feature that determines Gold, Silver and Bronze. I think a new column has to be added to the csv in order to assign 'Gold', 'Silver' or 'Bronze'. It could be one-hot encoded in the future, but it is fine if you leave it categorical.
Ideally the csv would be saved to a Google Sheet so that it can be accessed over the internet by multiple people at once. I think there is documentation for the Google Sheets API which has to be looked into, but I'd be interested in doing that.

Note: There might be scenarios where only a few people have done a championship or a particular race, and a new person then does the same time trial event much later. The csv should therefore update automatically to assign Gold, Silver or Bronze to that particular player.
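A minimal sketch of the medal assignment described above, not the actual record_race_results() implementation. It assumes each csv row can be read as a dict with the hypothetical keys 'track', 'driver' and 'lap_time' (in seconds), with one row per driver per track. The track name and the 'Guest' driver are made-up sample values.

```python
from collections import defaultdict

MEDALS = ["Gold", "Silver", "Bronze"]


def assign_medals(rows):
    """Return copies of rows with a 'medal' field recomputed per track.

    Hypothetical row keys: 'track', 'driver', 'lap_time' (seconds).
    Rows outside the top three on a track get an empty medal.
    """
    # Group row positions by track so each track is ranked separately.
    by_track = defaultdict(list)
    for index, row in enumerate(rows):
        by_track[row["track"]].append(index)

    result = [dict(row, medal="") for row in rows]
    for indices in by_track.values():
        # Fastest lap first. Medals are rebuilt from scratch on every call,
        # so a late entry can displace earlier medal holders.
        ranked = sorted(indices, key=lambda i: float(rows[i]["lap_time"]))
        for medal, i in zip(MEDALS, ranked):
            result[i]["medal"] = medal
    return result


rows = [
    {"track": "Monza", "driver": "Sai", "lap_time": 83.4},
    {"track": "Monza", "driver": "Rishabh", "lap_time": 82.9},
]
before = assign_medals(rows)

# A new driver posts a faster time later; rerunning reassigns the medals.
rows.append({"track": "Monza", "driver": "Guest", "lap_time": 81.7})
after = assign_medals(rows)
```

Recomputing every medal on each update keeps the column consistent when someone joins a race late. Tied lap times are not handled specially here: the row that appears first in the csv gets the better medal.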

Please do let me know if there are any more features you'd like to work on or discuss here. I have added this issue (#14) as a task in progress on the Project Board and assigned it to you.

Feel free to fork the main branch and work on the repo. When you do make a PR, remember to link it to this issue so that I can close it. Let's use this issue to discuss anything else 👍🏻

Thanks, Sai

rghanty commented 1 week ago

Thanks for your input @SaiUbc. In my opinion, the table needs to be restructured. Google Sheets is a good idea, but first we should think about implementing a new structure for the race_results table. Each column should be that of our name and race lap times should be entries for each person.

Let me know if this works for you.

Rishabh

SaiUbc commented 1 week ago

Hey @rghanty

Appreciate your input and quick response 👍🏻 Here are my thoughts about your suggestions:

Each column should be that of our name and race lap times should be entries for each person.

Correct me if I'm wrong, but I think you are requesting this in order to implement functionality that assigns Gold, Silver or Bronze by looking at the individual drivers' columns for a particular race. While I can see an advantage to assigning it that way, I have two concerns:

Sparse Columns: While we have at most three to four participants for now, we might have more in the future. Consider this scenario: one of our friends visits for a short time and participates in our championship for a few races. We would then have an entire column assigned to our friend, with many empty entries for the past races our friend didn't participate in. Having a sparse column worries me in terms of storage.
Curse of Dimensionality: While my first and foremost goal is to collect reasonable and good data from this championship, I would be interested in applying a Machine Learning model as a further scope of this project to predict the race medal or lap time for an individual. While I agree that our columns are not yet sufficiently complex or detailed for this feature to be implemented correctly, having designated columns for drivers could lead to overfitting if we were to implement an ML model.

Note: I appreciate your comment because it has made me revisit the columns of our dataset and consider improving the dataset further to collect more information for the future scope of this project. This does, however, mean that we would need to scrap the data we have previously collected, but I think it's worth starting over, as the implementation already exists in RoommateChampionship.py and only has to be tweaked slightly to fit our parameters.

Overall, these are my thoughts on adding individual driver columns. Feel free to correct me if I have misunderstood your request, and do let me know the rationale behind it. Let me know if you need any clarification from my side 🙂

Thanks, Sai

rghanty commented 1 week ago

Hey @SaiUbc, it's nice to see your interest in my suggestion. I'd be happy to answer your queries, but first I'd like to clarify why I suggested keeping individual drivers as columns.

1: Readability: In our current table structure, each track is repeated x times for x drivers. Having drivers as columns eliminates this and prevents duplication, for better readability.

2: Ease of Querying: My suggested layout makes it easy to see each driver's lap times at a glance, whereas in our current table structure, grouping or pivoting queries would be required to query a certain driver's lap times.
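To make the pivoting step concrete, here is a rough sketch of turning the current layout (one row per track and driver) into the drivers-as-columns layout. The keys 'track', 'driver' and 'lap_time', the function name pivot_by_driver and the sample values are all assumptions for illustration, not names from RoommateChampionship.py.

```python
def pivot_by_driver(rows, missing="NA"):
    """Pivot long rows into one row per track with a column per driver.

    Hypothetical row keys: 'track', 'driver', 'lap_time'.
    Cells for drivers who did not race a track are filled with `missing`.
    """
    drivers = sorted({row["driver"] for row in rows})
    tracks = []   # keeps tracks in first-seen order
    times = {}    # track -> {driver: lap_time}
    for row in rows:
        if row["track"] not in times:
            tracks.append(row["track"])
            times[row["track"]] = {}
        times[row["track"]][row["driver"]] = row["lap_time"]

    header = ["track"] + drivers
    table = [
        [track] + [times[track].get(driver, missing) for driver in drivers]
        for track in tracks
    ]
    return header, table


long_rows = [
    {"track": "Monza", "driver": "Sai", "lap_time": 83.4},
    {"track": "Monza", "driver": "Rishabh", "lap_time": 82.9},
    {"track": "Spa", "driver": "Sai", "lap_time": 105.2},
]
header, table = pivot_by_driver(long_rows)
```

With this sample data, each track appears once and the driver who skipped the second track gets a placeholder cell instead of a missing row.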

My suggestion for the table format is similar to Wikipedia's format for championship tables, for instance:

[Image: IMG_20240927_152939]

If this table had a format similar to our current one, each race would span 20 or more rows, one for each driver. While that reduces the need for extra columns, we would be sacrificing readability and ease of comprehension for storage space.

I agree that the entries for new drivers will be blank for some races. But the Wikipedia article handles this too: drivers like Ollie Bearman and Franco Colapinto have blank fields for several races. In our case, we can simply denote an empty cell as NA or some other value we choose.

Moreover, to address your concern about overfitting the ML model, we can implement a function to convert the table into a structure similar to our current one just for the training and testing phase.
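Such a conversion function could look roughly like the sketch below. It assumes the restructured table is held as a header list plus rows, with the track in the first column, one column per driver, and "NA" marking a race a driver skipped. The function name melt_wide_table and the sample values are hypothetical.

```python
def melt_wide_table(header, table, missing="NA"):
    """Convert a drivers-as-columns table into one row per track and driver.

    Assumes header[0] is the track column and the rest are driver names.
    Cells equal to `missing` are skipped, so absent drivers produce no rows.
    """
    drivers = header[1:]
    long_rows = []
    for record in table:
        track, cells = record[0], record[1:]
        for driver, lap_time in zip(drivers, cells):
            if lap_time == missing:
                continue
            long_rows.append(
                {"track": track, "driver": driver, "lap_time": lap_time}
            )
    return long_rows


header = ["track", "Rishabh", "Sai"]
table = [["Monza", 82.9, 83.4], ["Spa", "NA", 105.2]]
long_rows = melt_wide_table(header, table)
```

Because the long form is derived on demand, only one table has to be stored and kept up to date.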

Let me know if this works for you or if you need additional clarification.

Regards, Rishabh

rghanty commented 1 week ago

I'd also like to make a suggestion for our overall application. Given that we plan to implement this as a championship, we can do something similar to the table in the Wikipedia article and add a points system, maybe (+3,+2,+1 for 🥇, 🥈, 🥉)

EDIT: We don't have to record the scores for each track, we can just display the final score in a separate column. For this purpose, I suggest having the drivers as rows and tracks as columns.
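As a rough sketch of the proposed scoring, assuming lap times are available as a mapping from track to each driver's time (a hypothetical structure, with made-up sample values), the final score per driver could be computed like this:

```python
POINTS = [3, 2, 1]  # Gold, Silver, Bronze, as proposed above


def championship_standings(lap_times):
    """Return (driver, total_points) pairs, highest score first.

    lap_times maps track -> {driver: lap_time in seconds}. Only the three
    fastest drivers on each track score; everyone else gets 0 for that track.
    """
    totals = {}
    for results in lap_times.values():
        for driver in results:
            totals.setdefault(driver, 0)
        # Rank this track's drivers from fastest to slowest lap.
        ranked = sorted(results, key=results.get)
        for points, driver in zip(POINTS, ranked):
            totals[driver] += points
    # Sort by points (descending), then by name so equal scores are stable.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


lap_times = {
    "Monza": {"Sai": 83.4, "Rishabh": 82.9, "Guest": 81.7},
    "Spa": {"Sai": 105.2, "Rishabh": 106.0},
}
standings = championship_standings(lap_times)
```

Only the totals are returned, which matches the idea of showing a single final score column instead of recording points per track. How to break ties on points is left open; the sketch just orders tied drivers by name.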

Let me know what you think of this suggestion.

Regards, Rishabh

SaiUbc commented 1 week ago

Hey @rghanty,

Thanks for the clarification! That makes a lot of sense. It's nice to have this discussion where we're brainstorming and reaching consensus while improving this project.

I like the Wikipedia article suggestion, and the way they've organized the table structure is very easy to read and storage-friendly! However, there are a few questions I have that we can hopefully discuss to address the parameters of this project:

Additional Feature Columns: Remember that, unlike F1, we have a column for car type (since it is randomized for each track). Now that we are planning to restructure our dataset and start fresh, we might include new features such as weather, track type, tires, etc. I'm curious to understand how you think we would add those values for each driver at each track in your format, where the drivers are rows and tracks are columns.

to address your concern about overfitting the ML model, we can implement a function to convert the table into a structure similar to our current one just for the training and testing phase

Two Datasets: I am curious to know if you were implying that we should have two datasets (in CSV format) with the same values but different dimensions for this project. I don't think that's particularly a good idea, because any addition to one would require updating the other, meaning more compute, and having two datasets would increase the storage requirements as our project scales.

What I completely agree with is your point regarding readability and ease of comprehension. That's why I think a better solution is to let our dataset on Google Sheets (populated by sending our CSV data to an online sheet through the Google Sheets API) use a different format than our CSV dataset. If we implement this, it would be up to us to decide in what format we want users to view the dataset and what features can/can't be shown.

Given that we plan to implement this as a championship, we can do something similar to the table in the Wikipedia article and add a points system, maybe (+3,+2,+1 for 🥇, 🥈, 🥉)

I think the points system idea is very interesting and I'm definitely on board with it. However, like the table in the link you've provided, I think it's more readable to highlight gold, silver and bronze and, at the same time, calculate total points for each driver in the Google Sheets format, as I've mentioned previously.

Please do let me know if my suggestions make sense and if you have any queries.

In the meantime, I want to encourage you to look over the CONTRIBUTING.md file as a reference for your next steps when you eventually start working on this!

Thanks, Sai

SaiUbc / F1-League-Racing-Manager
