Closed by Badaro 1 year ago
While I'd like to automate melee.gg scraping, there are some annoying complexities involved.
This ticket is my rough draft on how to attempt this automation.
[x] Rewrite the code for melee.gg, splitting it into:
Client - which should contain the code to access/download information from melee.gg
Scraper - which should use the Client and simply normalize the data to the appropriate format
[x] Implement an analyzer capable of analyzing a tournament using the Client implemented above, with logic to handle all known edge cases:
Ensure the tournament name contains the format
Ensure the tournament name does not contain a different format name (the "Legacy European Tour" issue)
Handle team tournaments with a single format (https://melee.gg/Tournament/View/17900)
Handle team tournaments with different formats
Handle Pro-Tour tournaments (might require a very specific hack)
Ignore empty tournaments (https://melee.gg/Tournament/View/31590)
Ignore tournaments with 80% of decklists missing (https://melee.gg/Tournament/View/24649)
[x] Run a test with all tournaments currently in the DB to ensure it's generating the correct flags for all of them
[x] Implement support for listing tournaments on the new Client
[x] Implement a scraper that uses the Client to list tournaments
[x] Run a process to redownload tournaments from the beginning using the new model
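To make the Client/Scraper split from the first item concrete, here is a rough Python sketch of the idea. It is not the code on the branch: the class names, the `get_players` method, and the `Rank`/`Name`/`DecklistId` fields are all made up for illustration. The point is only that the Client owns the access to melee.gg while the Scraper normalizes whatever the Client returns.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class MeleeClient(Protocol):
    """Hypothetical interface: all access to melee.gg sits behind it."""

    def get_players(self, tournament_id: int) -> list[dict]:
        ...


@dataclass
class Standing:
    rank: int
    player: str
    decklist_id: Optional[str]


class Scraper:
    """Uses a client and only normalizes the data the client returns."""

    def __init__(self, client: MeleeClient) -> None:
        self.client = client

    def get_standings(self, tournament_id: int) -> list[Standing]:
        raw = self.client.get_players(tournament_id)
        standings = [
            Standing(
                rank=int(row["Rank"]),                       # assumed field name
                player=row["Name"].strip(),                  # assumed field name
                decklist_id=row.get("DecklistId") or None,   # "" becomes None
            )
            for row in raw
        ]
        return sorted(standings, key=lambda s: s.rank)


class FakeClient:
    """Stand-in client with canned rows, so the scraper runs offline."""

    def get_players(self, tournament_id: int) -> list[dict]:
        return [
            {"Rank": "2", "Name": " Bob ", "DecklistId": ""},
            {"Rank": "1", "Name": "Alice", "DecklistId": "abc123"},
        ]
```

With this shape the Scraper can be exercised against `FakeClient` without touching the site, e.g. `Scraper(FakeClient()).get_standings(17900)`.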
WIP on branch melee-gg-automation
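The analyzer checks from the list above could be sketched roughly as follows. This is a minimal illustration under assumptions, not the actual implementation: the format list, the flag names, the `TournamentInfo` fields, and reading "80% missing" as "80% or more" are all my guesses. The Pro Tour case is left out since it probably needs its own hack.

```python
import re
from dataclasses import dataclass

# Assumed list of format names; the real list is not given in this ticket.
KNOWN_FORMATS = ["Standard", "Pioneer", "Modern", "Legacy", "Vintage", "Pauper"]

# Assumed threshold: flag when 80% or more of the decklists are missing.
MISSING_DECKLISTS_PERCENT = 80


@dataclass
class TournamentInfo:
    name: str
    formats: list[str]   # format(s) reported by the event data (one per seat for teams)
    player_count: int
    decklist_count: int


def formats_in_name(name: str) -> set[str]:
    """Return the known format names that appear as whole words in the name."""
    return {
        fmt
        for fmt in KNOWN_FORMATS
        if re.search(rf"\b{re.escape(fmt)}\b", name, flags=re.IGNORECASE)
    }


def analyze(t: TournamentInfo) -> list[str]:
    """Return a list of flags; an empty list means no problem was detected."""
    if t.player_count == 0:
        return ["empty"]

    flags = []
    missing = t.player_count - t.decklist_count
    # Integer arithmetic avoids floating-point surprises at exactly 80%.
    if missing * 100 >= MISSING_DECKLISTS_PERCENT * t.player_count:
        flags.append("decklists-missing")

    distinct = set(t.formats)
    if len(distinct) > 1:
        flags.append("multiple-formats")

    named = formats_in_name(t.name)
    if not distinct <= named:
        flags.append("format-not-in-name")
    if named - distinct:
        flags.append("other-format-in-name")
    return flags
```

For example, `analyze(TournamentInfo("Legacy European Tour Modern Open", ["Modern"], 100, 100))` yields `["other-format-in-name"]`, because "Legacy" shows up in the name even though the event is Modern.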