ControlNet / wt-data-project.data

Data collected in wt-data-project.
https://wt.controlnet.space
GNU Affero General Public License v3.0

Bypass Thunderskill #3

Open Bearddyy opened 11 months ago

Bearddyy commented 11 months ago

There are a lot of complaints that the data in Thunderskill is selective and not representative of actual general performance.
Would you be interested in bypassing Thunderskill and collecting the data directly?
This way all games could be parsed, and we could avoid arguments against the validity of the data.

ControlNet commented 11 months ago

Hi Bearddyy, is there any way to do that legally?

Bearddyy commented 11 months ago

@ControlNet We can extract scores etc. from replay files downloaded from their website.
Surely that is no less legal than doing the same thing from Thunderskill?

axiangcoding commented 10 months ago

I'm quite interested in this. Any update? In my opinion, we should find a way to extract data from the original API. For example, in the game client we can see someone's profile, so its data must come from somewhere. I once obtained the real API URL of the game data through packet capture, but I don't know how to use it.

ControlNet commented 10 months ago

@axiangcoding Actually, someone has tried that, but it's not feasible here. If you want to download all the replay files from WT's official replay website, you need around 2~4 Gbit/s of bandwidth. And they will ban the IP if you download too much. So it's not a good way.
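
To put the 2~4 Gbit/s figure in perspective, here is a rough back-of-the-envelope conversion to daily download volume. It assumes the rate is sustained around the clock and uses decimal units (1 TB = 10^12 bytes). The function name is made up for illustration.

```python
def gbit_per_s_to_tb_per_day(gbit_per_s: float) -> float:
    """Convert a sustained rate in Gbit/s to terabytes per day (decimal units)."""
    bytes_per_second = gbit_per_s * 1e9 / 8   # 8 bits per byte
    seconds_per_day = 24 * 60 * 60
    return bytes_per_second * seconds_per_day / 1e12


# The two ends of the range quoted in the comment above.
for rate in (2, 4):
    print(f"{rate} Gbit/s sustained is about {gbit_per_s_to_tb_per_day(rate):.1f} TB/day")
```

Under those assumptions, the quoted range works out to roughly 21.6 to 43.2 TB per day.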

From that person's analysis, the data from the official replay website and the data collected from Thunderskill are strongly correlated, so currently the data is still fine for some analysis.
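
The thread does not say how that correlation was measured. As a minimal sketch, assuming it was something like a Pearson coefficient over per-vehicle win rates from the two sources, the check could look like this. The numbers below are placeholders for illustration only, not real measurements.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder win rates per vehicle (hypothetical, NOT real data).
thunderskill_win_rates = [0.48, 0.52, 0.55, 0.61]
replay_site_win_rates = [0.47, 0.53, 0.54, 0.60]

print(f"Pearson r = {pearson(thunderskill_win_rates, replay_site_win_rates):.3f}")
```

A value close to 1 would support using the Thunderskill data as a proxy. A strong correlation alone would not rule out a constant offset between the two sources.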

axiangcoding commented 10 months ago

I believe you replied to the wrong person...

ControlNet commented 10 months ago

No... I guess the person who contacted me via email is Bearddyy, so I am sharing the information here to let you know.

axiangcoding commented 10 months ago

@ControlNet Okay then. I remember that the replay file is binary or encrypted. Is there a way to decrypt it now?

ControlNet commented 10 months ago

@axiangcoding I see that Bearddyy's repository can handle it. Please have a look: https://github.com/Bearddyy/wtparser

axiangcoding commented 10 months ago

Thanks. He has really made some progress on this.

llama-for3ver commented 10 months ago

> I'm quite interested in this. Any update? In my opinion, we should find a way to extract data from the original API. For example, in the game client we can see someone's profile, so its data must come from somewhere. I once obtained the real API URL of the game data through packet capture, but I don't know how to use it.

I have found a way to get full player data without even needing an auth header, but it uses protobuf, and I need to convert the compiled definitions into a file to be able to use it.

I've still found some very useful endpoints though, such as searching for player names (I've also scraped for them too) and fetching news.
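
Until the definitions are recovered, a protobuf payload can still be inspected at the wire-format level, which shows field numbers and raw values but not field names. Below is a minimal sketch of such a schema-less decoder, following the standard protobuf encoding. It is not the parser used by anyone in this thread, and the function names are made up for illustration.

```python
def read_varint(buf: bytes, pos: int):
    """Read a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_fields(buf: bytes):
    """Split one message into (field_number, wire_type, raw_value) tuples."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:           # varint (ints, bools, enums)
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:         # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:         # length-delimited (strings, bytes, nested messages)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:         # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


# Hand-built sample message: field 1 = 150 (varint), field 2 = "testing".
sample = b"\x08\x96\x01\x12\x07testing"
print(decode_fields(sample))  # [(1, 0, 150), (2, 2, b'testing')]
```

Length-delimited values may themselves be nested messages, so calling `decode_fields` again on a `bytes` value is a reasonable way to explore an unknown response. The sketch does no bounds checking on truncated input.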

axiangcoding commented 10 months ago

> I have found a way to get full player data without even needing an auth header, but it uses protobuf, and I need to convert the compiled definitions into a file to be able to use it.

Mind sharing what it is and how to use that API? I once tried to capture network packets, but it was a CDN URL based on AWS, so I'm not sure I can use it.

> I've still found some very useful endpoints though, such as searching for player names (I've also scraped for them too, I'll add the link soon) and fetching news.

Looking forward to seeing those links!

llama-for3ver commented 10 months ago

@axiangcoding It's from the assistant

I'll make a public Postman workspace and link it o7

axiangcoding commented 10 months ago

@RaidFourms Thanks in advance for sharing! It helps a lot.

ControlNet commented 10 months ago

Thanks for sharing. Looking forward to your work.

Bearddyy commented 10 months ago

> No... I guess the person who contacted me via email is Bearddyy, so I am sharing the information here to let you know.

I didn't email you; it must have been someone else. As for the data rate limit, it could potentially be circumvented by distributing scripts across VMs, since each one would process less. Also, I have found that each replay has 2 types of files that alternate, so I suspect the required data rate is further reduced. But again, it could just be blocked by Gaijin, and it would need a fair amount of automation.

llama-for3ver commented 10 months ago

I would imagine proxies would be much more efficient.

llama-for3ver commented 10 months ago

Also here is the username/userid scraper. Very inefficient though 😭

Bearddyy commented 10 months ago

Thanks for this, I didn't even know the app existed. I think there's potentially a fair amount of data that could be scraped from that endpoint, but the specific data I was interested in, like per-vehicle or per-nation performance, doesn't appear to be available from navigating around the app. It looks like it's similar data to the user page.

llama-for3ver commented 10 months ago

wdym?

llama-for3ver commented 10 months ago

@axiangcoding Here is some API stuff https://www.postman.com/llama-for3ver/workspace/wta-public/documentation/31045545-13f0ca8f-cc11-423f-9ed1-53000afa06fb

llama-for3ver commented 10 months ago

@axiangcoding Any update on it yet?

axiangcoding commented 10 months ago

Sorry, I'm not very familiar with protobuf, so I need some time to try it out. I don't have much time at the moment, but I will share any progress with you.

llama-for3ver commented 10 months ago

I don't know much either lol

I'll still keep you updated o7

axiangcoding commented 10 months ago

Hi guys, I have started a repo, https://github.com/axiangcoding/wt-profile-tool, to create a WT profile parsing library based on @RaidFourms's information and help. For now this is just the beginning.

For the player vehicle data, I think I'll be able to parse it out in the near future.

Thank you all for sharing the information, it will greatly advance this work.


In fact, I am good at web server development, but I know nothing about parsing APKs and things like that. Special thanks to @RaidFourms for the hard work on this.