PretendoNetwork / archival-tools

A collection of tools dedicated to archiving several types of data from many Nintendo WiiU and 3DS games

Add theme shop scraper #21

Closed MisterSheeple closed 4 months ago

MisterSheeple commented 4 months ago

Btw, something cool about this script is that it also dumps HTTP headers. That might be handy to have in the boss scraper too.

jonbarrow commented 4 months ago

The BOSS headers aren't that interesting, and we have tons of examples of them anyway. The response headers are just basic HTTP headers, and the few custom ones that exist aren't that interesting

The only time there's really a unique header is in the request headers. But that's not really relevant here as those aren't being used here, and they're also not that interesting anyway

BOSS uses the following request headers when uploading data:

And then the only interesting response header for uploads is X-Boss-Request-Summary, but that has a super basic format that's really easy to understand (it's just some space-delimited data about the upload).

Every BOSS request, even a GET request, also formats the User-Agent to carry identifying information, but we have this mostly documented already here: https://github.com/PretendoNetwork/BOSS/blob/master/src/middleware/parse-user-agent.ts

Besides that the headers aren't really that useful or interesting?

jonbarrow commented 4 months ago

Also I don't see anywhere in this script where the BOSS keys are supplied

MisterSheeple commented 4 months ago

Besides that the headers aren't really that useful or interesting?

The main thing I found interesting is the modified file dates. That could be helpful in some ways.

Also I don't see anywhere in this script where the BOSS keys are supplied

By key, are you referring to the task ID?

MisterSheeple commented 4 months ago

I've just been told that the decryption process is all handled automatically by pyctr, which this tool uses.

MisterSheeple commented 4 months ago

Should be ready to merge now methinks.

jonbarrow commented 4 months ago

The main thing I found interesting is the modified file dates. That could be helpful in some ways.

I don't see how that could really be helpful. If you think it's interesting, that's one thing, but helpful is another

By key, are you referring to the task ID?

No, the crypto keys

I've just been told that the decryption process is all handled automatically by pyctr, which this tool uses.

I looked inside that library and didn't see where those keys were defined. I only saw some places where it generates keys. If that library contains, or generates, those keys I don't think we'd feel comfortable having this in our repository. Especially with the way Nintendo has been cracking down lately, we have all taken extra steps to not step on their toes and I won't put that at risk now

EDIT: I brought this up in our developer channels, and @SuperMarioDaBom pointed out that BOOT9_PATH is required which may be where the keys come from. But taking a look, this doesn't seem to be used anywhere? And it's not explained why this is needed?

SuperMarioDaBom commented 4 months ago

The PyCTR library uses that environment variable, and is the one that opens/parses the file. Keys are indeed provided by the user, so this should be safe.
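To illustrate the "keys are provided by the user" pattern, here is a minimal sketch of resolving a user-supplied key file from an environment variable. This is not PyCTR's actual code: the function name `resolve_user_key_file` and its error messages are hypothetical, and only the `BOOT9_PATH` variable name comes from the discussion above.

```python
import os
from pathlib import Path


def resolve_user_key_file(env=None, var="BOOT9_PATH"):
    """Return the path to a key file that the user supplied via an env var.

    Hypothetical helper for illustration only. Nothing is bundled with the
    tool: if the user has not pointed the variable at their own dump, the
    lookup fails instead of falling back to a built-in key.
    """
    env = os.environ if env is None else env
    raw = env.get(var)
    if not raw:
        raise RuntimeError(f"{var} is not set; point it at your own dump")
    path = Path(raw).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"{var} points to a missing file: {path}")
    return path
```

The point of the design is that the repository only ever holds a variable name, never key material.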

jonbarrow commented 4 months ago

Perfect 👍 Thanks for the clarification

MisterSheeple commented 4 months ago

I don't see how that could really be helpful. If you think it's interesting, that's one thing, but helpful is another

It's helpful for when we don't know when a certain event happened so that it can be dated to a certain period of time. At least, that's what my line of thought was.

If that library contains, or generates, those keys I don't think we'd feel comfortable having this in our repository. Especially with the way Nintendo has been cracking down lately, we have all taken extra steps to not step on their toes and I won't put that at risk now

Then why do you include the Wii U common key in this repo lol

jonbarrow commented 4 months ago

It's helpful for when we don't know when a certain event happened so that it can be dated to a certain period of time. At least, that's what my line of thought was.

Again, that's more of a personal interest than being objectively helpful/useful, at least for our needs. Which again, is fine. Just not something I'm worried about bringing over to the other tools

Then why do you include the Wii U common key in this repo lol

It isn't? Or shouldn't be? There's nothing in this repo that would use the common key, that's for title decryption and nothing in here decrypts titles

jonbarrow commented 4 months ago

The Wii U common key is nowhere in any repo in the whole organization. So I'm not too sure what you mean here

[Image: Screenshot from 2024-03-15 17-11-14-censored]

MisterSheeple commented 4 months ago

Again, that's more of a personal interest than being objectively helpful/useful, at least for our needs

Even if it's not useful to you, it's certainly useful to others who can analyze the data and see when the events were meant to be. Moreover, I don't really see why this shouldn't be collected anyway because it builds more context that could be useful to someone. If none of the header data had a purpose at all then I'd understand, but here's a piece of data that can tell you exactly when an event is from, which could be useful for reimplementing the service if you wish to have cyclical events per year. So I hope you reconsider.

The Wii U common key is nowhere in any repo in the whole organization. So I'm not too sure what you mean here

What's this then?

https://github.com/PretendoNetwork/archival-tools/blob/master/spotpass%2Fcerts%2Fwiiu-common.key

jonbarrow commented 4 months ago

Even if it's not useful to you, it's certainly useful to others who can analyze the data and see when the events were meant to be. Moreover, I don't really see why this shouldn't be collected anyway because it builds more context that could be useful to someone. If none of the header data had a purpose at all then I'd understand, but here's a piece of data that can tell you exactly when an event is from, which could be useful for reimplementing the service if you wish to have cyclical events per year. So I hope you reconsider.

It doesn't show "when events were meant to be". That header is used for cache control. You seem to have this idea it was used for something related to files being uploaded in some kind of cycle. Even if you wanted to do something like that, that header does not actually give you any of that information. The only purpose of the header is to be stored by the client, and sent in future requests as the If-Modified-Since header, so that the server can send a 304 and not waste bandwidth resending a file the client already has cached.

The actual date in the header isn't useful for any other purpose (it's not even that useful for cache control either, etags are much better), especially not the purposes you're laying out. Knowing when a file was last modified doesn't tell you anything about a "cycle" of changes.

The eShop background music file reports it was last modified 4 years ago, but that doesn't tell you anything about when/if it would ever be modified in the future as some sort of cycle. Even in the hypothetical situation you presented, this header wouldn't be useful.
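The cache-control flow described above can be sketched as follows. This is a generic illustration of HTTP conditional requests, not code from the BOSS server: the `respond` function and the example date are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_modified, request_headers, body):
    """Return (status, headers, body) for a GET, honoring If-Modified-Since.

    `last_modified` must be a timezone-aware UTC datetime.
    """
    # HTTP dates only have one-second resolution.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            cached = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            cached = None  # unparseable date: send the full response
        if cached is not None:
            if cached.tzinfo is None:
                cached = cached.replace(tzinfo=timezone.utc)
            if last_modified <= cached:
                # The client's copy is current: no body, no wasted bandwidth.
                return 304, headers, b""
    return 200, headers, body


# First request: the client has nothing cached, so it gets the file
# along with a Last-Modified header to store.
modified = datetime(2020, 3, 15, 17, 11, 14, tzinfo=timezone.utc)
status, headers, _ = respond(modified, {}, b"bgm")

# Later request: the client echoes the stored date back to the server.
status2, _, body2 = respond(
    modified, {"If-Modified-Since": headers["Last-Modified"]}, b"bgm"
)
```

The date only ever answers "has this changed since my copy?". Nothing in the exchange describes a schedule or a cycle.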

The tools in this repo, when combined, add up to terabytes of data. Our Super Mario Maker archive is 600GB alone. Our current cloud-backed-up instance of the SpotPass data from this repo (which has not been backed up in quite a while) is already up to 300GB. So, yes, I am opting to store as little unnecessary data as possible. It adds up very quickly.

master/spotpass%2Fcerts%2Fwiiu-common.key

That's not the Wii U Common Key. That's for the client certificate, to make SSL requests. VERY different things. The Wii U Common Key is used to decrypt title contents, and is in no way, shape, or form present here. We do not provide any keys to decrypt any contents in our repos.

jonbarrow commented 4 months ago

Looks good to me 👍

Thanks again