PretendoNetwork / archival-tools

A collection of tools dedicated to archiving several types of data from many Nintendo Wii U and 3DS games

Future of (super mario maker) courses and Pretendo #2

Open Qiangong2 opened 7 months ago

Qiangong2 commented 7 months ago

This is a continuation of https://github.com/jonbarrow/smm1-course-archive/issues/2.

With the previous version of the script, I've been able to archive over 3 million courses (about 190 GB altogether). With the new script, will everything have to be archived again? Or is there compatibility with the old script?

Also, when I try and run the new script in this repo, I receive the following error:

Traceback (most recent call last):
  File "archive.py", line 121, in <module>
    async def get_buffer_queue(param: BufferQueueParam) -> list[bytes]:
TypeError: 'type' object is not subscriptable

jonbarrow commented 7 months ago

is there compatibility with the old script?

The new script is not compatible with the old one, as it fundamentally changes how, and what, data is archived. The old script saved only a few selected chunks of data, in an effort to make the metadata file "clean looking", but as a result it threw away a lot of data which would be rather important/nice to have when making a backup. The new script also supports object versioning for when DataStore objects are updated.

The old script also only backed up course data, but lacked a LOT of maker data

The tradeoff with the extra data is that it also takes up more disk space. Right now we have a test instance running on a dedi which has downloaded 142,677 objects (objects can be courses, event courses, makers, etc. Not just courses) and currently the total disk space used, metadata included, is 4.7GB

This new script should, in theory, be much faster than the old one as well. The old script would go one by one through the entire uint64 space to look for courses, whereas this script starts at the first known good ID and checks a configurable number of chunks of 100 IDs at a time. The server only allows up to 100 IDs to be checked per request, and by default this script tries to check 20 chunks of IDs at a time, so 2,000 IDs at once. It then downloads each object's data in parallel. However, this also depends on your internet speed and machine specs (some machines may struggle to write thousands of files at once).
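
As a rough illustration of the chunking described above, here is a minimal sketch. It is not taken from `archive.py`; the function name `id_chunks` and the starting ID are placeholders, and only the two numbers (100 IDs per request, 20 chunks per batch) come from the comment.

```python
from typing import Iterator, List

CHUNK_SIZE = 100       # server-side limit on IDs per request (from the comment above)
CHUNKS_PER_BATCH = 20  # default number of chunks checked at once (from the comment above)


def id_chunks(start_id: int, chunk_size: int = CHUNK_SIZE,
              chunks: int = CHUNKS_PER_BATCH) -> Iterator[List[int]]:
    """Yield `chunks` consecutive lists of `chunk_size` IDs, beginning at start_id."""
    for n in range(chunks):
        first = start_id + n * chunk_size
        yield list(range(first, first + chunk_size))


if __name__ == "__main__":
    # 1000 is an arbitrary placeholder, not the real "first known good ID".
    batch = list(id_chunks(1000))
    print(len(batch), "chunks covering", sum(len(c) for c in batch), "IDs")
```

Each inner list would correspond to one request to the server; how the real script dispatches those requests in parallel is not shown here.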

Also, when I try and run the new script in this repo, I receive the following error

You need to update to Python 3.9 or later
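
For context, the `TypeError` comes from subscripting the built-in `list` type in an annotation (`list[bytes]`), which is only supported from Python 3.9 onward (PEP 585). The sketch below shows the older spelling with `typing.List` on a hypothetical stand-in function, not the real `get_buffer_queue`. It is only meant to explain the error; the script may rely on other 3.9+ features, so upgrading remains the actual fix.

```python
import sys
from typing import List

# Built-in generics such as list[bytes] are subscriptable from Python 3.9 (PEP 585).
SUPPORTS_BUILTIN_GENERICS = sys.version_info >= (3, 9)


def split_buffers(data: bytes, size: int) -> List[bytes]:
    """Hypothetical helper: typing.List is accepted where list[bytes] would raise on < 3.9."""
    return [data[i:i + size] for i in range(0, len(data), size)]


if __name__ == "__main__":
    print("Python", sys.version_info[:2], "supports list[bytes]:", SUPPORTS_BUILTIN_GENERICS)
    print(split_buffers(b"abcdef", 4))
```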

Qiangong2 commented 7 months ago

Do you still want me to continue archiving courses? Or since you have an instance already, you have it covered now? I have the storage space to handle the extra data, just want to know if it's needed.

jonbarrow commented 7 months ago

Sure, we're still trying to figure out the logistics of ours too so having someone else making a backup just in case is always nice. I would suggest holding off though, as we are discussing some other ways to improve the archiving which may change things

I'll ping you here when we've settled

jonbarrow commented 7 months ago

@Qiangong2 I believe we have settled on an archiving method now. So feel free to begin your own backup now

After some rough testing, this new script was able to download around 300,000 objects from DataStore in a 12 hour period

There are currently, as of November 29th 2023, 15,651,599 objects in the Super Mario Maker DataStore server, so it should take roughly 25-30 days running non-stop to back all of them up
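
The estimate can be checked with a few lines of arithmetic. This sketch uses only the two figures quoted above and assumes the throughput stays constant, which the connection issues discussed in this thread suggest it may not.

```python
OBJECTS_TOTAL = 15_651_599  # object count quoted above (as of November 29th 2023)
OBJECTS_PER_12H = 300_000   # rough throughput quoted above


def days_to_archive(total: int, per_12h: int) -> float:
    """Days of non-stop running needed at a constant download rate."""
    per_day = per_12h * 2
    return total / per_day


if __name__ == "__main__":
    print(f"{days_to_archive(OBJECTS_TOTAL, OBJECTS_PER_12H):.1f} days")  # prints "26.1 days"
```

That lands at the low end of the 25-30 day range, leaving some headroom for crashes and restarts.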

Do note though that "object" is not the same thing as "course". DataStore is a generic protocol used by many Nintendo games, as a way to interact with AWS S3 objects. It's basically just a fancy file upload/download protocol. An "object" is just any file inside DataStore (for instance, Animal Crossing: New Leaf uses DataStore objects for dream towns)

Super Mario Maker uses objects for many different things besides courses, including:

Super Mario Maker for the Wii U sold around 4 million copies worldwide, so around 4-5 million of the 15,651,599 objects will be "maker" objects, depending on things like the number of copies shared with other players, times it was pirated, etc. These maker objects will have a file size of 0 (I don't know why Nintendo did it this way), so don't be alarmed by that

There is only 1 Event Course metadata file, which will be somewhat large (around 500 KB), and then like 10 or so event courses (these have object IDs in the 9X0000 range)

Other objects should just be regular user made course objects

The script gzip-compresses all metadata files using compression level 6, to try and save space. Even so, our instance of the script has downloaded 625,278 total objects with a total of 28GB of storage used, due to the additional metadata. So storage needs have definitely increased since the old script
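
A minimal sketch of what gzip-compressing JSON metadata at level 6 can look like with the Python standard library. The helper names and the example fields are hypothetical; the real script's file layout and metadata schema are not shown in this thread.

```python
import gzip
import json


def pack_metadata(metadata: dict, level: int = 6) -> bytes:
    """Serialize metadata to compact JSON and gzip it at the given compression level."""
    raw = json.dumps(metadata, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw, compresslevel=level)


def unpack_metadata(blob: bytes) -> dict:
    """Reverse of pack_metadata: gunzip, then parse the JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    # Illustrative fields only, not the script's actual schema.
    example = {"data_id": 28105732, "tags": ["example"] * 50}
    blob = pack_metadata(example)
    print(len(json.dumps(example)), "bytes of JSON ->", len(blob), "bytes gzipped")
```

Very small JSON documents may not shrink much, since the gzip header and trailer add a fixed overhead to every file.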

Also, it is HIGHLY recommended that you use some kind of process manager like PM2 to run the script. The old script would check each possible DataStore object ID one by one, whereas this new script processes up to 100 objects at a time. Mass downloading files from S3 like this will very often cause the script to die. Nintendo's Super Mario Maker server also seems a tad more unstable these days, so sometimes the script disconnects completely. A process manager like PM2 will automatically restart the process when it crashes
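
To illustrate the restart-on-crash idea, here is a small stand-in supervisor. It is not PM2 and makes no claim about how PM2 behaves internally; the function name, the restart cap, and the capped exponential backoff are all choices made for this sketch.

```python
import subprocess
import time
from typing import Callable, List


def supervise(cmd: List[str], max_restarts: int = 5, base_delay: float = 1.0,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Run cmd, restarting it after every non-zero exit.

    Returns the number of restarts performed. Stops when the command exits
    cleanly or when max_restarts is reached.
    """
    restarts = 0
    while True:
        code = subprocess.call(cmd)
        if code == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1
        # Capped exponential backoff so a dead server is not hammered.
        sleep(min(base_delay * 2 ** (restarts - 1), 60.0))
```

A call such as `supervise([sys.executable, "archive.py"], max_restarts=1000)` would keep the archiver running across crashes. A real process manager adds logging, start on boot, and monitoring on top of this.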

Qiangong2 commented 7 months ago

Alright, I got it running. Seems to be working so far.

Is the "objects" folder the courses? I can't read, apparently. [image]

All the other folders are the exact same size. Is that due to it indexing every single ID?

EDIT: Yeah... It's moving a lot faster than the old script lol

jonbarrow commented 7 months ago

Did you pull the latest commit? You should have a last-checked-timestamp.txt file, not a last-checked-offset.txt file. That file is from an older version of this script which had a bug that would cause it to slow down exponentially as time went on

jonbarrow commented 7 months ago

All the other folders are the exact same size. Is that due to it indexing every single ID?

The other files are fairly small, they're gzip compressed JSON data. The majority of them are less than a kilobyte in size. So yes they should generally be around the same size when viewed like that (though if you check the actual byte size the folders take up, it will be different)

Qiangong2 commented 7 months ago

Is there a way to check how many courses you have downloaded vs objects?

Qiangong2 commented 7 months ago

Was my account banned?

I get nothing but "RuntimeError: PRUDP connection failed" whenever I try to run the script

jonbarrow commented 7 months ago

Very unlikely. I've been running the script for several weeks now non-stop and it's fine

We have experienced this too at times, and it usually resolves itself

Our best guess is that Nintendo has some form of temporary rate limit in place

The longest I had this happen is a couple minutes. If you were banned, you would almost certainly see an actual ban error from the server (it supports these, I just never saw one personally)

Qiangong2 commented 7 months ago

Hmm. Alright. I still get the PRUDP error, and looking at the pm2 log, I've been getting it for the past 8 hours. I'll hold off until tomorrow and see if it lets me back in. I'm at 165GB of course data at the moment.

It's not a huge deal if I was banned, it'd just kinda suck since I've had the account for nearly a decade :/

Qiangong2 commented 7 months ago

@jonbarrow Still getting the PRUDP Connection failure error. I was able to play a Mario Kart match using the same account, so my account isn't banned. Is it possible I was IP blocked?

jonbarrow commented 7 months ago

Super Mario Maker and Mario Kart 8 should be using the same authentication server. If you were IP banned in Super Mario Maker, you would also be banned in Mario Kart 8 (assuming the ban is on the authentication server). Nintendo only uses 2 different authentication servers for all games

Try actually going online in Super Mario Maker and see what happens. I still find it doubtful that they blocked you, since our script has been running for several weeks non-stop

Qiangong2 commented 7 months ago

My server and Wii U are in different locations, which is why I assumed it might be an IP block.

Here's the error:

Packet timed out: <PRUDPPacket type=TYPE_SYN flags=NEED_ACK seq=0 frag=0>
Traceback (most recent call last):
  File "/smm1-archive/archival-tools/super-mario-maker/archive.py", line 439, in <module>
    anyio.run(main)
  File "/home/kurt/.local/lib/python3.9/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/home/kurt/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/kurt/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "/smm1-archive/archival-tools/super-mario-maker/archive.py", line 391, in main
    async with be.login(NEX_USERNAME, NEX_PASSWORD) as client:
  File "/usr/lib/python3.9/contextlib.py", line 175, in __aenter__
    return await self.gen.__anext__()
  File "/home/kurt/.local/lib/python3.9/site-packages/nintendo/nex/backend.py", line 81, in login
    async with rmc.connect(self.settings, host, port, stream_id, context, creds, servers) as client:
  File "/usr/lib/python3.9/contextlib.py", line 175, in __aenter__
    return await self.gen.__anext__()
  File "/home/kurt/.local/lib/python3.9/site-packages/nintendo/nex/rmc.py", line 286, in connect
    async with prudp.connect(settings, host, port, vport, 10, context, credentials) as client:
  File "/usr/lib/python3.9/contextlib.py", line 175, in __aenter__
    return await self.gen.__anext__()
  File "/home/kurt/.local/lib/python3.9/site-packages/nintendo/nex/prudp.py", line 1551, in connect
    async with transport.connect(vport, type, credentials, disconnect_timeout=disconnect_timeout) as client:
  File "/usr/lib/python3.9/contextlib.py", line 175, in __aenter__
    return await self.gen.__anext__()
  File "/home/kurt/.local/lib/python3.9/site-packages/nintendo/nex/prudp.py", line 1388, in connect
    await client.handshake(credentials, group)
  File "/home/kurt/.local/lib/python3.9/site-packages/nintendo/nex/prudp.py", line 813, in handshake
    raise RuntimeError("PRUDP connection failed")
RuntimeError: PRUDP connection failed

It times out on the first packet, but I'm unsure why. If it was denying the credentials, it would say that, right?

jonbarrow commented 7 months ago

My server and Wii U are in different locations

Where is the server at? And does the old script still work on that server? It's possible this server really does just have a terrible connection to Nintendo's

It times out on the first packet, but I'm unsure why. If it was denying the credentials, it would say that, right?

Yes, it should be sending a real error. Not just timing out. This isn't even making it to the authentication server

Qiangong2 commented 7 months ago

Where is the server at? And does the old script still work on that server? It's possible this server really does just have a terrible connection to Nintendo's

The old script does work. Also, the server is in Oracle Cloud with a 2 gigabit connection out. I haven't had issues before this, which is odd.

Yes, it should be sending a real error. Not just timing out. This isn't even making it to the authentication server

The last-checked-timestamp is 135299250082 if that's significant.
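
For anyone wondering what that number means: assuming the file stores a NEX DateTime, a bit-packed integer (the thread itself does not say so), it can be decoded as sketched below. The field layout is the one commonly documented for NEX by the community, and the function name is made up for this example.

```python
from datetime import datetime


def decode_nex_datetime(value: int) -> datetime:
    """Unpack a NEX-style DateTime: second, minute, hour, day, month, year from the low bits up."""
    second = value & 0x3F          # bits 0-5
    minute = (value >> 6) & 0x3F   # bits 6-11
    hour = (value >> 12) & 0x1F    # bits 12-16
    day = (value >> 17) & 0x1F     # bits 17-21
    month = (value >> 22) & 0xF    # bits 22-25
    year = value >> 26             # remaining high bits
    return datetime(year, month, day, hour, minute, second)


if __name__ == "__main__":
    print(decode_nex_datetime(135299250082))  # prints "2016-01-27 11:30:34"
```

Under that assumption the value decodes to 27-1-2016 11:30:34, which matches the start of the time window shown in the log output in the next comment.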

Qiangong2 commented 7 months ago

Actually, the script does work if I manually change the last-checked-timestamp. It just gives this continuously:

Downloading next 100 objects between 27-1-2016 11:30:34 to 27-1-2016 23:30:34
Found 1 objects
Skipping 28105732
More objects may be available, trying new offset!
Downloading next 100 objects between 27-1-2016 11:30:34 to 27-1-2016 23:30:34
Found 1 objects
Skipping 28105732
More objects may be available, trying new offset!
Downloading next 100 objects between 27-1-2016 11:30:34 to 27-1-2016 23:30:34
Found 1 objects
Skipping 28105732
More objects may be available, trying new offset!
Downloading next 100 objects between 27-1-2016 11:30:34 to 27-1-2016 23:30:34
Found 1 objects
Skipping 28105732
More objects may be available, trying new offset!
Downloading next 100 objects between 27-1-2016 11:30:34 to 27-1-2016 23:30:34
Found 1 objects
Skipping 28105732
More objects may be available, trying new offset!
Downloading next 100 objects between 27-1-2016 11:30:34 to 27-1-2016 23:30:34
Found 1 objects
Skipping 28105732

Until I ctrl-c

jonbarrow commented 7 months ago

We have released a statement about the connection issues. It's not a you thing, Nintendo seems to have fucked up https://twitter.com/PretendoNetwork/status/1736325668412031255

Qiangong2 commented 7 months ago

Ah, that makes sense. At least I know I'm not crazy :D

Is there a way in the script to force it to always connect to the same IP?

Qiangong2 commented 6 months ago

The script seems to be stuck downloading the same course over and over ever since I changed the last-checked-timestamp manually. Is there a way to get the script back on track (besides deleting everything and starting from scratch)?

cheater commented 4 months ago

hi @Qiangong2 and @jonbarrow, I wanted to see if you guys were still running the backups as the servers are getting to EOL.

jonbarrow commented 4 months ago

hi @Qiangong2 and @jonbarrow, I wanted to see if you guys were still running the backups as the servers are getting to EOL.

We announced via Twitter several months ago that our scan had finished and we have a full backup