Aedial / novelai-api

Python API for the NovelAI REST API
https://aedial.github.io/novelai-api/
MIT License

NovelAi MessagePack unpacking for the "document" field of the StoryContent metadata #32

Closed. JasonLWalker closed this pull request 4 months ago.

JasonLWalker commented 4 months ago

This module contains a Python port of the MIT-licensed msgpackr v1.10.0 JavaScript library by Kris Zyp: https://github.com/kriszyp/msgpackr

Only the Unpacker modules have been implemented. A few portions of the library that NovelAI does not appear to use have not been ported yet; they raise NotImplementedError if reached, but none of the story content I tested reached that code.

The PR also implements the additional NovelAI-specific extension types 20, 30, 31, 40, 41, and 42 (mostly as passthrough implementations).
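A minimal sketch of how passthrough handlers for those extension types could be registered, assuming "passthrough" means returning the raw payload unchanged (the PR's handlers may instead decode nested data). `ExtPassthrough`, `NOVELAI_EXTENSIONS`, and `decode_ext` are hypothetical names, not the PR's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ExtPassthrough:
    """Keeps an extension payload undecoded, tagged with its type code (hypothetical)."""
    type_code: int
    data: bytes


ExtHandler = Callable[[int, bytes], object]


def passthrough(type_code: int, data: bytes) -> ExtPassthrough:
    # Do not interpret the payload; just preserve it for the caller.
    return ExtPassthrough(type_code, data)


# Type codes listed in the PR description; the handlers here are placeholders.
NOVELAI_EXTENSIONS: Dict[int, ExtHandler] = {
    code: passthrough for code in (20, 30, 31, 40, 41, 42)
}


def decode_ext(type_code: int, data: bytes) -> object:
    handler = NOVELAI_EXTENSIONS.get(type_code)
    if handler is None:
        # Unknown types fail loudly instead of being silently misread.
        raise NotImplementedError(f"no handler for extension type {type_code}")
    return handler(type_code, data)
```

Keeping the payload as bytes means a later, real decoder can replace one entry in the table without touching the rest.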

Also included are an extension helper that remaps floating-point keys and values to strip trailing zeros, and a JSON serializer that both remaps the keys and serializes DateTime values in a normalized UTC format.
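One possible reading of these two helpers, sketched under assumptions: "stripping trailing zeros" is taken to mean turning integral floats such as `3.0` into ints so they serialize as `3` (as JavaScript prints them), and "normalized UTC" is taken to mean ISO 8601 with a `Z` suffix. `strip_float_zeros` and `to_json` are hypothetical names, not the PR's functions.

```python
import json
from datetime import datetime, timezone
from typing import Any


def strip_float_zeros(value: Any) -> Any:
    """Recursively turn integral floats (e.g. 3.0) into ints, in keys and values."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    if isinstance(value, dict):
        return {strip_float_zeros(k): strip_float_zeros(v) for k, v in value.items()}
    if isinstance(value, list):
        return [strip_float_zeros(v) for v in value]
    return value


def _default(obj: Any) -> str:
    # Serialize datetimes as UTC ISO 8601 with a "Z" suffix.
    # Assumption: naive datetimes are already in UTC.
    if isinstance(obj, datetime):
        if obj.tzinfo is None:
            obj = obj.replace(tzinfo=timezone.utc)
        return obj.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


def to_json(value: Any) -> str:
    # Remap the floats first, then let json.dumps call _default for datetimes.
    return json.dumps(strip_float_zeros(value), default=_default)
```

For example, `to_json({1.0: 2.0, "x": [3.0, 1.5]})` returns `'{"1": 2, "x": [3, 1.5]}'`; without the remap, Python would emit `"1.0"` and `2.0`.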

An example of how to use the new module is included in the example directory, and two new tests compare the base64-encoded values against their original JSON source. (Both the JSON and base64 values are loaded from sanity test sets to simulate loading them from the "document" field.)
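A test of that shape might look like the following hypothetical helper. `unpack` stands in for whatever unpacker the module exposes; the usage lines pass `json.loads` only so the sketch runs on its own.

```python
import base64
import json
from typing import Any, Callable

# Hypothetical: any callable that turns raw MessagePack bytes into Python objects.
Unpack = Callable[[bytes], Any]


def check_document(b64_value: str, json_source: str, unpack: Unpack) -> None:
    """Decode a base64 "document" value and compare it with its JSON source."""
    decoded = unpack(base64.b64decode(b64_value))
    expected = json.loads(json_source)
    assert decoded == expected, f"mismatch: {decoded!r} != {expected!r}"


# Stand-in usage: json.loads plays the unpacker so this runs without the port.
sample = base64.b64encode(b'{"title": "test"}').decode("ascii")
check_document(sample, '{"title": "test"}', json.loads)
```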

I have not yet implemented decoding of the "document" field in the main novelai-api low-level or high-level API files, but this should provide everything needed to do so.

My Python is a bit rusty, and I know that the module needs additional error handling and type checking, but I wanted to go ahead and commit this code so it was available for others if they wanted it.

I also have not yet implemented the "Packer" portion of the MessagePack module.

resolves: #31

Aedial commented 4 months ago

Thank you. I'll look at it today or tomorrow and test it.

JasonLWalker commented 4 months ago

One thing to note: although it is possible to reuse the Unpacker object after it has unpacked a document, I don't recommend it, since it will attempt to save its current state, including the entire byte array of its read buffer. I recommend instantiating a new Unpacker for each document and releasing the used one with "del" as soon as you are done with it.
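The recommended pattern, one fresh unpacker per document, can be sketched as below. The `Unpacker` protocol and `unpack_each` are hypothetical; `make_unpacker` stands in for however the port's Unpacker is constructed.

```python
from typing import Any, Callable, List, Protocol


class Unpacker(Protocol):
    """Hypothetical interface: anything with an unpack(bytes) method."""

    def unpack(self, data: bytes) -> Any: ...


def unpack_each(documents: List[bytes], make_unpacker: Callable[[], Unpacker]) -> List[Any]:
    results = []
    for data in documents:
        unpacker = make_unpacker()  # fresh instance, no state carried over
        results.append(unpacker.unpack(data))
        del unpacker  # drop our reference so its read buffer can be reclaimed
    return results
```

Note that `del` only removes this reference; the buffer is freed once no other object still points at the unpacker.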

JasonLWalker commented 4 months ago

Also, as I said, my Python is about 10 years rusty, so please refactor anything you need to make it more efficient or useful. However, I intentionally kept the base msgpackr port very close to the original JavaScript library, so that it is easier to identify and update the code if Kris Zyp updates the original and NovelAI adopts his changes.

Aedial commented 4 months ago

So, looking further into it, it seems the msgpack-python library isn't compatible because of two non-standard modifications (records and bundled strings). I really don't know why they weren't implemented as extensions. I think I will end up re-implementing msgpack with these custom modifications (the specs are straightforward) to get rid of the code spaghetti and the weird JS oddities, and to type everything properly. Of course, your port will be credited as a strong inspiration.
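To make the "records" modification more concrete: as described in the msgpackr documentation, a record definition lists an object's property names once, and later objects with the same shape are written as a reference to that definition plus only their values. The sketch below models that idea in plain Python. It is not msgpackr's wire format, and `RecordTable` with its slot numbering is hypothetical.

```python
from typing import Any, Dict, List, Sequence


class RecordTable:
    """Conceptual model of record structures: key lists defined once, reused by slot."""

    def __init__(self) -> None:
        self.structures: Dict[int, List[str]] = {}

    def define(self, slot: int, keys: Sequence[str]) -> None:
        # A record definition stores the property names once.
        self.structures[slot] = list(keys)

    def build(self, slot: int, values: Sequence[Any]) -> Dict[str, Any]:
        # Later records carry only values; keys come from the stored definition.
        if slot not in self.structures:
            raise KeyError(f"record slot {slot} is not defined")
        keys = self.structures[slot]
        if len(values) != len(keys):
            raise ValueError(f"slot {slot} expects {len(keys)} values, got {len(values)}")
        return dict(zip(keys, values))
```

A decoder that does not know about this table sees only the bare values, which is one way to picture why a stock MessagePack reader cannot rebuild the original objects.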

JasonLWalker commented 4 months ago

Yes, I originally tried implementing everything with the official msgpack library, and it was a mess: the buffer and the current pointer position are Cython attributes that aren't exposed to extensions, and the offset pointer had an odd tendency to get out of sequence inside a custom extension and then overcorrect when leaving it. I finally gave up and implemented it from scratch, based on Kris Zyp's implementation, since that seems to be what NovelAI is using.
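The buffer-access problem is easier to see in a toy reader where extension handlers receive the unpacker itself and read from `data` starting at `pos`. This is a hypothetical sketch, not the PR's code or the msgpack library's API: it handles only a few standard MessagePack types, treats 0x00 to 0x7f as standard positive fixints without any msgpackr record handling, raises `NotImplementedError` for everything else, and checks after each extension that the handler consumed exactly its payload, so the read position cannot drift.

```python
import struct
from typing import Any, Callable, Dict, Optional

# Handler signature (hypothetical): receives the unpacker and the payload length,
# reads directly from unpacker.data at unpacker.pos, and must consume exactly that length.
ExtReader = Callable[["MiniUnpacker", int], Any]


class MiniUnpacker:
    """Toy MessagePack reader with a visible buffer and read position."""

    def __init__(self, data: bytes, ext_readers: Optional[Dict[int, ExtReader]] = None):
        self.data = data
        self.pos = 0
        self.ext_readers = ext_readers or {}

    def take(self, n: int) -> bytes:
        if self.pos + n > len(self.data):
            raise ValueError("unexpected end of buffer")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def _ext(self, size: int) -> Any:
        (type_code,) = struct.unpack(">b", self.take(1))  # ext type is a signed byte
        reader = self.ext_readers.get(type_code)
        if reader is None:
            raise NotImplementedError(f"extension type {type_code}")
        start = self.pos
        value = reader(self, size)
        if self.pos != start + size:  # keep the offset in sync with the payload
            raise ValueError(f"extension {type_code} consumed {self.pos - start} of {size} bytes")
        return value

    def read(self) -> Any:
        token = self.take(1)[0]
        if token <= 0x7F:  # positive fixint
            return token
        if token >= 0xE0:  # negative fixint
            return token - 0x100
        if token <= 0x8F:  # fixmap: read key, then value, N times
            result = {}
            for _ in range(token & 0x0F):
                key = self.read()
                result[key] = self.read()
            return result
        if token <= 0x9F:  # fixarray
            return [self.read() for _ in range(token & 0x0F)]
        if token <= 0xBF:  # fixstr
            return self.take(token & 0x1F).decode("utf-8")
        if token == 0xC0:  # nil
            return None
        if token in (0xC2, 0xC3):  # false / true
            return token == 0xC3
        if token == 0xC7:  # ext 8: 1-byte length, then type and payload
            return self._ext(self.take(1)[0])
        if token == 0xCB:  # float 64, big-endian
            return struct.unpack(">d", self.take(8))[0]
        if 0xD4 <= token <= 0xD8:  # fixext 1/2/4/8/16
            return self._ext(1 << (token - 0xD4))
        raise NotImplementedError(f"type byte 0x{token:02x}")
```

For example, `MiniUnpacker(bytes([0x82, 0xA1, 0x61, 0x01, 0xA1, 0x62, 0x93, 0xC3, 0xC0, 0xFF])).read()` yields `{"a": 1, "b": [True, None, -1]}`. Because handlers get the unpacker itself, a custom extension can read nested data with the same `take` and `read` calls the core decoder uses.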

Aedial commented 4 months ago

Done with 9ab194a