ValvePython / vdf

📜 Package for working with Valve's text and binary KeyValue format
https://pypi.org/project/vdf/
MIT License

Implement functions for working with binary keyvalues #3

Closed rossengeorgiev closed 8 years ago

rossengeorgiev commented 8 years ago

Those are used in richpresence and other places, would be useful for the steam module eventually.

More info: https://github.com/SteamRE/SteamKit/issues/108

SleepProgger commented 8 years ago

Also used in the PICSProductInfoResponse for the package data.
I'll port the version pysteamkit uses and submit a pull request in a bit.

SleepProgger commented 8 years ago

I finished the first version of the parse_binary function (commit 826a5d2234c2a691ef2a7d9ea6b545f12fa3a6e9). So far it works for all the package data I tested it with. Some test data (base64 encoded) is at http://pastebin.com/mf16sHQh. Let me know what you think.

Some notes:

Additionally, I am not sure whether creating separate load_binary and loads_binary functions or just adding an optional binary parameter to load/loads would be better.

Edit: Added an experimental version of dump_binary (commit b1f96ab700e7486b91a6c0dbe142ecf11ed5d238). It needs the scalars in the form (data_type, value). IMHO that is a bit strange, but we need to know which type to save the data as (especially for numeric values). Additionally, encoding/decoding is still missing, and the same types that are missing from the parse_binary function are missing here (BIN_POINTER, BIN_WIDESTRING, BIN_COLOR).
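
To illustrate why the type has to travel with the value, here is a minimal sketch of a (data_type, value) serializer. The tag numbers, the little-endian layout (tag byte, NUL-terminated key, payload), the UTF-8 encoding, and the name pack_scalar are all assumptions made for this example; they are not taken from either commit.

```python
import struct

# Hypothetical type tags for this sketch only.
BIN_STRING = 1
BIN_INT32 = 2
BIN_UINT64 = 7


def pack_scalar(data_type, key, value):
    """Serialize one scalar as: tag byte, NUL-terminated key, payload."""
    head = struct.pack('<B', data_type) + key.encode('utf-8') + b'\x00'
    if data_type == BIN_STRING:
        return head + value.encode('utf-8') + b'\x00'
    if data_type == BIN_INT32:
        return head + struct.pack('<i', value)
    if data_type == BIN_UINT64:
        return head + struct.pack('<Q', value)
    raise ValueError("unsupported data type: %r" % (data_type,))


# The same Python int produces different bytes depending on the declared type.
as_int32 = pack_scalar(BIN_INT32, 'a', 1)
as_uint64 = pack_scalar(BIN_UINT64, 'a', 1)
```

A plain Python int carries no width or signedness, so without the explicit data_type the dumper could not choose between the 4-byte and 8-byte encodings.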

rossengeorgiev commented 8 years ago

Let's just deal with byte strings. Then we can use indexing and slicing to parse them. I suggest using the find method to slice out a bin_string.

>>> x = b'bbb\x00aaaa\x00'
>>> x[0: x.find(b'\x00', 0)]
b'bbb'
>>> x[4: x.find(b'\x00', 4)]
b'aaaa'
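
The same slicing can be wrapped in a small helper that also returns where the next field starts. This is only a sketch; the name read_cstring is made up for illustration. It also guards against a missing terminator, since find returns -1 in that case and the bare slice would silently drop the last byte.

```python
def read_cstring(buf, offset=0):
    """Return (string_bytes, next_offset) for the NUL-terminated string at offset."""
    end = buf.find(b'\x00', offset)
    if end == -1:
        raise ValueError("unterminated string starting at offset %d" % offset)
    # Skip the terminator so the caller can continue with the next field.
    return buf[offset:end], end + 1


data = b'bbb\x00aaaa\x00'
first, pos = read_cstring(data)
second, pos = read_cstring(data, pos)
```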

If we want to make load/dump symmetric, then the only way is to use objects for pointer, widestring, and color.
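
A rough sketch of what such objects could look like for the integer-like types: thin int subclasses that remember which binary type they came from. The class names, tag numbers, and the keyless 5-byte layout are assumptions for this example only, and widestring is left out for brevity.

```python
import struct


class POINTER(int):
    """An int that should be written back as a pointer field."""


class COLOR(int):
    """An int that should be written back as a color field."""


# Hypothetical type tags for this sketch only.
TAG_INT32, TAG_POINTER, TAG_COLOR = 2, 4, 6
WRAPPERS = {TAG_POINTER: POINTER, TAG_COLOR: COLOR}


def dump_int(value):
    """Pick the tag from the value's class, then pack tag + 32-bit payload."""
    tag = TAG_INT32
    for wrapper_tag, cls in WRAPPERS.items():
        if isinstance(value, cls):
            tag = wrapper_tag
    return struct.pack('<Bi', tag, value)


def load_int(raw):
    """Unpack tag + payload and rewrap the number so a later dump is lossless."""
    tag, number = struct.unpack('<Bi', raw)
    return WRAPPERS.get(tag, int)(number)


restored = load_int(dump_int(POINTER(5)))
```

Because the wrapper survives the round trip, dumping the loaded value again yields the same bytes, and callers can still treat it as an ordinary int.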

rossengeorgiev commented 8 years ago

I think that covers everything. I haven't tested it on any real data yet.

SleepProgger commented 8 years ago

I am almost finished with my version, too. IMHO it is a bit cleaner. https://github.com/SleepProgger/vdf/blob/parse_binary_impl/vdf/vdf_binary.py

Changes:

So far I have tested it with all the packages I could get from Steam, and deserializing them and serializing them again returned the correct results. The problem is that all the data I tested so far only used the types String and int32. I am looking for other test data (especially data containing widestrings).

If you want, I could merge your changes with mine?

rossengeorgiev commented 8 years ago

Did you see my implementation? It's feature complete and fully tested.

Here is a quick perf comparison.

In [1]: import vdf, vdf_binary
In [2]: raw = vdf.binary_dumps(dict(map(lambda x: (str(x), x), range(5000))))
In [3]: %timeit vdf.binary_loads(raw)
100 loops, best of 3: 15.1 ms per loop
In [4]: %timeit vdf_binary.parse(raw)
10 loops, best of 3: 54.1 ms per loop
In [5]: raw = vdf.binary_dumps(dict(map(lambda x: (str(x), str(x)*100), range(5000))))
In [6]: %timeit vdf.binary_loads(raw)
100 loops, best of 3: 15 ms per loop
In [7]: %timeit vdf_binary.parse(raw)
10 loops, best of 3: 103 ms per loop

SleepProgger commented 8 years ago

Well, your parsing version doesn't support streams (and is therefore of course faster with in-memory data ;)), but I can live with that. After rereading your code, the two are pretty similar in most cases.
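
For context, a stream-based reader has to pull a NUL-terminated string out one byte at a time, roughly like the sketch below. The function name is made up for illustration and this is not the code from either implementation.

```python
import io


def read_cstring_from_stream(fp):
    """Read bytes from a binary file-like object up to the next NUL."""
    collected = bytearray()
    while True:
        char = fp.read(1)
        if char == b'':
            # End of stream reached before the terminator.
            raise ValueError("unterminated string")
        if char == b'\x00':
            return bytes(collected)
        collected += char


stream = io.BytesIO(b'bbb\x00aaaa\x00')
first = read_cstring_from_stream(stream)
second = read_cstring_from_stream(stream)
```

Each character costs one read call here, whereas a parser that holds the whole blob in memory can locate the terminator with a single bytes.find call. That is a plausible reason for a gap on long strings, though it was not profiled here.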

Some stuff I noticed:

SleepProgger commented 8 years ago

Good news everybody: I tested your implementation against all the currently existing productInfo I could get (about 3000 objects), and everything went well. All data is correctly parsed and dumped (except for the minor issue mentioned in #5). All the test data can be found here. I will update that repo with other files (from richpresence and any other sources which may exist).

Every single binary VDF blob I tested contained only int32 and string types, btw.