joliveira98 closed this issue 1 year ago.
Hi there,
I'll need to dive back into the Chrome source to work out what may be happening and whether it's allowed to happen. If it's not sensitive, is it possible to share the blob of data which causes the error? Is it every value in the leveldb, or do you get through some first? Is the database representative of recent data, or is it a little older?
Hello,
If it's not sensitive, is it possible to share the blob of data which causes the error? I'd prefer not to share the blob because the data comes from a Zoom meeting in the browser, and I'm not sure what information about my Zoom account may be stored there. To reproduce the leveldb and blob, you just need to host a Zoom meeting in the browser and they will be created. I believe this behaviour will happen with every leveldb and blob related to Zoom web meetings.
Is it every value in the leveldb or do you get through some first? Every value in the leveldb for Zoom raises this exception.
Is the database representative of recent data, or is it a little older? It is recent: the data was created on the day I tested it, 2 days ago.
I've tested the script with Twitter and Google Drive related blobs and it worked fine. Those blobs are also recent (December 10).
Great, thank you for the details. I will get back to you when I've had a chance to have a look at some data.
Same issue. With a print added in _read_header:

```python
def _read_header(self) -> int:
    tag = self._read_tag()
    print('tag/token_kVersion', tag, Constants.token_kVersion)
    if tag != Constants.token_kVersion:
        raise ValueError("Didn't get version tag in the header")
    version = self._read_le_varint()[0]
    return version
```
The output ends up being:
```
<WrappedDatabase: id=1; name=somedb; origin=https_someweb_0@1>
<WrappedObjectStore: object_store_id=1; name=somestore>
tag/token_kVersion b'\xff' b'\xff'
tag/token_kVersion b'\xff' b'\xff'
tag/token_kVersion b'\xff' b'\xff'
tag/token_kVersion b'\xff' b'\xff'
tag/token_kVersion b'\x01' b'\xff'
Traceback (most recent call last):
    for record in store.iterate_records():
    yield from self._raw_db.iterate_records(
    deserializer = ccl_v8_value_deserializer.Deserializer(
    self.version = self._read_header()
    raise ValueError("Didn't get version tag in the header")
ValueError: Didn't get version tag in the header
```
Hi,
Are you able to share the data that raised this error? If you comment out the if/raise lines in there, does the code proceed as expected?
Hi,
Thank you for putting so much effort into developing such a tool and open sourcing it.
I am working on recovering Proton Mail messages from cache and ran into the same issue.
If I comment out the line

```python
raise ValueError("Didn't get version tag in the header")
```

another exception is raised at line 600 of ccl_v8_value_deserializer.py:

```python
if func is None:
    raise ValueError(f"Unknown tag {tag}")
```
```
Traceback (most recent call last):
  File "/root/ccl_chrome_indexeddb/raw.py", line 31, in <module>
    for record in db.iterate_records(db_id_meta.dbid_no, obj_store_id):
  File "/root/ccl_chrome_indexeddb/ccl_chromium_indexeddb.py", line 564, in iterate_records
    value = deserializer.read()
  File "/root/ccl_chrome_indexeddb/ccl_v8_value_deserializer.py", line 627, in read
    return self._read_object()
  File "/root/ccl_chrome_indexeddb/ccl_v8_value_deserializer.py", line 611, in _read_object
    tag, o = self._read_object_internal()
  File "/root/ccl_chrome_indexeddb/ccl_v8_value_deserializer.py", line 600, in _read_object_internal
    raise ValueError(f"Unknown tag {tag}")
ValueError: Unknown tag b'\xff'
```
I am willing to share the dataset with you privately if it helps, just let me know the email where I can send it.
Hi there,
So yes, if you'd like to share the data that might help and you can do so on alex[dot]caithness[at]cclsolutionsgroup[dot]com - it may be best to share it to dropbox or similar so that it doesn't get eaten by filters.
There may already be things in the code that help, though. Have a look at the wrapper API section here: https://github.com/cclgroupltd/ccl_chrome_indexeddb#wrapper-api and in particular:
```python
for record in obj_store.iterate_records(
        errors_to_stdout=True,
        bad_deserializer_data_handler=lambda k, v: print(f"error: {k}, {v}")):
    print(record.user_key)
    print(record.value)
```
The iterate_records function can be called with a callback that handles errors, or it can print errors to stdout instead of raising them. If the record causing the error is malformed, this would be the way to deal with it.
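If printing to stdout isn't enough, the callback can instead save the offending bytes so that a single failing record can be examined or shared later. This is a minimal sketch. It assumes the handler is called with the record key and the raw value as bytes, as the lambda above suggests. make_bad_record_dumper is a hypothetical helper and is not part of the library.

```python
from pathlib import Path


def make_bad_record_dumper(out_dir):
    """Hypothetical helper: build a callback that saves undeserializable values to disk."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failures = []  # (repr of key, path written) for each bad record

    def handler(key, raw_value):
        # Assumption: raw_value is bytes. Files are numbered rather than named
        # after the key, because keys are not guaranteed to be filename-safe.
        path = out / f"bad_record_{len(failures):05d}.bin"
        path.write_bytes(raw_value)
        failures.append((repr(key), path))

    handler.failures = failures
    return handler
```

It would be passed as bad_deserializer_data_handler=make_bad_record_dumper("bad_records"). Afterwards, handler.failures lists which keys were saved and where.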
Let me know how you get on.
Hello,
The record structure changed in newer Blink versions. The first change is IDB value wrapping; see https://chromium.googlesource.com/chromium/src/+/refs/heads/main/third_party/blink/renderer/modules/indexeddb/idb_value_wrapping.cc:

"The wrapping detection logic in IDBValueUnwrapper::IsWrapped() must be able to distinguish between SSV byte sequences produced and byte sequences expressing the fact that an IDBValue has been wrapped and requires post-processing."

A wrapped value is an SSV processing command that replaces the SSV data bytes with a Blob's contents. It is laid out as:
1) 0xFF - kVersionTag
2) 0x11 - kRequiresProcessingSSVPseudoVersion
3) 0x01 - kReplaceWithBlob
4) varint - Blob size
5) varint - the offset of the SSV-wrapping Blob in the IDBValue list of Blobs

The Python code expects a version tag at position 3, but in this case that byte is 0x01 (kReplaceWithBlob).
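Here is a rough sketch of how a reader could recognise this layout before handing the bytes to the V8 deserializer. It checks the three fixed bytes, then reads the two varints, decoded as little-endian base-128 (the encoding _read_le_varint handles). It only approximates IsWrapped() and is not a port of it. parse_wrapped_value and read_le_varint are hypothetical names.

```python
WRAPPED_PREFIX = bytes([
    0xFF,  # kVersionTag
    0x11,  # kRequiresProcessingSSVPseudoVersion
    0x01,  # kReplaceWithBlob
])


def read_le_varint(data, pos):
    """Decode a little-endian base-128 varint at data[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def parse_wrapped_value(data):
    """Return (blob_size, blob_offset) if data looks like a wrapped IDB value, else None."""
    if not data.startswith(WRAPPED_PREFIX):
        return None
    blob_size, pos = read_le_varint(data, len(WRAPPED_PREFIX))
    blob_offset, _ = read_le_varint(data, pos)
    return blob_size, blob_offset
```

For example, b"\xff\x11\x01\x80\x80\x04\x00" parses as a 65536-byte blob at offset 0 in the value's blob list. Ordinary serialized data, which has a V8 version tag where kReplaceWithBlob would be, returns None.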
The other change is in the Blink envelope, https://github.com/chromium/chromium/blob/main/third_party/blink/renderer/bindings/core/v8/serialization/v8_script_value_deserializer.cc:

```cpp
// These versions expect a trailer offset in the envelope.
if (version >= TrailerReader::kMinWireFormatVersion) {
  static constexpr size_t kTrailerOffsetDataSize = 1 + sizeof(uint64_t) + sizeof(uint32_t);
  // ...
}
```
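The 13 bytes of kTrailerOffsetDataSize can be read or skipped as sketched below. Only the 1 + 8 + 4 total comes from the snippet above, and 21 is the version threshold used in this thread. The field layout is an assumption from memory of the Chromium serializer, not something stated in this thread: a one-byte tag, then a uint64 offset and a uint32 size, both big-endian (network order). skip_trailer_offset is a hypothetical name.

```python
import struct

TRAILER_MIN_WIRE_FORMAT_VERSION = 21    # threshold used in this thread
TRAILER_OFFSET_DATA_SIZE = 1 + 8 + 4    # kTrailerOffsetDataSize: tag + uint64 + uint32
TRAILER_LAYOUT = struct.Struct(">BQI")  # assumed: big-endian, no padding (13 bytes)


def skip_trailer_offset(data, pos, blink_version):
    """Return (new_pos, (tag, offset, size)), or (pos, None) if no trailer is expected."""
    if blink_version < TRAILER_MIN_WIRE_FORMAT_VERSION:
        return pos, None
    end = pos + TRAILER_OFFSET_DATA_SIZE
    if end > len(data):
        raise ValueError("value too short for a trailer offset")
    tag, offset, size = TRAILER_LAYOUT.unpack_from(data, pos)
    return end, (tag, offset, size)
```

If only skipping matters, the byte order is irrelevant and adding 13 to the index is enough. Decoding the fields is useful only if the trailer itself needs to be inspected.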
So iterate_records should now do something like:

```python
require_processing = record.value[val_idx]
if require_processing == 0x01:  # kReplaceWithBlob: the real data is in a blob
    val_idx += 1
    blob_size, varint_raw = _le_varint_from_bytes(record.value[val_idx:])
    val_idx += len(varint_raw)
    blob_offset, varint_raw = _le_varint_from_bytes(record.value[val_idx:])
    val_idx += len(varint_raw)
# trailer offset
if blink_version >= 21:
    val_idx += 1 + 8 + 4  # 1 + uint64_t + uint32_t
```
@dg-data thanks for highlighting this - it has pointed me to some other changes in recent versions that I also need to address, so I'll be jumping on that as soon as I have a chance.
@dg-data a little more context in case you're interested - on the blob wrapping side of things, it looks like this happens if the serialized data exceeds kIDBWrapThreshold which is 65536 (at the moment). In that case the serialized data is referenced in a blob entry for that key - functionally I think that means that it's in a separate file on disk, but I'm putting together test data at the moment to check exactly what is going on...
Edit: confirmed, that's exactly what's going on. That complicates things a bit, but a lot of the groundwork is already in the code so it's not too bad.
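To find that separate file, the blob number from the blob entry has to be mapped to a path under the origin's .indexeddb.blob directory. The layout used below is an assumption based on how Chromium names these files, not something confirmed in this thread: the database ID in hex, then the second-lowest byte of the blob number as two hex digits, then the blob number in hex. blob_file_path is a hypothetical helper.

```python
from pathlib import Path


def blob_file_path(blob_dir, database_id, blob_number):
    """Hypothetical helper: assumed on-disk location of an IndexedDB blob file."""
    return (Path(blob_dir)
            / f"{database_id:x}"                    # database ID in hex
            / f"{(blob_number >> 8) & 0xFF:02x}"    # bits 8-15 of the blob number
            / f"{blob_number:x}")                   # blob number in hex
```

Under this assumption, blob 3 of database 1 would live at <blob_dir>/1/00/3, and blob 0x1234 at <blob_dir>/1/12/1234.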
@cclgroupltd Hi! Thanks for your quick reaction. I think the basic logic stayed untouched in Blink; the IndexedDBExternalObject class and blob handling look good. The route to the data comes from records whose key has index ID 3 (the "external object table") and the blob info in them. Deserialization of that blob should work.
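Finding those records means decoding the prefix that every IndexedDB key in the LevelDB starts with. This sketch assumes the usual Chromium encoding. The first byte packs the byte lengths of the database ID (3 bits), object store ID (3 bits) and index ID (2 bits), each stored minus one, and the IDs follow as little-endian integers. The names given to index IDs 1 and 2 are also assumptions; only index ID 3 comes from the comment above.

```python
# Assumed meanings of the reserved index IDs; only 3 is stated in the thread.
SPECIAL_INDEX_IDS = {
    1: "object store data",
    2: "exists entry",
    3: "blob entry (external object table)",
}


def decode_key_prefix(key):
    """Return (database_id, object_store_id, index_id, prefix_length) for a raw key."""
    first = key[0]
    lengths = (
        ((first >> 5) & 0x07) + 1,  # database ID length in bytes
        ((first >> 2) & 0x07) + 1,  # object store ID length in bytes
        (first & 0x03) + 1,         # index ID length in bytes
    )
    pos = 1
    ids = []
    for length in lengths:
        ids.append(int.from_bytes(key[pos:pos + length], "little"))
        pos += length
    database_id, object_store_id, index_id = ids
    return database_id, object_store_id, index_id, pos


def describe_index(index_id):
    return SPECIAL_INDEX_IDS.get(index_id, "other (e.g. a user-defined index)")
```

Filtering keys for which decode_key_prefix reports index ID 3 would give the blob entries. The bytes after prefix_length are the encoded user key.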
@dg-data yep, the logic for looking up the external data is already in our module because it's how "File objects" are accessed. It shouldn't be too tricky to plumb that all in.
@dg-data could you give the most recent commit a go if you have suitable test data? It is working with my test data, but if you have real-world data to run it against that would be useful!
@cclgroupltd Thanks for the update. I checked the most recent version and found no mistakes. I tested various records (Blink v.17, 20, 21), including the ones with externally serialized objects. As far as I can see it looks pretty good; at least it solved the issue I had. Great work, saved my data!
Fantastic. I'll close this issue now then, thanks for your help!
Hello,
I've been trying to retrieve the key/value information from a specific leveldb on Chrome IndexedDB and I keep getting a ValueError exception.
This happens during record iteration, and it crashes on the header's version tag check.
Apparently my version tag value is 0x01, but it should be 0xff.
This only happens with one specific leveldb. Do you know why the version tag has this value? Shouldn't it be 0xff instead of 0x01?