cclgroupltd / ccl_chromium_reader

(Sometimes partial) Python re-implementations of the technologies involved in reading various data sources in Chrome-esque applications.
MIT License

feat: improve metadata collection of IndexedDB #11

lxndrblz closed 2 years ago

lxndrblz commented 2 years ago

Hi Alex,

Thanks again for your amazing work and for making this library available as an open-source product.

As part of my work on a forensic parser for Microsoft Teams, I noticed that the way metadata is fetched for the IndexedDB could be improved. iterate_records_raw() is called three times even though the database does not change between the calls, which makes metadata collection quite slow. By unifying the collection of the metadata, I was able to significantly reduce the time needed to loop through a large database.
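To illustrate the idea, here is a minimal sketch of three filtered scans versus one unified scan. It is not the library's actual code: RawRecord, CountingSource, the three "kind" labels and both collect_* functions are hypothetical stand-ins, and only the method name iterate_records_raw() is borrowed from the description above.

```python
from typing import Dict, Iterable, Iterator, NamedTuple

# Assumed category labels, purely for illustration.
KINDS = ("global", "database", "object_store")


class RawRecord(NamedTuple):
    """Hypothetical stand-in for one raw record."""
    kind: str
    key: bytes
    value: bytes


class CountingSource:
    """Hypothetical record source that counts full iterations."""

    def __init__(self, records: Iterable[RawRecord]) -> None:
        self._records = list(records)
        self.passes = 0

    def iterate_records_raw(self) -> Iterator[RawRecord]:
        self.passes += 1
        yield from self._records


Metadata = Dict[str, Dict[bytes, bytes]]


def collect_three_passes(source: CountingSource) -> Metadata:
    # One full scan per category: three scans in total.
    return {
        kind: {
            r.key: r.value
            for r in source.iterate_records_raw()
            if r.kind == kind
        }
        for kind in KINDS
    }


def collect_single_pass(source: CountingSource) -> Metadata:
    # One scan; each record is routed to its category as it is seen.
    meta: Metadata = {kind: {} for kind in KINDS}
    for r in source.iterate_records_raw():
        if r.kind in meta:
            meta[r.kind][r.key] = r.value
    return meta
```

Both functions return the same mapping, but the second one touches every record only once, so its cost no longer grows with the number of metadata categories.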

Benchmark

As a benchmark, I looped over the following IndexedDB (which contains several object stores and records) using the included benchmark.py script.

https://github.com/lxndrblz/forensicsim/tree/main/testdata/John%20Doe/IndexedDB/https_teams.microsoft.com_0.indexeddb.leveldb

The loop took 432 seconds before the optimisation and 265 seconds after it.
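To put those two timings in perspective, here is a small sketch that turns them into a speedup factor and a percentage saved. The helper names are hypothetical and this is not the included benchmark.py; time_call only shows one common way such a measurement could be taken.

```python
import time
from typing import Callable


def time_call(fn: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by a single call to fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the optimised run is."""
    return before_s / after_s


def reduction_pct(before_s: float, after_s: float) -> float:
    """Share of the original run time that was saved, in percent."""
    return 100.0 * (before_s - after_s) / before_s


# Timings reported above, in seconds.
before_s, after_s = 432.0, 265.0
print(f"speedup: {speedup(before_s, after_s):.2f}x")           # speedup: 1.63x
print(f"time saved: {reduction_pct(before_s, after_s):.1f}%")  # time saved: 38.7%
```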

Let me know what you think.

Alex

cclgroupltd commented 2 years ago

Hi there,

Thanks for this. Yes, the multiple iterations are slow; they were mostly there as an easier way to keep my thoughts straight while I was working out the format. I've actually already made a similar optimisation in my .NET version of this library. I'll take a proper look over the next few days and hopefully just pull it as-is.

Thanks again,

Alex