lxndrblz closed this 2 years ago
Hi there,
Thanks for this. Yes, the multiple iterations are slow; they were mostly there as an easier way to keep my thoughts straight while I was working out the format. I've actually already made a similar optimisation to my .NET version of this library. I'll take a proper look over the next few days and hopefully just pull it in as-is.
Thanks again,
Alex
Hi Alex,
Thanks again for your amazing work and for making this library available as an open-source product.
As part of my work on a forensic parser for Microsoft Teams, I noticed that the way the metadata for the IndexedDB is fetched could be improved. Because iterate_records_raw() is called three times, even though the database does not change in the meantime, fetching the metadata is quite slow. By unifying the collection of the metadata, I was able to significantly reduce the time needed to loop through a large database.
Benchmark
As a benchmark, I looped over the following IndexedDB (which contains several object stores and records) using the included benchmark.py script:
https://github.com/lxndrblz/forensicsim/tree/main/testdata/John%20Doe/IndexedDB/https_teams.microsoft.com_0.indexeddb.leveldb
The run took 432 seconds before the optimisation and 265 seconds after it, a reduction of roughly 39%.
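To illustrate the idea behind the change, here is a minimal sketch of collecting several kinds of metadata in a single pass instead of one pass per kind. It is not the library's actual code: `RawRecord`, `CountingSource`, the `PREFIXES` table and both `collect_*` functions are hypothetical stand-ins, and the one-byte key prefixes are made up for the example. Only the method name `iterate_records_raw()` is taken from the description above, and its real signature and record type are not reproduced here.

```python
from collections import namedtuple

# Hypothetical stand-in for a raw record; the real library's type differs.
RawRecord = namedtuple("RawRecord", ["key", "value"])

# Made-up key prefixes for three kinds of metadata (assumption for this sketch).
PREFIXES = {"global": b"G", "database": b"D", "object_store": b"S"}


class CountingSource:
    """Toy record source that counts how often it is iterated in full."""

    def __init__(self, records):
        self._records = list(records)
        self.passes = 0

    def iterate_records_raw(self):
        # Each call that is actually consumed costs one full scan.
        self.passes += 1
        yield from self._records


def collect_in_three_passes(source):
    """One full scan per metadata kind, as in the slower approach."""
    result = {}
    for name, prefix in PREFIXES.items():
        result[name] = [
            record
            for record in source.iterate_records_raw()
            if record.key.startswith(prefix)
        ]
    return result


def collect_in_one_pass(source):
    """A single scan that sorts each record into its metadata kind."""
    result = {name: [] for name in PREFIXES}
    for record in source.iterate_records_raw():
        for name, prefix in PREFIXES.items():
            if record.key.startswith(prefix):
                result[name].append(record)
                break  # a record belongs to at most one kind here
    return result
```

Both functions return the same dictionary for the same input; the difference is that the single-pass version reads the underlying records once, which matters when each scan means decoding a large LevelDB from disk.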
Let me know what you think.
Alex