cclgroupltd / ccl_chromium_reader

(Sometimes partial) Python re-implementations of the technologies involved in reading various data sources in Chrome-esque applications.
MIT License

Error 22 Invalid Argument when '.ldb' file is zero length. #6

Closed ThomasAJ05 closed 3 years ago

ThomasAJ05 commented 3 years ago

Firstly thank you for the work you put in here - it is huge.

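The error first occurs in ccl_leveldb.py > class LdbFile, at self._f.seek(-LdbFile.FOOTER_SIZE, os.SEEK_END). With a zero-length file, that seek would land before the start of the file, hence the Invalid Argument error.

I added a check for file size here: ccl_leveldb.py > class RawLevelDb > self._files.append(LdbFile(file))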
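
For anyone hitting the same error, here is a minimal standalone sketch of the failure and of the kind of size check described above. It is not the code from ccl_leveldb.py: `seek_to_footer` and `readable_ldb_files` are hypothetical helper names, and `FOOTER_SIZE = 48` is the footer length in the LevelDB table format, assumed here to match `LdbFile.FOOTER_SIZE`.

```python
import os
from pathlib import Path

# Footer length in the LevelDB table format; assumed to match LdbFile.FOOTER_SIZE.
FOOTER_SIZE = 48


def seek_to_footer(path: Path, footer_size: int = FOOTER_SIZE) -> int:
    """Seek to where the footer should start and return that offset.

    On a file shorter than footer_size this raises OSError (errno 22,
    Invalid argument), because the target offset would be negative.
    """
    with path.open("rb") as f:
        f.seek(-footer_size, os.SEEK_END)
        return f.tell()


def readable_ldb_files(db_dir: Path, footer_size: int = FOOTER_SIZE) -> list[Path]:
    """Return the .ldb files in db_dir that are large enough to hold a footer."""
    return sorted(
        p for p in db_dir.glob("*.ldb")
        if p.stat().st_size >= footer_size
    )
```

Filtering on size before opening each file skips empty .ldb files instead of failing on the first one; whether skipping is the right behaviour for a forensic tool (as opposed to reporting the empty file) is a separate design choice.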

I was hoping to use your work to extract a website's key/value pairs from the .ldb files, but it seems the files are not updated in real time.

Comparing Chrome Dev Tools (Local Storage > desired website), the .ldb last-modified dates, and the values of the pairs extracted by your dump_leveldb.py (along with their source files), I can see that the extracted key/value pairs are from several hours ago.

Your fine article (https://www.cclsolutionsgroup.com/post/hang-on-thats-not-sqlite-chrome-electron-and-leveldb) implies that Chrome updates the files in leveldb in real-time, if I understood it correctly.

So it seems Chrome keeps things in memory before writing to disk?

cclgroupltd commented 3 years ago

Hi Thomas, I will take a look at pushing a fix soon; I'm actually doing some work in and around that module at the moment, as it happens.

Whether records get pushed to the LevelDB in real time or buffered will depend on how the database is actually used. Either way, remember that records always go to a ".log" file first, so it wouldn't be surprising to see that an .ldb modified date isn't changing.
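
One quick way to see this on disk is to compare the sizes and modified times of the ".log" and ".ldb" files side by side. The following is a rough sketch using only the standard library; `summarize_leveldb_dir` is a hypothetical helper, not part of this repository, and `db_dir` is whichever folder holds the store's files.

```python
from datetime import datetime, timezone
from pathlib import Path


def summarize_leveldb_dir(db_dir):
    """Return (name, size_in_bytes, modified_utc) for .log and .ldb files, newest first."""
    rows = []
    for p in Path(db_dir).iterdir():
        if p.is_file() and p.suffix.lower() in {".log", ".ldb"}:
            st = p.stat()
            modified = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
            rows.append((p.name, st.st_size, modified))
    # Newest first, so the most recently written file is at the top.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

If the ".log" file is newer than every ".ldb" file, recent records are most likely still sitting in the log; if it is empty, they may not have reached the data store at all yet.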

It sounds as if you're looking at local storage? I might have something for you in the next 7 days or so that may provide a somewhat more controlled way of accessing that information, along with some more detail on how those data stores work (as it happens, Local Storage does indeed buffer transactions in memory before committing them to the data store).

ThomasAJ05 commented 3 years ago

I look forward to any updates.

You said "records always go to a ".log" file first". Yes, I understood that from your article, so I was surprised to see the .log file's size as zero when some key/value pairs seen via Chrome Dev Tools were different from what your software extracted.

Sorry, I forgot to mention .log file/s in my post.

Looking at your website, I am assuming this software could be used for forensic work. However, my humble goal is to extract the latest auth-token (it changes every now and then) so I can use the website's own API to get data that I subscribe to but that the API I actually subscribe to does not fully provide, i.e. the subscribed API chops data off when it reaches a certain (small) size. The website shows the data in full. It needs only a simple fix from the company, which they cannot be bothered to make.

Why am I boring you with this? Well, I'm hoping your changes might enable the extraction of ALL data relating to local storage.