cclgroupltd / ccl_chromium_reader

(Sometimes partial) Python re-implementations of the technologies involved in reading various data sources in Chrome-esque applications.
MIT License

Question regarding object decoding #9

Closed lxndrblz closed 3 years ago

lxndrblz commented 3 years ago

Hi,

Thanks for your efforts in developing this code and your blog posts! It is much appreciated.

I am using your code in one of my forensics projects to extract conversation artefacts from an Electron-based communication platform. While the enumeration works incredibly reliably, I have the impression that the record.value contents are not fully decoded.

My response looks like this right now:

b'!\xff\x13\xff\ro"\x0econversationId"[19:54dd27a7-fbb0-4bf0-8208-a4b31a578a3f_e62b7cec-7379-4d6f-aed7-24b48be68a74@unq.gbl.spaces"\x0fparentMessageId"\r1622368092916"\x08messageso"@4235357803446472000,8:orgid:e62b7cec-7379-4d6f-aed7-24b48be68a74o"\x0bmessagetype"\rRichText/Html"\x0bcontenttype"\x04text"\x07content"6<div>To Sherlock Holmes she is always the woman.</div>"\rrenderContent"6<div>To Sherlock Holmes she is always the woman.</div>"\x0fclientmessageid"\x134235357803446472000"\ramsreferencesA\x00$\x00\x00"\rimdisplayname"\x08Jane Doe"\npropertieso{\x00"\x02id"\r1622368092916"\x04type"\x07Message"\nsequenceIdI\x04"\x0bmessageKind"\x11skypeMessageLocal"\x0bcomposetime"\x1c2021-05-30T09:48:12.9160000Z"\x13originalarrivaltime"\x1c2021-05-30T09:48:12.9160000Z"\x11clientArrivalTime"\x182021-05-30T10:08:31.218Z"\x10conversationLink"\x9b\x01https://uk.ng.msg.teams.microsoft.com/v1/users/ME/conversations/19:54dd27a7-fbb0-4bf0-8208-a4b31a578a3f_e62b7cec-7379-4d6f-aed7-24b48be68a74@unq.gbl.spaces"\x04from"ghttps://uk.ng.msg.teams.microsoft.com/v1/users/ME/contacts/8:orgid:e62b7cec-7379-4d6f-aed7-24b48be68a74"\x06sourceI\x02"\x07idUnion"\x134235357803446472000"\x0econversationId"[19:54dd27a7-fbb0-4bf0-8208-a4b31a578a3f_e62b7cec-7379-4d6f-aed7-24b48be68a74@unq.gbl.spaces"\rversionNumberN\x00@/\xc8\xca\x9bwB"\x07version"\r1622368092916"\x13messageStorageStateI\x02"\x15isActionExecuteUpdateF"\x1d_conversationIdMessageIdUnion"i19:54dd27a7-fbb0-4bf0-8208-a4b31a578a3f_e62b7cec-7379-4d6f-aed7-24b48be68a74@unq.gbl.spaces_1622368092916"\x0fparentMessageId"\r1622368092916"\x0bcreatedTimeN\x00@/\xc8\xca\x9bwB"\x07creator",8:orgid:e62b7cec-7379-4d6f-aed7-24b48be68a74"\x0ecreatorProfileo"\x03mri",8:orgid:e62b7cec-7379-4d6f-aed7-24b48be68a74"\x11userPrincipalName"5JaneDoe_forensics.im#EXT#@Forensicsim.onmicrosoft.com"\x0bdisplayName"\x08Jane Doe"\tgivenName"\x08Jane 
Doe"\x08objectId"$e62b7cec-7379-4d6f-aed7-24b48be68a74"\x04type"\x06person{\x06"\x08isFromMeT"\x0euserHasStarredF"\x1creplyChainLatestDeliveryTimeN\x00@/\xc8\xca\x9bwB"\x05stateI\x04"\x11notificationLevelI\x02"\x08mentionsA\x00$\x00\x00"\nhyperLinksA\x00$\x00\x00"\x0battachmentsA\x00$\x00\x00"\x19inputExtensionAttachmentsA\x00$\x00\x00"\x15trimmedMessageContent"+To Sherlock Holmes she is always the woman."\x1bmessageContentContainsImage0"\x1bmessageContentContainsVideoF"\x0bisSanitizedT"\x1aisPlainTextConvertedToHtmlT"\x16isRichContentProcessedT" isRichMessagePropertiesProcessedT"&isRenderContentWithGiphyDisplayEnabledT"\risForceDeleteF"\x16isSfBGroupConversationF"\x11messageLayoutTypeI\x00"\x0ccallDurationI\x00"\x14callParticipantsMrisA\x00$\x00\x00"\x16cachedDeduplicationKey"?8:orgid:e62b7cec-7379-4d6f-aed7-24b48be68a744235357803446472000"\x19cachedOriginalArrivalTime"\x1c2021-05-30T09:48:12.9160000Z"\x1ccachedOriginalArrivalTimeUtcN\x00@/\xc8\xca\x9bwB"\x0e_callRecording0"\x0f_callTranscript0"\x0f_meetingObjects0"\x15callParticipantsCountI\x01"\t_pinStateo"\x08isPinnedF{\x01{;{\x01"\x0cisInsideChat"\x04true"\x12latestDeliveryTime"\x100001622368092916{\x05'

From your blog post I took away that each of the recurring tags, such as " or {, indicates the type of the data that follows. I am now wondering whether decoding of this object encoding has already been implemented or whether I did something wrong.

Currently, I am working around this issue by splitting the record's value on the " character and ignoring the first byte after each split. While this works in most cases, it is not ideal, as it fails if a record contains a nested JSON array.
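To illustrate why a tag-driven parse copes with nesting where splitting on " does not, here is a minimal sketch of a recursive reader for a small subset of the V8 serialization tags that appear in the dump above (" for a one-byte string, I for a zigzag int32, N for a double, T/F, 0 and _ for null/undefined, o ... { for objects, A ... $ for dense arrays). It is not the library's implementation: the function names read_varint and read_value and the SAMPLE payload are made up for illustration, the version header bytes that precede the first o tag in the dump (!\xff\x13\xff\r) are not parsed, and other tags (two-byte strings, padding, dates, and so on) are simply rejected.

```python
import struct


def read_varint(buf, pos):
    """Decode a base-128 varint (least significant group first)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_value(buf, pos):
    """Read one tagged value starting at pos; return (value, new_pos)."""
    tag = buf[pos:pos + 1]
    pos += 1
    if tag == b'"':  # one-byte (Latin-1) string: varint length, then bytes
        length, pos = read_varint(buf, pos)
        return buf[pos:pos + length].decode("latin-1"), pos + length
    if tag == b"I":  # int32 stored as a zigzag-encoded varint
        raw, pos = read_varint(buf, pos)
        return (raw >> 1) ^ -(raw & 1), pos
    if tag == b"N":  # 8-byte little-endian IEEE 754 double
        return struct.unpack_from("<d", buf, pos)[0], pos + 8
    if tag == b"T":
        return True, pos
    if tag == b"F":
        return False, pos
    if tag in (b"0", b"_"):  # null and undefined both map to None here
        return None, pos
    if tag == b"o":  # object: key/value pairs until '{' + property count
        obj = {}
        while buf[pos:pos + 1] != b"{":
            key, pos = read_value(buf, pos)
            value, pos = read_value(buf, pos)
            obj[key] = value
        _count, pos = read_varint(buf, pos + 1)
        return obj, pos
    if tag == b"A":  # dense array: length, elements, '$', then two varints
        length, pos = read_varint(buf, pos)
        items = []
        for _ in range(length):
            item, pos = read_value(buf, pos)
            items.append(item)
        while buf[pos:pos + 1] != b"$":  # skip any extra array properties
            _key, pos = read_value(buf, pos)
            _value, pos = read_value(buf, pos)
        _props, pos = read_varint(buf, pos + 1)
        _length, pos = read_varint(buf, pos)
        return items, pos
    raise ValueError(f"unsupported tag {tag!r} at offset {pos - 1}")


# Hand-built payload assembled from fragments of the dump above
# (illustrative only, starting at the first object tag).
SAMPLE = (b'o"\x02id"\r1622368092916"\x08isFromMeT'
          b'"\x05stateI\x04"\x08mentionsA\x00$\x00\x00{\x04')

decoded, end = read_value(SAMPLE, 0)
print(decoded)
```

Because every value announces its own type and length, a " byte inside a string or a nested array never confuses the reader, which is the case where the split-based workaround breaks.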

Please let me know if you need any additional details. I would be willing to share my test leveldb, as it contains only staged entries and nothing secretive.

cclgroupltd commented 3 years ago

Can you provide an example of the code you're using that generates this result? Is the input data indexeddb or just a "generic" leveldb?

lxndrblz commented 3 years ago

@cclgroupltd Thanks for your swift response. The (simplified) code I am using looks like this:

import click

from pathlib import Path
from ccl_chrome_indexeddb import ccl_leveldb

def read_input(filepath):
    # Do some basic error handling
    if not filepath.endswith('leveldb'):
        raise Exception('Expected a leveldb folder. Path: {}'.format(filepath))

    p = Path(filepath)
    if not p.exists():
        raise Exception('Given file path does not exist. Path: {}'.format(filepath))

    if not p.is_dir():
        raise Exception('Given file path is not a folder. Path: {}'.format(filepath))

    parse_db(filepath)

def parse_db(filepath):
    try:
        db = ccl_leveldb.RawLevelDb(filepath)
    except Exception as e:
        print(f' - Could not open {filepath} as LevelDB; {e}')
        # Nothing to iterate if the database could not be opened
        return

    try:
        for record in db.iterate_records_raw():
            print(record.value)
            print("*"*20)
    except ValueError:
        print('Exception reading LevelDB: ValueError')
    except Exception as e:
        print(f'Exception reading LevelDB: {e}')
    finally:
        # Close the database
        db.close()

@click.command()
@click.option('--filepath', '-f', required=True,
              help="Path to the IndexedDB")
def cli(filepath):
    read_input(filepath)

if __name__ == '__main__':
    cli()

Note: The -f parameter is the path to the IndexedDB folder, such as: C:\Temp\https_teams.microsoft.com_0.indexeddb.leveldb. That folder contains all of my .ldb files and the metadata, such as the manifest and log files.

Yes, the data I am passing to the script is an IndexedDB and not one of the generic ones.

cclgroupltd commented 3 years ago

So it looks like you're using the raw access to leveldb here rather than the indexeddb functionality, which is why it's not doing any decoding. I would suggest looking at the "Using the Modules" section in the readme: https://github.com/cclgroupltd/ccl_chrome_indexeddb#using-the-modules
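For readers following along, the switch from raw LevelDB access to the IndexedDB layer might look roughly like the sketch below. The library names used here (WrappedIndexDB, database_ids, dbid_no, object_store_names, iterate_records, user_key) follow the README's "Using the Modules" section as best recalled, and the import path mirrors the one in the script above. Treat all of them as assumptions and check them against the version you have installed. The helper names open_wrapper and iter_decoded_records are hypothetical.

```python
def open_wrapper(leveldb_dir, blob_dir=None):
    """Open an IndexedDB folder through the library's wrapper (assumed API)."""
    # Assumption: same package layout as the import in the script above.
    from ccl_chrome_indexeddb import ccl_chromium_indexeddb
    return ccl_chromium_indexeddb.WrappedIndexDB(leveldb_dir, blob_dir)


def iter_decoded_records(wrapper):
    """Yield (database id, object store name, key, decoded value) tuples."""
    for db_info in wrapper.database_ids:
        # Databases are looked up by their numeric id
        db = wrapper[db_info.dbid_no]
        for store_name in db.object_store_names:
            store = db[store_name]
            # Unlike iterate_records_raw() on RawLevelDb, these records
            # are expected to carry already-deserialized values.
            for record in store.iterate_records():
                yield db_info.dbid_no, store_name, record.user_key, record.value
```

With the folder from the note above, usage would be along the lines of looping over iter_decoded_records(open_wrapper(r"C:\Temp\https_teams.microsoft.com_0.indexeddb.leveldb")) and printing each tuple, in which case record.value should arrive as Python objects rather than tagged bytes.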