darioragusa / JW-Library-macOS

JW Library for macOS
MIT License

Encrypted content #5

temporarium closed 1 year ago

temporarium commented 1 year ago

@bedan1,

This post by you was very helpful.

Just wondering if you know how the words are indexed in the SQLite db, especially in the SearchIndexDocument and SearchTextRangeDocument tables.

This is the algorithm:

  1. Determine the publication card hash
    1. Query the SQLite Publication table
    2. Create a list with the MepsLanguageIndex, Symbol, Year fields
    3. If the IssueTagNumber field is not zero, add it to the end of the list
    4. Join the list with underscores to one string, for example for w_S_202206.jwpub, this would be 1_w22_2022_20220600
    5. Calculate the SHA 256 hash of that string
    6. Calculate the bitwise XOR of the hash with 11cbb5587e32846d4c26790c633da289f66fe5842a3a585ce1bc3a294af5ada7 (CyberChef example 1)
  2. Decrypt the text
    1. Query a row from the Document, BibleChapter or BibleVerse table
    2. Read the encoded Content field
    3. Decrypt it with AES-128-CBC, using the first 16 bytes of the XORed hash as the AES key and the last 16 bytes as the initialization vector (IV)
    4. Run Zlib Inflate
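
The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a tested decryptor: the function names are made up for illustration, the card string is assumed to be UTF-8 encoded before hashing, and the AES step assumes the third-party `cryptography` package is installed. Whether the decrypted data carries block padding is not stated in the post, so the sketch simply lets zlib stop at the end of its stream.

```python
import hashlib
import zlib

# XOR constant quoted in the post above.
XOR_KEY = bytes.fromhex(
    "11cbb5587e32846d4c26790c633da289f66fe5842a3a585ce1bc3a294af5ada7"
)


def card_string(meps_language_index, symbol, year, issue_tag_number=0):
    """Join the Publication fields with underscores (steps 1.2 to 1.4)."""
    parts = [meps_language_index, symbol, year]
    if issue_tag_number != 0:
        parts.append(issue_tag_number)
    return "_".join(str(p) for p in parts)


def derive_key_iv(card):
    """SHA-256 the card string, XOR with the constant, split into key and IV."""
    # Assumption: the string is hashed as UTF-8 bytes.
    digest = hashlib.sha256(card.encode("utf-8")).digest()
    mixed = bytes(a ^ b for a, b in zip(digest, XOR_KEY))
    return mixed[:16], mixed[16:]


def decrypt_content(blob, key, iv):
    """AES-128-CBC decrypt, then zlib-inflate (steps 2.3 and 2.4)."""
    # Assumption: the third-party "cryptography" package is available.
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    decryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    compressed = decryptor.update(blob) + decryptor.finalize()
    # decompressobj stops at the end of the zlib stream, so any bytes left
    # over after it (such as padding) land in unused_data instead of raising.
    return zlib.decompressobj().decompress(compressed)


# Field values for w_S_202206.jwpub, taken from the example in step 1.4.
card = card_string(1, "w22", 2022, 20220600)
key, iv = derive_key_iv(card)
print(card)  # 1_w22_2022_20220600
```

`decrypt_content(blob, key, iv)` would then be called with the raw `Content` value read from a `Document`, `BibleChapter` or `BibleVerse` row.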

Originally posted by @bedan1 in https://github.com/darioragusa/JW-Library-macOS/issues/1#issuecomment-1079989526

darioragusa commented 1 year ago

Hi @temporarium, are you looking for something like this?

temporarium commented 1 year ago

I saw that before, but it's more cryptic than the original :-)

Have you confirmed that your deductions work?

Can you translate your graphic into some kind of algorithm?

darioragusa commented 1 year ago

I think it works(?). Here you can see it in action: it extracts all the words in order and puts them into a file. In my comment below there is a link to the script I used.

temporarium commented 1 year ago

So, going backwards, given a sentence, how would you calculate the values for those index tables?

Example: "This is a test of a sample test sentence". It is easy enough to get an index of all the words used (and how often each occurs). But how do I get the values for TextUnitIndices, PositionalList and PositionalListIndex for each word?

EDIT: I think I've figured it out. Will implement a test.
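
For reference, the "easy" part of the question, listing each word with its count and positions, might look like the sketch below. It only builds a plain in-memory word index for the example sentence; it does not attempt to reproduce how TextUnitIndices, PositionalList or PositionalListIndex are actually stored, and the function name is made up for illustration.

```python
from collections import defaultdict


def word_positions(sentence):
    """Map each lowercased word to the list of positions where it occurs."""
    positions = defaultdict(list)
    for i, word in enumerate(sentence.lower().split()):
        positions[word].append(i)
    return dict(positions)


index = word_positions("This is a test of a sample test sentence")
for word, where in index.items():
    # word, number of occurrences, zero-based word positions
    print(word, len(where), where)
```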

temporarium commented 1 year ago

OK. All sorted out. Not encrypted. Just using signed 8-bit hex.
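
As a small illustration of what "signed 8-bit hex" means in general, the sketch below reads a hex string as a sequence of signed 8-bit integers. The sample input is made up, and the actual layout of the index columns is not described in this thread, so this shows only the number format, not the table encoding.

```python
def signed_bytes_from_hex(hex_text):
    """Interpret each byte of a hex string as a signed 8-bit integer."""
    raw = bytes.fromhex(hex_text)
    # Bytes above 127 wrap around to negative values in two's complement.
    return [b - 256 if b > 127 else b for b in raw]


print(signed_bytes_from_hex("7f80ff01"))  # [127, -128, -1, 1]
```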