MrCyjaneK / jwapi

FOSS replacement for JW Library app, that works on Ubuntu Touch, Debian (mobian & droidian), android, and any other os! Uses jw.org api directly.
https://mrcyjanek.net/projects/jwapi/
GNU General Public License v2.0

Need help with understanding JWPUB format #1

Open MrCyjaneK opened 3 years ago

MrCyjaneK commented 3 years ago

I have no idea how to get the text out of the `Content` column in the .db file located inside the jwpub archive, so any help is welcome.

livrasand commented 10 months ago

@MrCyjaneK, personally I would love it if you could develop an application on your own. I and more than a thousand people would use it: a light, fast and open source app. It could be developed in Go, and maybe use a Fyne interface or something similar. Many devices no longer work with the most recent version of JW Library; imagine how many users would benefit from your app, an app just to read JWPUBs and maybe play videos and images (JWPUB Reader or JWPUB Library, just a crazy name). I think it would not be a waste of your valuable time. You would make it work much better, and many low-income brothers could continue to use their same old devices.

You know how the JW Library works, use a web browser to display the content, and you have the keys to decrypt the JWPUB's. Believe me it wouldn't be a waste of time. You can get the official JW Library catalog with the API, and thus have its publications in your app. The languages you can get from JW.ORG too.

I would help make that project a reality. Many, believe me, would use it as an alternative, and some would even come to prefer it.

I encourage you to make it happen friend. Do you remember the letter I sent to Bethel with my circuit overseer's account? No response either, LOL.

arthurwweber commented 10 months ago

> Friendly reminder that security by obscurity is not security.

True, however this practice was not invented by the MEPS Programming department at WHQ. Obfuscation of code and/or content has always been a go-to step when one wanted to give "muggles" a hard time deciphering stuff.

> register many handlers for different kind of files.

Using customized URI handlers is also an expected approach. Let's not forget that all these different formats for the publications (high resolution PostScript/PDF for print, low resolution PDF for web publishing, HTML on JW.org, HTML on WOL, ePub, JWPub, WTLIB Pub) are being produced from the same source files by the different modules of the MEPS suite (which is why P no longer stands for Phototypesetting, but Publishing). In order for the hyperlinks to work properly inside the app, those links have to use an internal URI scheme.

> Being hostile towards other developers is a terrible thing to do, and it honestly feels wrong to encrypt JWPUB, why would you force people to use your own badly.

This is also sort of expected. Remember that JWPub is a proprietary format designed by the MEPS Programming department (under the direction of the Writing Committee of the Governing Body). Historically brothers at Bethel are not allowed to ask for outside input, and the Governing Body has repeatedly been very reluctant toward random unsolicited feedback from the field.

> why would you make it so hard for a group of a few people who just want to read the publications in their own way?

Because they want to keep things official. Because they want to enforce their right of ownership. And they want to prevent unauthorized individuals from abusing the content or otherwise tampering with the official way of doing things.

> What is in there that you want to hide? Why don't you simply provide an up-to-date .epub on the site?

They have nothing to hide per se. The Study Bible is not a complete publication, so generating an .epub is probably not an option. It's been a pattern with publications that are work in progress to be released incrementally as JWPub's (e.g. the New World Translation in languages where books are partially translated, the Insight volumes in Romanian)

> So basically they store information about if the file is correct inside of that file? Lol.

That isn't new either. You are surely aware that ZIP files for example also have a CRC field that unpacking checks to make sure that the integrity of the file has been preserved over time and successive transfer operations.
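
For anyone who wants to see that ZIP check in action, here is a minimal sketch using only Python's standard library. The function name `crc_report` is made up for illustration, and this only covers the CRC-32 stored inside a ZIP container, not whatever hash JWPUB keeps in its manifest.

```python
import io
import zipfile
import zlib


def crc_report(archive_bytes):
    """Map each member name to True if its data matches the stored CRC-32."""
    report = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            try:
                # zipfile verifies the CRC itself while reading and raises
                # BadZipFile on a mismatch; the comparison below only makes
                # the field involved (ZipInfo.CRC) explicit.
                data = zf.read(info.filename)
                report[info.filename] = zlib.crc32(data) == info.CRC
            except zipfile.BadZipFile:
                report[info.filename] = False
    return report
```

A CRC like this only catches accidental corruption; it says nothing about who produced the file.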

MrCyjaneK commented 10 months ago

@arthurwweber

> Using customized URI handlers is also an expected approach. Let's not forget that all these different formats for the publications (high resolution PostScript/PDF for print, low resolution PDF for web publishing, HTML on JW.org, HTML on WOL, ePub, JWPub, WTLIB Pub) are being produced from the same source files by the different modules of the MEPS suite (which is why P no longer stands for Phototypesetting, but Publishing). In order for the hyperlinks to work properly inside the app, those link will have to use an internal URI schema.

So enlighten me: why is text/vcard being used by JW Library as a hook (actually it is */*, if I'm still correct)? Also, I haven't used the app in 3 years or so, but afaik opening links on jw.org just opens the web browser, not a browser inside the app, and you can't simply open a PDF in the library. Have you used the JW Library app? Honestly, I couldn't find any extension that would actually do anything other than start the application, except for the backup file. So you clearly implemented that incorrectly, because even if PDF opening was an intended feature of the JW Library app, it simply doesn't work (and I'm getting tired of helping people unset JW Library as their default app).

> That isn't new either. You are surely aware that ZIP files for example also have a CRC field that unpacking checks to make sure that the integrity of the file has been preserved over time and successive transfer operations.

Oh of course I'm aware of this - just not seeing the point of that - I'd rather recommend some kind of signing - as used in .apk files.

> They have nothing to hide per se. The Study Bible is not a complete publication, so generating an .epub is probably not an option. It's been a pattern with publications that are work in progress to be released incrementally as JWPub's (e.g. the New World Translation in languages where books are partially translated, the Insight volumes in Romanian)

So in fact people without official glowing in the dark JW Library app can't read them.

> This is also sort of expected. Remember that JWPub is a proprietary format designed by the MEPS Programming department (under the direction of the Writing Committee of the Governing Body).

I understand that jwpub offers some unique features but as far as I know there is no publication that exists in JW Library that wouldn't be available under wol.jw.org - which means that all jwpub content can be displayed as html (and it is either webview or richtextview of some kind in the app I guess) WHICH MEANS that you can deliver .epub with the exact same feature set, and ship a manifest.json in the epub file itself - without losing any metadata.

I get that they have decided to reinvent the wheel (for sure without looking at any existing wheels that already work and are widely used (some even support DRM!)), but the wheel they have designed is just a terrible piece of code. Have you tried opening a publication on a device older than 5 years? (In comparison, my library that is in this repository worked flawlessly on a number of phones without any lag, despite being my learning project with many bad decisions made along the way.)

> Historically brothers at Bethel are not allowed to ask for outside input, and the Governing Body has repeatedly been very reluctant toward random unsolicited feedback from the field.

Oh I know that. My security bug report hasn't received any attention since, well.. 2020? This violates Google Play policy afaik. If you provide a contact email it must be a contact email, not an echo service (which is good, I use it to test my email servers).

Also, I'd prefer to not discuss religious topics in here - let's just focus on the abomination that was created in the MEPS and is available to download freely on the internet.

MrCyjaneK commented 10 months ago

> personally I would love it if you could develop an application on your own, I and more than a thousand people would use it, a light, fast and open source app, it could be developed in Go, and maybe use a Fyne interface or something similar. Many devices no longer work with the most recent version of JW Library, imagine how many users will benefit from your app, an app just to read JWPUB's and maybe play videos and images (JWPUB Reader or JWPUB Library, just a crazy name), I think it would not be of your valuable time wasted. You would make it work much better and many low-income brothers continue to use their same old devices.

Well I can't do that fully, but if I got some help (e.g. the frontend job done) I can most likely help you with it. I can even create a simple project in flutter if I get some free time.

> You know how the JW Library works, use a web browser to display the content, and you have the keys to decrypt the JWPUB's. Believe me it wouldn't be a waste of time. You can get the official JW Library catalog with the API, and thus have its publications in your app. The languages you can get from JW.ORG too.

Yeah - hence in this repository we have a somewhat working app

> I encourage you to make it happen friend. Do you remember the letter I sent to Bethel with my circuit overseer's account? No response either, LOL.

insert joke about ransomware groups having better support than them here

livrasand commented 10 months ago

I tried JWapi, but I ran into several problems. Your website gives me SSL errors; if I proceed anyway, the page loads indefinitely until it returns a 504 error. I tried to download the Lorca binary build for Windows and for Kali, and neither worked. Do you need help with your SSL? Maybe you could try a free certificate, for example from ZeroSSL.com.

I'll try to work on an app in Go that works as a JWPUB's reader, maybe I'll be successful. For iOS and Android (old devices will surely appreciate it).

I'm no Go expert, but I've replicated some things and tried others, and so far they've worked for me. If you don't mind, maybe I'll ask you for help in your spare time. I'll also try using JWapi again, and maybe replicate what you already do.

MrCyjaneK commented 10 months ago

oh the builds expired long time ago because I've switched CI. I have somewhere source for a flutter version of jw library.. but no jwpub magic there yet... sadly.

orangethewell commented 10 months ago

@livrasand As I stated before in this discussion, open-witness-library is progressing well. It's not quite as far along as JWApi, but it's working with some tricky config. Since I do it in my free time, I haven't got some of the most important features working yet, but I'm still active on it and will soon push a commit with a nice way to read the pubs (for now you have to add them manually to the local data directory).

livrasand commented 10 months ago

Oh excellent, do you need help with something? I would love to support.

MrCyjaneK commented 10 months ago

@orangethewell is jwpub decryption working for new publications for you?

orangethewell commented 10 months ago

@MrCyjaneK Uhh I think it's still working. None of the pubs I tested had any problems (lff too). But maybe I should test it again soon.

EDIT: Yeah, it's working, tested CA-brpgm24 (Circuit Assembly program for 2023-2024)

MrCyjaneK commented 10 months ago

I (personally) think that the best thing to do right now is to just decrypt the jwpub, parse it into something more usable (I'd personally go for epub) and then fork some already existing epub reader to give it look and feel of JW Library app

MrCyjaneK commented 10 months ago

Also did any of you try to work with the watchtower library? It contains all the data but.. Well it's a mix of xml and binary blobs that binwalk doesn't recognize at all

orangethewell commented 10 months ago

> I (personally) think that the best thing to do right now is to just decrypt the jwpub, parse it into something more usable (I'd personally go for epub) and then fork some already existing epub reader to give it look and feel of JW Library app

It isn't my objective, but I certainly want to make it possible for people to do that, for example by creating a plugin system for Open Witness Library so people can add their own plugins for whatever they need (assignments, turning the program into a home server, exporting stuff, etc.)

But yeah, it would be kinda cool to be able to export publication content in any format!

livrasand commented 10 months ago

Maybe something like this:

https://www.npmjs.com/package/jw-epub-parser https://github.com/sws2apps/jw-epub-parser

You could take that idea and apply it to JWPUB. A few days ago I found an open door at JW.ORG, which gave me access to a training website for Bethelites or for members who will work with the branch (I have also reported it and still no response).

There I found how MEPS works, I share what I found, maybe it will help you.

The first course is an Introduction. This course provides you with a brief introduction to Digital Publishing. The goal of this course is to ensure you have a complete picture of the importance of your role and how it affects the publications.

Unit 1: Introduction to MEPS. The work as a Digital Publisher requires some knowledge of MEPS. MEPS contains a lot of features, but in this lesson only the things that are necessary for a Digital Publisher to know will be discussed.

Unit 2: Format a Publication. Before a publication is added to a research library it must be formatted. Formatting involves adding necessary information to a MEPS document to control its appearance and how the information is presented to readers.

Unit 3: Create a Research Library. In this unit, learn to build your own Bethel Family (BF) library, and index a publication in it. Improve the appearance of the publication, and ensure that the library is free from errors. Once a BF library for a new language has been set up, WTS will help you continue to add new publications to it and create digital publications from it. The exercises in each lesson must be finished in order to continue the exercises in the next lesson.

Unit 4: Using WTS to Produce Digital Publications. Now that you have some basic knowledge about MEPS and a working Research Library, you are ready to produce digital publications. Digital publications are produced after the publication is sent for printing. The digital publications you can read on jw.org are called WPUBs. The digital publications that can be downloaded from jw.org are EPUB files. EPUBs can be read on tablets and other electronic readers. Finally, there are JWPUB files that can be read on the JW Library app.

The production of digital publications could be done manually using the tools that are available in the MEPS Research Suite. However, this involves many manual steps, and errors could be introduced easily if a step is omitted. To make it easier and to improve the quality, the production is now integrated in the WTS processes.

In this lesson we will follow the WTS "Trans-Comp-Audio Prep-Research Format" (TCAR) and "Audio Record-Index-Digital Publishing (Print)" (AIDP) processes. Since documentation already exists for these processes, we will refer to them. Not every activity in this process will be discussed, since many of them are very simple. Open the process page and under the heading "General" choose one of these processes. There you will be able to find information about each activity in the process.

Unit 5: Maintaining Research Libraries. Research libraries need to be kept up to date. Every month new publications are released and they need to be added to the research libraries. In this unit we will discuss what libraries we are working with and how to make the updates available to the Bethel family.

I haven't been able to get into any so far, maybe you @MrCyjaneK could manage it.

arthurwweber commented 10 months ago

@orangethewell Can you please update me on the decryption algorithm since it's working for you?

orangethewell commented 10 months ago

Yeah, here: https://github.com/darioragusa/JW-Library-macOS/issues/1#issuecomment-1079989526

MrCyjaneK commented 10 months ago

@livrasand no. You made it paid. + I don't understand a single thing from readme (except for currency indicator).

orangethewell commented 10 months ago

@livrasand I can't accept it either, even if it's open source. We are taught so often not to do business inside the congregation, and I can't see this in a good light.

Even though I have a lot of work with the Open Witness Library, I don't plan making any profit with that, it will be free and open source, since part of the code isn't mine.

MrCyjaneK commented 10 months ago

As I've already said, the way to go (imo) is to prepare a script to convert JWPUB into epub and use some already existing epub reader to give it a nice looking skin. Or even roll our own reader, since epub is just html in a zip.

livrasand commented 10 months ago

I thank you very much friends for your comments, I will take them into account. I will see how I can make it possible for it to be totally free, and at the same time pay for the server and the domain. In fact, that is the only premise, paying those expenses; beyond that it would be totally free. I need to update the website, but KHA itself is free to some extent, and I plan to make it completely free.

Thanks again, and I will put your comments into practice.

MrCyjaneK commented 10 months ago

> I will see how I can make it possible for it to be totally free, and at the same time pay for the server and the domain.

> In fact, that is the only premise, paying those expenses

  1. you don't need domain
  2. you don't need the server.

Either you are being dishonest with us or you have developed something that creates more problems than it fixes. We don't need yet another JW Library app that has vendor lock-in, just moved to some other community-based server that is paid. So I'd rather use the official app than your service that depends on jw services under the hood (probably).

That being said, @livrasand, - what exactly are you doing on the backend that can't be done on the frontend of your app?

And if you truly need hosting I can provide you with one, as long as you open source the entire app (including the server), translate the readme to English, and make it free to use for all.

livrasand commented 10 months ago

Understood, @MrCyjaneK! That's how it will be, I'll make the changes in the next few days. Thank you very much 👍🏻 The app does not use a server or domain; my website does, goattendant.com.

MrCyjaneK commented 10 months ago

https://pages.github.com/ @livrasand this is where you can host the site. Also does your app use JWPUB?

livrasand commented 10 months ago

Thank you very much friend, no, it does not use JWPUB, for the meeting assignment programs I will use EPUB, and the GETPUBMEDIALINKS API from JW.ORG.

MrCyjaneK commented 10 months ago

epub is not available in some languages for some of the publications

MrCyjaneK commented 10 months ago

> I think Content is encrypted with AES, maybe AES-256

This is the algorithm:

1. Determine the publication card hash

   1. Query the SQLite `Publication` table
   2. Create a list with the `MepsLanguageIndex`, `Symbol`, `Year` fields
   3. If the `IssueTagNumber` field is not zero, add it to the end of the list
   4. Join the list with underscores to one string, for example for w_S_202206.jwpub, this would be `1_w22_2022_20220600`
   5. Calculate the SHA-256 hash of that string
   6. Calculate the bitwise XOR of that hash with `11cbb5587e32846d4c26790c633da289f66fe5842a3a585ce1bc3a294af5ada7`
      [CyberChef example 1](https://gchq.github.io/CyberChef/#recipe=SHA2('256',64,160)From_Hex('Auto')XOR(%7B'option':'Hex','string':'11cbb5587e32846d4c26790c633da289f66fe5842a3a585ce1bc3a294af5ada7'%7D,'Standard',false)To_Hex('None',0)&input=MV93MjJfMjAyMl8yMDIyMDYwMA)

2. Decrypt the text

   1. Query a row from the `Document`, `BibleChapter` or `BibleVerse` table
   2. Read the encoded `Content` field
   3. Run AES-128-CBC decryption, using the first 16 bytes of the XOR result from step 1 as the AES key and the last 16 bytes as the Initialization Vector (IV)
   4. Run Zlib Inflate
      [CyberChef example 2](https://gchq.github.io/CyberChef/#recipe=AES_Decrypt(%7B'option':'Hex','string':'909fd5b41ddd8a75ac39c69604828a7d'%7D,%7B'option':'Hex','string':'3bc2c616d0ca2cff6dc4c0d7263a2327'%7D,'CBC','Hex','Raw',%7B'option':'Hex','string':''%7D,%7B'option':'Hex','string':''%7D)Zlib_Inflate(0,0,'Adaptive',false,false)&input=ZjZhZmMwMTEzZmRiMzY4MDE4ZmEzYmEwZDUwNjJlZWFmNGJlNzVhY2Q2NDJkNzM0YTQ2N2M2OTNjODIyMWM2NzM4ODMwYzE0NDQwMjVkZWQ2ZTZmNGZiNjBjZjgyYzcwYWUyYTY5M2EyYTg3NjQ5NWVlMGMxZTU5MDE1NDcyODM0OTMyMGI1OWY2NDAwNzRjMDgzMzYxN2RhYzI5N2ZjZmI1NTY4ODhhMDgzOTAyZGZhYTgyZDNmNmY1MjZhMjI2NTJkZTlhYmEzZjU4NjRkM2RlOTQzMGU2N2Q4NmViNzQwYmUzM2MyNmFjMWVjMmZmNDg0MGE1ODRkYjdhMmMyM2I3Nzc5Y2FlZDNhZGZhZjQ5MzJhZWJhMjM4MDUxNTgzNjRkMjBhNGIxMGU3Njg3YjU4NzBjNDMzZWZmOTk2ZmJjMGI5NGRkYzRkMTI0YjA0NjU2NTUxMjVkZGNmNzI4ZjNlYTgxMDk5YjZkNGM1YjM5NTczM2U5OTZjNjE0YzgyMTA4MTZkNDc3MzViMjI4YzU0MWJiNGJhMTBkMDBlODVkZmVlODlmN2YzOTQ0Y2NmOTk2YWJmMDJkY2QyZTdkMzJlOTU0MGY3ZTVkMDUzMjE2OWFjZTM5ODk5MWI2YzMzYTlmZDdhYmUzMjMyYTU4ODIwYWZhNDFmOTdmMTU2YWI4YTExMzEzMjI5MjhkMDNhOWM4OTE4NGEzOTE5YTQ3NGM2YjM1MWM4MDg2NDkwYWM0ZDYyYTI3MDU2YWMzN2M2NTFmZjYxYmExMmJmZDRhNzk0YmQxNjNlNmM3MjRhYmY2ZjMxMTc1OTQyMDE4OWM3NjNkZGJjMGQxMzQ2Y2Q1ZDUyYjJiNzExYTg1MGY0NWI3MTk5NDIzNDIwNDRhZDVhYjMwMmY1ZGQzMWJjZjlhMjcwZjM4YTlmNmJlMWFjZDlkYzMxYTczYjE1YzBlNDcwNmYyYThjNDE3NmFiNzg1ODI4NDMyMjUwNzM0MzIxMzY1MWRhMTk5ODg5NTMyYzJiOTQyYTkwOTRjMDVlOWY2OGRjM2ExNmU0NTUzNTE4MTA4MzQzODk2YmRhZTA2ZTVmYjIxZDYzMzRjYjk4NmVmMzlkN2Q1MzQ3YTQwNWU3YzkzMGYzYTVlN2U1ZWM1MWY5MDk4OTNkYjg1ZGZlZjUzYTM3ODBhZWQ3OGNkNjRjM2EzMTdjNWM5MDkwZjYyYjNhMGVkMjZiODBlYWMxNzMxMmEzMDkxMzA0MTIxYmM1Mzc4ZTU3MGQ5MjVjMjcwMmYwNmY3ZWJlNTYwNGRmNDRmMjQ0YjA5MjYxODkyYTE0OTIyYzkxMTBhZTIyY2FkMzg1NmM3MDQ1NDlmMmYwNTVmNGI1Mjg0MDQwYzk5Y2I4ZGY2Y2Q4N2Q5NjA1MmIzYTJjY2IxYmE1ZWMxNTMyOWY2ZThlNmM2NjhjMjkyMjViMWMxN2YxYTVhN2MzZmQ0YmVmODRiMzYyYjJkOTg3NGUyMTIyNDUzYzUyMDA3OTFkOTFhYmZjMzU0YzkxMWJlMTY4NmFmNmIzYTJmMjBjZTc2MzBlZjRiMzJjYWExYzcyZTA2NzhlNDcwMTk2NTYzZTZlNTgxZTNjYjAwOTRlOGYyMWIyZTUxZWZjYTNlNDdkZDExN2YzNGFmZWJhNm
I2ODJjY2FkZjhkM2Q2ZjE5MDVlNzIxN2JkZTVjMTU3ZThiMmExOWYyYWFiZGEwZTM3OGZkODA3MjgwNGVmNWJhN2ZjMTQzOTg1NmFjNDVkYjY1MDY1OTBkMDI0Zjc5YjY0Njk1ZWJiYTYyN2U2YTdmOTkzYzZlMmY3NDdhZGQ0MmYyOTQyMGVjMzc5NmJjNWU5M2QzNzAwNTg0NzM4ZDI2YmE3ODUzZTVlNjgzMmU0YzQ5NDM1MGI5MThjMTlmZTI1MmY1)

Copying the notes just in here - I may revisit libjw soon
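
The steps above can be sketched in Python roughly like this. It is a rough sketch, not a reference implementation: the function names are made up, the AES step relies on the third-party `cryptography` package, and PKCS#7 padding plus UTF-8 text are assumptions on my side (the notes don't state either).

```python
import hashlib
import zlib

# XOR constant quoted in step 1.6 of the notes.
XOR_MASK = bytes.fromhex(
    "11cbb5587e32846d4c26790c633da289f66fe5842a3a585ce1bc3a294af5ada7"
)


def publication_card(meps_language_index, symbol, year, issue_tag_number=0):
    """Steps 1.2-1.4: join the Publication fields with underscores."""
    parts = [str(meps_language_index), str(symbol), str(year)]
    if int(issue_tag_number) != 0:
        parts.append(str(issue_tag_number))
    return "_".join(parts)


def derive_key_iv(card):
    """Steps 1.5-1.6 and 2.3: SHA-256, XOR with the mask, split into key/IV."""
    digest = hashlib.sha256(card.encode("utf-8")).digest()
    mixed = bytes(a ^ b for a, b in zip(digest, XOR_MASK))
    return mixed[:16], mixed[16:]


def decrypt_content(blob, key, iv):
    """Steps 2.3-2.4: AES-128-CBC decrypt, strip padding, zlib-inflate."""
    # Third-party dependency, imported lazily so the key derivation above
    # stays usable with the standard library alone.
    from cryptography.hazmat.primitives import padding
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    decryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    padded = decryptor.update(blob) + decryptor.finalize()
    unpadder = padding.PKCS7(128).unpadder()  # assumption: PKCS#7 padding
    compressed = unpadder.update(padded) + unpadder.finalize()
    return zlib.decompress(compressed).decode("utf-8")  # assumption: UTF-8
```

Usage would be something like `key, iv = derive_key_iv(publication_card(1, "w22", 2022, 20220600))`, then `decrypt_content(row_blob, key, iv)` for each `Content` blob read from the database.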

orangethewell commented 9 months ago

Sorry for the long wait, I have been working a lot these days and almost didn't have time to work on open witness library, but today I made a new commit where you can add publications within the program and see them separated into categories. I noticed that some pubs like the Index, the Watchtower study magazines and the Bible still don't work well with my implementation, but when I have some free time I will work on that.

orangethewell commented 9 months ago

Has anyone figured out which CSS library the publications use? I thought it was Tailwind, but it seems to have slightly different class names.

livrasand commented 9 months ago

@orangethewell Yes I have the CSS. You can see it here:

https://github.com/livrasand/livrasand.github.io/tree/main/JW-Library-Visualizer-API

I extracted the CSS, but JW Library uses a JSX file written in React, inside is the CSS. Contains all JW Library styles. I am currently working on ReviwDocs, an application like Word that will use that CSS to give styles and make it easier and faster to create JWPUB. Maybe you can use the function or the algorithm for your project (which is great).

orangethewell commented 9 months ago

@yuniermv you don't save data in .jwpub files; in fact, you save it within your app database, or in an outside file, to recover the notes when you reload the publication.

arthurwweber commented 9 months ago

> Yeah. I know. I was referring to how I can encrypt the resulting html from the BLOB once I have edited the html document.

You first recompress the HTML using ZLib Deflate and then encrypt it back using AES-128-CBC with the same parameters, you update the blob using the UPDATE SQL command in the SQLite database. Then all you have to do is rewrite the JWPUB file: you update the contents file with the modified .db file and you recompute its hash that you have to store in the manifest.json file.
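
A minimal Python sketch of the recompress-and-update part, under a few assumptions: the `DocumentId` column name and all function names are hypothetical, the AES step needs the third-party `cryptography` package and assumes PKCS#7 padding, and the repackaging and manifest.json hash are left out because the hash algorithm isn't stated here.

```python
import sqlite3
import zlib


def aes_cbc_encryptor(key, iv):
    """Return an encrypt(data) callable doing AES-128-CBC with PKCS#7 padding."""
    # Third-party dependency, imported lazily.
    from cryptography.hazmat.primitives import padding
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def encrypt(data):
        padder = padding.PKCS7(128).padder()  # assumption: PKCS#7 padding
        padded = padder.update(data) + padder.finalize()
        encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
        return encryptor.update(padded) + encryptor.finalize()

    return encrypt


def pack_content(html, encrypt):
    """Deflate the edited HTML, then pass it through the AES step."""
    return encrypt(zlib.compress(html.encode("utf-8")))


def update_document(conn, document_id, blob):
    """Write the new blob back with a parameterised UPDATE; returns row count."""
    cur = conn.execute(
        # "DocumentId" is an assumed column name; check the actual schema.
        "UPDATE Document SET Content = ? WHERE DocumentId = ?",
        (blob, document_id),
    )
    conn.commit()
    return cur.rowcount
```

Passing the encrypt step in as a callable keeps the database part testable without the key material.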

MrCyjaneK commented 7 months ago

Hey! Do any of you guys maybe have some old catalog.db lying around? I kind of want to check something. Some publications seem to be missing.

MrCyjaneK commented 7 months ago

So my current plan is to:

Make JWPUB format more usable, I'm using FOSSJWPUB as the draft name (.fossjwpub extension), which is adding as few changes as possible but makes it much easier to use.

As can be seen, the new format is a bit heavier without compression (jwpub is literally a .zip file with another .zip file inside, with zlib-compressed content in the database, so I wouldn't consider it uncompressed...), but it compresses to a much smaller document: the xz-compressed .fossjwpub is 93M against 116M for the original .jwpub (~19.8% size reduction, without losing any content).

➜  libjw ls nwtsty_E.* -lah
-rw-r--r-- 1 user user 160M Nov 16 18:27 nwtsty_E.fossjwpub
-rw-r--r-- 1 user user  93M Nov 16 18:27 nwtsty_E.fossjwpub.xz
-rw-r--r-- 1 user user 116M Nov 15 15:45 nwtsty_E.jwpub
-rw-r--r-- 1 user user 106M Nov 15 15:45 nwtsty_E.jwpub.xz

I still do not intend to support jwpub/fossjwpub as it is in any reading app, but making something similar to jwpub that can be easily worked on (It's just SQL) seems like a pretty obvious task.

My plan is to create a script to convert all the publications into fully-featured markdown documents, which I'll later use together with something like hugo (especially hugo-book or https://github.com/weitblick/epub) to put the content into something actually readable by 3rd party software (as I'm really pissed off that there is no useful way of using the content outside of the app/website).

And the 2nd task, which I may or may not finish (depending entirely on you guys: if somebody wants to use the code, chances are that I'll be much more motivated to actually do it), is to create a cross-platform library app (most likely using the fyne.io toolkit) that will use the markdown documents generated earlier to display the content.

Also, a few changes from my side are coming to the main website.. but that's not part of the issue.

MrCyjaneK commented 7 months ago

Felt cute, might delete later

MrCyjaneK commented 7 months ago

I think that I've hit a roadblock.

What does one do when Extract references a publication that isn't available in the Extract table itself?

For example

SELECT `Content` FROM `Extract` WHERE `Link`='p/E:502014236/'

Would result in an extract that is fine but itself contains reference to jwpub://p/E:1102012654/ which cannot be found in the Extract table.
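
To find such dangling references automatically, something like the following Python sketch could work. It assumes the extract has already been decoded to HTML and that links always look like `jwpub://p/<lang>:<id>/`; the function name and the regex are made up for illustration.

```python
import re
import sqlite3

# Matches links such as jwpub://p/E:1102012654/ and captures "p/E:1102012654/",
# the form stored in Extract.Link in the query above. Other link shapes
# (if they exist) are not handled; this pattern is an assumption.
JWPUB_LINK = re.compile(r"jwpub://(p/[^:/\s]+:\d+/)")


def missing_extract_links(conn, html):
    """Return jwpub:// targets found in `html` that have no row in Extract."""
    missing = []
    # dict.fromkeys de-duplicates while keeping first-seen order.
    for link in dict.fromkeys(JWPUB_LINK.findall(html)):
        row = conn.execute(
            "SELECT 1 FROM Extract WHERE Link = ? LIMIT 1", (link,)
        ).fetchone()
        if row is None:
            missing.append(link)
    return missing
```

Running this over every decoded extract would give the list of documents that have to come from somewhere else.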

What do you think should be the behavior of the publication? Should it "grab" the publication from an external source by the MepsDocumentId no matter what, to ensure that once somebody downloads the book it will work, or should it just contain a link to download the other publication?

Also, this shows an entirely different issue: we cannot know for sure where to look for the publication. Afaik there is no endpoint to get a publication by MepsDocumentId, which indicates that we need to be able to search for it in all publications...

Quick google search for "1102012654" resulted in some wol.jw.org results - but the result I got was in Polish (and the MepsDocumentId is in English (not to mention the fact that the article is not the one mentioned in the Extract))

For future reference, link: https://wol.jw.org/pl/wol/d/r12/lp-p/1102012654

Even though I wanted to avoid doing what I'm about to do I think that the only way to go is to combine all JWPUBs into one, and then build an api/tools on top of that.

If what I know is at least somewhat correct, all JWPUBs available to download weigh a little under 0.5TB, ~434.719 GiB to be exact, according to my early calculations.

arthurwweber commented 7 months ago

MepsDocumentId is the unique identifier for an article. If the article is downloadable on its own, you can use GetPubMediaLinks to identify it (https://b.jw-cdn.org/apis/pub-media/GETPUBMEDIALINKS?output=json&docid=1102012654&langwritten=E returns hl, section 12). Otherwise you might have to index the TOC of each publication into a table where you can check on the fly whether you have that certain publication.
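
A small Python sketch for building and calling that URL, using only the standard library. The endpoint and query parameters are the ones from the link above; the function names are made up, and the layout of the returned JSON is not described here, so the sketch just returns the decoded object.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PUB_MEDIA_ENDPOINT = "https://b.jw-cdn.org/apis/pub-media/GETPUBMEDIALINKS"


def pub_media_url(docid, langwritten="E"):
    """Build the GETPUBMEDIALINKS URL for a MepsDocumentId."""
    query = urlencode(
        {"output": "json", "docid": docid, "langwritten": langwritten}
    )
    return f"{PUB_MEDIA_ENDPOINT}?{query}"


def fetch_pub_media(docid, langwritten="E", timeout=30):
    """Fetch and decode the JSON response (needs network access)."""
    with urlopen(pub_media_url(docid, langwritten), timeout=timeout) as resp:
        return json.load(resp)
```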

MrCyjaneK commented 7 months ago

@arthurwweber Thanks for sharing the link, it is showing the correct publication, so I will use it for sure.

MrCyjaneK commented 7 months ago

There is also one more issue, that I'd like to fix.. Is there any way to convert MepsLanguageId to the langcode?

I have this file: https://github.com/MrCyjaneK/jwapi/blob/master/libjw/mepsmap.go that I generate using a script, but something tells me that there must be at least slightly better solution to this.

arthurwweber commented 7 months ago

There is a Languages table in mepsunit.db, that's where you find the respective language mnemonic for each MepsLanguageId.

MrCyjaneK commented 7 months ago

I found it in /data/data/org.jw.jwlibrary.mobile/databases/mepsunit.db. Can't open the app to watch for traffic atm, thanks
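
For reference, reading that mapping could look like the Python sketch below. The table name `Languages` is taken from the comment above; the column names `LanguageId` and `Symbol` are guesses, so check the real schema first and pass the right names.

```python
import sqlite3


def language_symbols(conn, table="Languages",
                     id_column="LanguageId", symbol_column="Symbol"):
    """Return {language id: mnemonic} read from the given table.

    The column names are assumptions; identifiers are interpolated into
    the query, so only pass trusted constants here.
    """
    query = f'SELECT "{id_column}", "{symbol_column}" FROM "{table}"'
    return {row[0]: row[1] for row in conn.execute(query)}
```

With a copy of the file, `sqlite3.connect("file:mepsunit.db?mode=ro", uri=True)` opens it read-only before calling `language_symbols(conn)`.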

MrCyjaneK commented 7 months ago

Everything is working as intended on my end, but I honestly didn't expect this many changes to happen to catalog.db. I was hoping that a little php script would be able to keep everything up-to-date, but it looks like a no-go because it is simply too slow to process all the data (before it finishes updating the MySQL table we already have a new revision available).

I'm rewriting a bit of the code in Go, so it will be a bit more multi-threaded and (I hope) will be able to handle everything faster.

edit: This is a cool learning experience for me, I've never worked with databases that big in terms of schema, and I learn a lot about performance.

MrCyjaneK commented 7 months ago

New discovery:

More discoveries that broke the program :| :

MrCyjaneK commented 7 months ago

After resolving schema issues, and fixing my database I'm doing a last non-final import of the data (314m rows in and no critical issues spotted).

contents.db is just a pain to work with: tables/columns come and go as they like, and a single publication can have a few versions of the database for different languages. Most notably, the CaptionContent column seems to be added randomly to publications:

Example log from export:

```
12:54:49 main.go:155: Worker 5 processing: jy_RMV.jwpub
12:54:49 main.go:155: Worker 9 processing: jy_RR.jwpub
12:54:50 main.go:155: Worker 9 processing: jy_RU.jwpub
12:54:50 main.go:155: Worker 5 processing: jy_SA.jwpub
12:54:50 decrypt.go:301: no such column: CaptionContent
12:54:52 main.go:155: Worker 10 processing: jy_SB.jwpub
12:54:52 main.go:155: Worker 4 processing: jy_SBO.jwpub
12:54:54 main.go:155: Worker 2 processing: jy_SC.jwpub
12:54:54 main.go:155: Worker 12 processing: jy_SE.jwpub
12:54:54 decrypt.go:301: no such column: CaptionContent
12:54:54 decrypt.go:301: no such column: CaptionContent
12:54:55 main.go:155: Worker 11 processing: jy_SEN.jwpub
12:54:55 main.go:155: Worker 8 processing: jy_SG.jwpub
12:54:56 main.go:155: Worker 7 processing: jy_SH.jwpub
```

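
One way to cope with columns that come and go is to check the schema before querying. Here is a minimal Python sketch of that idea (my importer is in Go, this is only to illustrate); the function names are made up and the caller has to supply whichever table is expected to hold `CaptionContent`.

```python
import sqlite3


def has_column(conn, table, column):
    """True if `table` exists and has `column` (case-insensitive)."""
    # Identifiers are interpolated, so only pass trusted constants.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # In PRAGMA table_info output, index 1 is the column name.
    return any(row[1].lower() == column.lower() for row in rows)


def select_optional(conn, table, column):
    """Return the column's values, or an empty list if it is absent."""
    if not has_column(conn, table, column):
        return []
    return [row[0] for row in conn.execute(f'SELECT "{column}" FROM "{table}"')]
```

That turns the "no such column" errors into an empty result the importer can skip over.
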
MrCyjaneK commented 7 months ago

5 days of interrupted work and I've imported most of the JWPUBs that were easy to obtain (curl + jq in a for loop). I have managed to download 414.89 GiB of them, and discovered a few interesting things on my way.

What's next on the roadmap for me?

aaand obviously, I'll make some kind of blogpost too with some fun technical details that I've skipped.

orangethewell commented 7 months ago

I've been using ChatGPT a lot to translate publication styles into CSS. There are two common ways to handle these styles:

After all, handling these publication styles is a big mess. I tried importing JW.ORG styles and switching prefixes, but the result didn't turn out as well as it should.

I've been kinda busy these days; I haven't touched my implementation for weeks or months. But I had an idea like the one you mentioned: implementing a local server for jwpub archives and sharing it over LAN, though not as a main feature, more like a plugin.

GuyMicciche commented 3 months ago

I have a database extracted from the jwpub file. Like for example, nwtsty.db. Some fields have BLOB and I can't see the data using sql. How can I see the data?

MrCyjaneK commented 3 months ago

@GuyMicciche https://github.com/MrCyjaneK/jwapi/issues/1#issuecomment-1714309559 ChatGPT is actually pretty good at converting that message into code (worked for golang)

MrCyjaneK commented 3 months ago

what are you working on btw?

livrasand commented 3 months ago

If you need help knowing how a JWPUB works or the styles you can use, I recommend you take a look at the Reviw wiki

GuyMicciche commented 3 months ago

> what are you working on btw?

I have a python script that I've already made headway with. Give it a link to the db file, and it generates the correct hashes. But right now I'm having trouble getting the data from the "Content" blob. Should I share the script here?