Install LibreOffice and sample tbmp001.zip attached

artydont commented 1 year ago

Everyone should install LibreOffice. It will be used to make things happen automagically. The output of an initial trial run for project tbmp, produced from the input in #11, is in the attached file tbmp001.zip.

Meanwhile that trial run can be used to read the epub and annotate the pdf of the book linked from #11 and #12, with wrong page numbers, while waiting for me to work out how to fix pagination (via pandoc markdown).

To convert a Microsoft .docx file, just open it with LibreOffice Writer and use the File menu to “Save As” ODF (OpenDocument Format, the open standard; .odt for text documents).

Also “Export” to pdf (for annotating in Zotero 6) and to epub (soon for annotating in Zotero 7).
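
For anyone who would rather script the conversion than click through the menus, here is a minimal sketch using LibreOffice's headless mode. It is not what produced tbmp001.zip (that was done by hand with my quick choice of export options). Assumptions: the `soffice` binary is on the PATH (on macOS it usually lives inside the app bundle and is not), the installed version can export epub from the command line, and `input.docx` and `out` are placeholder names.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(src, fmt, outdir, soffice="soffice"):
    """Return the argument list for one headless LibreOffice conversion."""
    return [
        soffice,
        "--headless",            # run without opening the GUI
        "--convert-to", fmt,     # target format, given as a file extension
        "--outdir", str(outdir),
        str(src),
    ]


def convert(src, formats=("odt", "pdf", "epub"), outdir="out"):
    """Convert src to each format. Requires LibreOffice to be installed."""
    soffice = shutil.which("soffice")
    if soffice is None:
        raise FileNotFoundError("soffice not found on PATH; install LibreOffice first")
    Path(outdir).mkdir(parents=True, exist_ok=True)
    for fmt in formats:
        subprocess.run(build_convert_command(src, fmt, outdir, soffice), check=True)


if __name__ == "__main__":
    # Only print the commands here; call convert("input.docx") to run them.
    for fmt in ("odt", "pdf", "epub"):
        print(" ".join(build_convert_command("input.docx", fmt, "out")))
```

The headless export uses default filter options, so the result may differ from what the “Export” dialog produces with hand-picked settings.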

Windows:

still = stable

https://community.chocolatey.org/packages/libreoffice-still 7.5.6

fresh = recent “bleeding edge”

https://community.chocolatey.org/packages/libreoffice-fresh 7.6.2

macOS:

fresh

https://formulae.brew.sh/cask/libreoffice 7.6.2

I don’t know whether this claim that it does not automatically update properly is true.

If so, uninstall and use the normal MacOS download:

https://www.libreoffice.org/get-help/install-howto/macos/

and from the download page options choose either fresh 7.6.2 or stable 7.5.8, for either Intel or Apple Silicon depending on the Apple computer.

@Ted1307 and any others interested: attached is the output from my very quick choice of export options, WITHOUT any attempt to fix the wrong page numbers. It is only for reading and initial annotating (and playing with Sigil). Don't waste time trying to edit it. tbmp001.zip

artydont commented 1 year ago

Actually I did not include the input .docx file as claimed in the readme.txt.

For the record, I made it read-only and renamed it, and its sha256sum hash is:

f5f447c8be481df4ec0b7002bd0dc59f116ff910eab41d9d75da8dea167754f5 tbmp.ro.docx

artydont commented 1 year ago

PS the point of the exercise is that eventually it will become an option available on a Zotero plugin.

Select a word processor file (e.g. a link to one) and choose an option to produce a similar zip file as output for inspection and further processing. Instead of @Ted1307 waiting to be provided with an epub after we get a pdf and @DavidMc1948 produces a docx file, anybody can just use that option to get an initial epub, plus a pdf that includes the odf as an attachment and has the same page breaks as the original pdf.

Likewise selecting a scan file can send it to an OCR pipeline to produce docx.

The provenance record then records actual diffs produced from manual editing and the corresponding before and after hashcodes and timestamps and signatures.
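
To make that concrete, here is a rough sketch of what one entry in such a record might hold. It is only an illustration: the field names, the `provenance_entry` helper and the example editor name are hypothetical, and signing is left as a placeholder rather than implemented.

```python
import difflib
import hashlib
from datetime import datetime, timezone


def sha256_text(text):
    """SHA-256 hex digest of a string, encoded as UTF-8."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def provenance_entry(before, after, editor):
    """Build one record describing a manual edit from 'before' to 'after'."""
    diff = list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="before", tofile="after", lineterm="",
    ))
    return {
        "editor": editor,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "sha256_before": sha256_text(before),
        "sha256_after": sha256_text(after),
        "diff": diff,
        # Placeholder only: a real record would carry a signature here.
        "signature": None,
    }


if __name__ == "__main__":
    entry = provenance_entry("Page 12\nold text\n", "Page 12\nnew text\n", "some-editor")
    print("\n".join(entry["diff"]))
```

This works on text (for example pandoc markdown), not on the binary docx itself, which is one reason to move the editing step to a text format.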

PS since I might have screwed up the file, I have checked that the two copies are identical:

$ sha256sum *x
f5f447c8be481df4ec0b7002bd0dc59f116ff910eab41d9d75da8dea167754f5 tbmp.ro.docx
f5f447c8be481df4ec0b7002bd0dc59f116ff910eab41d9d75da8dea167754f5 txtbkMarxPhilwIndx.docx
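
On Windows, where `sha256sum` may not be available, the same check can be done with a few lines of Python. This is a sketch only; `sha256_of` and `matches` are names made up for this example.

```python
import hashlib
import sys


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches(path, expected_hex):
    """True if the file's digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Usage: python check_hash.py FILE EXPECTED_SHA256
    if len(sys.argv) == 3:
        print("OK" if matches(sys.argv[1], sys.argv[2]) else "MISMATCH")
```

Run it against your copy of the docx with the hash quoted above; anything other than OK means the file differs from the one I used.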

Typical further processing would use a pandoc command configured to fix pagination, extract citations, etc.


ScientificPublishing / SciPub

Install LibreOffice and sample tbmp001.zip attached #13