jlcd closed this issue 5 years ago
Thanks for the report.
To be clear, this is just for the @B option which includes the name of a bookmark in a file. The bookmarks included in each split file are not affected.
It is a crude way to avoid writing filenames which contain special characters, which can be illegal or hard to wrangle on some systems.
So yes, we can fix this. It would work by adding the -utf8 flag, so that the change remains backward-compatible. We would still remove some special characters, such as newlines, but be UTF8-aware.
Man, that was a quick response.
Now I realize why you did this. Some characters may not be filename-friendly, so I agree 100% with you.
I just found out about the -utf8 flag (when I read about listing bookmarks). From my [limited] knowledge of cpdf, I guess this would indeed be the flag to use for keeping bookmark filenames [mostly] unchanged. It would mean something like "force UTF-8 filenames, as I'm aware of the consequences".
Of course I don't expect this to be done lightning fast, but do you have an ETA for when this will reach a stable version? Asking just to know if I should work on some workarounds or wait.
And of course, many thanks for this excellent tool.
Not sure if this is the proper way to generate UTF-8 filenames (when the -utf8 flag is set), but I gave it a try: https://github.com/jlcd/cpdf-source/commit/a5e9f4dcc5e56cc1afd7fa493ed7680dad755f22#diff-a1ea83527a319a64f1e227a3add40e68
I have never developed anything in OCaml, so I'm pretty sure some bits are out of place there.
It seems to work for my scenario.
Edit:
Why does my binary take roughly 3.5 times longer to run than the binary from this repository?
I mean, even if I download the source and run make on the untouched files, my cpdf ... -split-bookmarks takes ~20s, while this repository's binary takes ~6s.
This repo's binary:
root@21b14f4a4c40:/tmp# time ./cpdf2 -split-bookmarks 0 ./x.pdf -utf8 -o "./my/%%%%% @B.pdf"
real 0m5.904s
user 0m4.060s
sys 0m0.390s
My version from source (untouched):
root@21b14f4a4c40:/tmp# time ./cpdf -split-bookmarks 0 ./x.pdf -utf8 -o "./my/%%%%% @B.pdf"
real 0m20.454s
user 0m17.910s
sys 0m0.680s
Thanks! I'll take a detailed look soon.
(Speed: you somehow built the bytecode version not the native code version?)
Not sure, I just ran make to compile it.
Should I have done it some other way?
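For anyone who wants to test the bytecode-versus-native hypothesis directly, here is a rough sketch. It assumes that OCaml bytecode executables end with a 12-byte magic string beginning with Caml1999X, which native executables built with ocamlopt lack. The function name is_ocaml_bytecode is made up for this example and is not part of cpdf or OCaml.

```shell
#!/bin/sh
# Hypothetical helper (not part of cpdf or the OCaml tools).
# Assumption: an OCaml bytecode executable ends with a 12-byte
# magic string such as "Caml1999X0nn"; a native binary does not.
is_ocaml_bytecode() {
    # Read the last 12 bytes and keep only ASCII letters and digits,
    # so NUL and other binary bytes never reach the shell variable.
    trailer=$(tail -c 12 "$1" | LC_ALL=C tr -cd 'A-Za-z0-9')
    case "$trailer" in
        Caml1999X*) return 0 ;;  # looks like bytecode
        *)          return 1 ;;  # native, or not OCaml bytecode at all
    esac
}
```

Usage would be something like: is_ocaml_bytecode ./cpdf && echo bytecode || echo "not bytecode". A "not bytecode" answer only rules out this one explanation for a slow binary.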
Edit3:
OK, finally got it.
The issue was that I was checking out v2.2.1 and not v2.2-patchlevel1.
When I got camlpdf and cpdf both from v2.2-patchlevel1, I got the same quick result I was getting from the binaries in this repository.
Steps to success:
root@21b14f4a4c40:/tmp/cpdf-source# opam remove cpdf
[...]
root@21b14f4a4c40:/tmp/cpdf-source# opam remove camlpdf
[...]
root@21b14f4a4c40:/tmp# git clone https://github.com/johnwhitington/camlpdf.git
[...]
root@21b14f4a4c40:/tmp# cd camlpdf/
root@21b14f4a4c40:/tmp/camlpdf# git checkout v2.2-patchlevel1
[...]
root@21b14f4a4c40:/tmp/camlpdf# make
[...]
root@21b14f4a4c40:/tmp/camlpdf# make install
[...]
root@21b14f4a4c40:/tmp# git clone https://github.com/johnwhitington/cpdf-source.git
[...]
root@21b14f4a4c40:/tmp# cd cpdf-source/
root@21b14f4a4c40:/tmp/cpdf-source# git checkout v2.2-patchlevel1
[...]
root@21b14f4a4c40:/tmp/cpdf-source# make
[...]
root@21b14f4a4c40:/tmp/cpdf-source# time ./cpdf -split-bookmarks 0 ../x.pdf -utf8 -o ../my/$RANDOM%%%%%@B.pdf
real 0m6.299s
user 0m4.160s
sys 0m0.610s
Below are some of the earlier steps I tried. Leaving them here in case they help someone who came from Google.
Edit1: Pretty sure it's native:
make[1]: Entering directory '/tmp/cpdf-source'
ocamlfind ocamldep -native cpdfcommand.mli > ._ncdi/cpdfcommand.di
ocamlfind ocamldep -native cpdf.mli > ._ncdi/cpdf.di
ocamlfind ocamldep -native cpdfstrftime.mli > ._ncdi/cpdfstrftime.di
ocamlfind ocamldep -native xmlm.mli > ._ncdi/xmlm.di
ocamlfind ocamldep cpdfcommandrun.ml > ._d/cpdfcommandrun.d
ocamlfind ocamldep cpdfcommand.ml > ._d/cpdfcommand.d
ocamlfind ocamldep cpdf.ml > ._d/cpdf.d
ocamlfind ocamldep cpdfstrftime.ml > ._d/cpdfstrftime.d
ocamlfind ocamldep xmlm.ml > ._d/xmlm.d
ocamlfind ocamlc -package camlpdf -c -annot xmlm.mli
ocamlfind ocamlopt -package camlpdf -c -annot -g -w -3 -annot xmlm.ml
ocamlfind ocamlc -package camlpdf -c -annot cpdfstrftime.mli
ocamlfind ocamlopt -package camlpdf -c -annot -g -w -3 -annot cpdfstrftime.ml
ocamlfind ocamlc -package camlpdf -c -annot cpdf.mli
ocamlfind ocamlopt -package camlpdf -c -annot -g -w -3 -annot cpdf.ml
ocamlfind ocamlc -package camlpdf -c -annot cpdfcommand.mli
ocamlfind ocamlopt -package camlpdf -c -annot -g -w -3 -annot cpdfcommand.ml
ocamlfind ocamlopt -package camlpdf -c -annot -g -w -3 -annot cpdfcommandrun.ml
ocamlfind ocamlopt \
-package camlpdf -linkpkg \
-g -o cpdf \
xmlm.cmx cpdfstrftime.cmx cpdf.cmx cpdfcommand.cmx cpdfcommandrun.cmx
make[1]: Leaving directory '/tmp/cpdf-source'
make[1]: Entering directory '/tmp/cpdf-source'
ocamlfind ocamlopt -a -g -o cpdf.cmxa xmlm.cmx cpdfstrftime.cmx cpdf.cmx cpdfcommand.cmx cpdfcommandrun.cmx
make[1]: Leaving directory '/tmp/cpdf-source'
make[1]: Entering directory '/tmp/cpdf-source'
ocamlfind ocamldep cpdfcommand.mli > ._bcdi/cpdfcommand.di
ocamlfind ocamldep cpdf.mli > ._bcdi/cpdf.di
ocamlfind ocamldep cpdfstrftime.mli > ._bcdi/cpdfstrftime.di
ocamlfind ocamldep xmlm.mli > ._bcdi/xmlm.di
ocamlfind ocamlc -package camlpdf -c -annot -g -w -3 -annot xmlm.ml
ocamlfind ocamlc -package camlpdf -c -annot -g -w -3 -annot cpdfstrftime.ml
ocamlfind ocamlc -package camlpdf -c -annot -g -w -3 -annot cpdf.ml
ocamlfind ocamlc -package camlpdf -c -annot -g -w -3 -annot cpdfcommand.ml
ocamlfind ocamlc -package camlpdf -c -annot -g -w -3 -annot cpdfcommandrun.ml
ocamlfind ocamlmktop \
-package camlpdf -linkpkg \
-g -o cpdf.top \
xmlm.cmo cpdfstrftime.cmo cpdf.cmo cpdfcommand.cmo cpdfcommandrun.cmo
make[1]: Leaving directory '/tmp/cpdf-source'
mkdir -p doc/cpdf/html
rm -rf doc/cpdf/html/*
ocamlfind ocamldoc -package camlpdf -html -d doc/cpdf/html xmlm.mli cpdfstrftime.mli cpdf.mli cpdfcommand.mli
Edit2:
Just tested and got the same slow result when I installed cpdf with opam install cpdf:
root@21b14f4a4c40:/tmp/cpdf-source# opam info cpdf
package: cpdf
version: 2.2.1
repository: default
upstream-url: https://github.com/johnwhitington/cpdf-source/archive/v2.2.1.zip
upstream-kind: http
upstream-checksum: 5c0caa7bed9452cf7d1ed0492929824d
homepage: http://github.com/johnwhitington/cpdf-source
bug-reports: http://github.com/johnwhitington/cpdf-source/issues
dev-repo: git://github.com/johnwhitington/cpdf-source
author: John Whitington
depends: ocamlfind & camlpdf >= 2.2.1
installed-version: 2.2.1 [system]
available-versions: 1.7, 2.1.1, 2.2.1
description: High-level pdf tools based on CamlPDF
root@21b14f4a4c40:/tmp/cpdf-source# opam remove cpdf
The following actions will be performed:
- remove cpdf.2.2.1
=== 1 to remove ===
=-=- Removing Packages =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Removing cpdf.2.2.1.
ocamlfind remove cpdf
root@21b14f4a4c40:/tmp/cpdf-source# opam info cpdf
package: cpdf
version: 2.2.1
repository: default
upstream-url: https://github.com/johnwhitington/cpdf-source/archive/v2.2.1.zip
upstream-kind: http
upstream-checksum: 5c0caa7bed9452cf7d1ed0492929824d
homepage: http://github.com/johnwhitington/cpdf-source
bug-reports: http://github.com/johnwhitington/cpdf-source/issues
dev-repo: git://github.com/johnwhitington/cpdf-source
author: John Whitington
depends: ocamlfind & camlpdf >= 2.2.1
installed-version:
available-versions: 1.7, 2.1.1, 2.2.1
description: High-level pdf tools based on CamlPDF
root@21b14f4a4c40:/tmp/cpdf-source# opam install cpdf
The following actions will be performed:
- install cpdf.2.2.1
=== 1 to install ===
=-=- Synchronizing package archives -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=-=- Installing packages =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Building cpdf.2.2.1:
make
make install
Installing cpdf.2.2.1.
root@21b14f4a4c40:/tmp/cpdf-source# time cpdf -split-bookmarks 0 ../x.pdf -utf8 -o ../my/$RANDOM%%%%%@B.pdf
real 0m17.665s
user 0m15.810s
sys 0m0.350s
root@21b14f4a4c40:/tmp/cpdf-source#
Can you give me the output of file cpdf in the slow OPAM case?
Ok, just confirming, it really is way slower:
root@21b14f4a4c40:/tmp/cpdf-source# time ./cpdf -split-bookmarks 0 ../x.pdf -utf8 -o ../my/$RANDOM%%%%%@B.pdf
real 0m21.702s
user 0m17.220s
sys 0m1.890s
root@21b14f4a4c40:/tmp/cpdf-source# time ../cpdf -split-bookmarks 0 ../x.pdf -utf8 -o ../my/$RANDOM%%%%%@B.pdf
real 0m6.590s
user 0m4.600s
sys 0m0.310s
And the output of the file <slower_cpdf> command you asked for:
root@21b14f4a4c40:/tmp/cpdf-source# file ./cpdf
./cpdf: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=490fc74a859c8c6931b0ab1aa0d207abaa092e2e, not stripped
And, in case it helps, the output of file <faster_cpdf>:
root@21b14f4a4c40:/tmp/cpdf-source# file ../cpdf
../cpdf: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=cec462743cfda5b86c8bc7b2e2f9f9ffacc88b89, not stripped
Fixed in forthcoming v2.3. Bookmark names for @B are stripped as before, unless -utf8 is supplied, in which case only problematic characters and characters < 32 are stripped. If -raw is supplied, the text is not processed at all (not recommended).
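To make the three modes concrete, here is a minimal shell sketch of the idea. It is not cpdf's implementation. The function name sanitize_title, the ASCII whitelist used for the default mode, and the list of "problematic" characters are all assumptions made for illustration. It relies on the fact that every byte of a multibyte UTF-8 sequence is >= 0x80, so deleting selected ASCII bytes cannot split a character.

```shell
#!/bin/sh
# Hypothetical helper, NOT cpdf's code: clean a bookmark title for use
# in a filename. Mode is "ascii" (default), "utf8", or "raw".
sanitize_title() {
    mode=$1
    title=$2
    case "$mode" in
        raw)
            # No processing at all.
            printf '%s' "$title"
            ;;
        utf8)
            # Delete bytes < 32 plus a few ASCII characters that are
            # assumed here to be awkward in filenames: / \ : * ? " < > |
            # Bytes >= 0x80 are untouched, so UTF-8 sequences survive.
            printf '%s' "$title" | LC_ALL=C tr -d '\000-\037/\\:*?"<>|'
            ;;
        *)
            # Assumed default: keep only a conservative ASCII whitelist.
            printf '%s' "$title" | LC_ALL=C tr -cd 'A-Za-z0-9 ._-'
            ;;
    esac
}
```

With these assumptions, sanitize_title utf8 'Capítulo 1: Introdução/Resumo' keeps the accented letters and drops only the colon and the slash, while the default mode would also drop the accented letters.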
I've read that the -split-bookmarks operation removes some characters, as per:
Not sure why it was made this way, but are UTF-8 bookmark titles expected to be implemented in the near future? If not, may I leave this open as a feature request?