documentcloud / docsplit

Break Apart Documents into Images, Text, Pages and PDFs
http://documentcloud.github.io/docsplit/

PDFtk dependency issues with CentOS-7/RHEL-7 | Build Fails | Dependencies libgc Unavailable #123

Open riker1 opened 9 years ago

riker1 commented 9 years ago

Building PDFtk on RHEL 7 currently isn't possible because upstream (Fedora) dropped support for libgcj:

[ericstyrer2@ceti-alpha-five ~]$#  yum localinstall pdftk-2.02-1.el6.x86_64.rpm
Error: Package: pdftk-2.02-1.el6.x86_64 (/pdftk-2.02-1.el6.x86_64)
    Requires: libgcj.so.10()(64bit)
Error: Package: pdftk-2.02-1.el6.x86_64 (/pdftk-2.02-1.el6.x86_64)
 **Requires: libgcj**
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
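To see up front which shared libraries a downloaded RPM will ask for, its dependency list can be queried before attempting the install. The following is a minimal sketch, assuming `rpm` is available and the package file is in the current directory; `gcj_requires` is a hypothetical helper name, not part of rpm or yum.

```shell
# gcj_requires RPM_FILE (hypothetical helper): print only the
# libgcj-related requirements declared by a local RPM file.
gcj_requires() {
    # -q query, -p operate on a package file, --requires list dependencies
    rpm -qp --requires "$1" | grep -i 'libgcj'
}
```

For example, `gcj_requires pdftk-2.02-1.el6.x86_64.rpm` should show the same `libgcj.so.10()(64bit)` requirement that yum complains about.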

I emailed the authors of PDFtk and they said they're working on it:

Date: August 15, 2014 at 11:54:50 AM EDT From: Sid Steward sid.steward@pdflabs.com To: Eric Tyrer etyrer@york.cuny.edu Subject: Re: PDF Labs

Yes, I've heard that they're dropping support for libgcj. We have been working on a new pdftk that doesn't depend on libgcj, but it is currently pre-beta.

I wrote that in August of 2014 and now it's nearly 2015.

There hasn't been any development on libgcj since 2009; reimplementing that library would most likely be a heavy lift. I'm guessing that Oracle wouldn't be too friendly either, since they hold all the Java patents.

Also the licensing for PDFtk's other component, iText, has changed from GPL 2 to GPL 3. This might also affect redistribution?

More reading on PDFtk's death (Fedora Discussion List)

Discussion on the CentOS forum about gcc-java and libgcj-devel being missing (both needed to compile pdftk)

There’s someone who mentions possibly an alternative PDF toolkit at the bottom of the thread..

What does the community think?

knowtheory commented 9 years ago

It's probably time to actually put pdftailor out there.

We've been using it under the hood in production for a year now, and while it doesn't replace all of pdftk, it does enough for docsplit to get its job done.

You can skip the pdftk installation process: run gem install pdftailor instead, and docsplit will work fine.

knowtheory commented 9 years ago

Oh another note though, it does use iText under the hood, so if you're worried about iText's AGPL license, pdftailor's not going to help you much there.

riker1 commented 9 years ago

Thanks for the info. I'm not worried about the AGPL license per se. It seemed from the threads that it may be a redistribution issue? I'm not sure how that works. I'm a sysadmin, not a lawyer (lol).

jscrobinson commented 9 years ago

After some searching we've found http://qpdf.sourceforge.net/ which seems to be a good replacement for pdftk (at least for encryption).
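As a rough illustration of qpdf's encryption interface, the sketch below wraps the two calls most relevant here. It assumes qpdf is installed; the function names and file names are hypothetical, and the 256-bit key length is just one of the lengths qpdf accepts.

```shell
# encrypt_pdf IN OUT USER_PW OWNER_PW (hypothetical helper)
encrypt_pdf() {
    # qpdf --encrypt user-password owner-password key-length -- infile outfile
    qpdf --encrypt "$3" "$4" 256 -- "$1" "$2"
}

# decrypt_pdf IN OUT PASSWORD (hypothetical helper)
decrypt_pdf() {
    # --decrypt writes an unencrypted copy; --password supplies the key
    qpdf --password="$3" --decrypt "$1" "$2"
}
```

A call such as `encrypt_pdf in.pdf out.pdf userpw ownerpw` leaves `in.pdf` untouched and writes the protected copy to `out.pdf`.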

jpujari commented 9 years ago

We switched to paid PHP packages, since they have no OS dependencies: http://www.setasign.com/products/

vielhuber commented 9 years ago

I am a pdftk power user, and the dependency on GCJ seems to be a big problem for us. We will soon be updating our servers to Fedora 22 and CentOS 7.

Can somebody please give us some information about the current plans for the future of pdftk? Is it in active development or not? What’s the progress on moving away from GCJ?

If not, are there any alternatives for filling out pdf forms from the command line? Thanks in advance.

jpujari commented 9 years ago

We had to switch to a paid product, but the product is good and reliable

https://www.setasign.com/products/setapdf-formfiller/details/

vielhuber commented 9 years ago

Thanks for your answer.

jpujari commented 9 years ago

I did some research on other open source alternatives but could not find any at the time, which was 8 months ago. Not sure if you will have better luck. As for the dependencies, SetaPDF lists its system requirements on the following page:

https://www.setasign.com/support/faq/setapdf/system-requirements/#p-88

Thanks and Regards, Jeetendra Pujari

On Mon, Nov 2, 2015 at 3:29 AM, David Vielhuber wrote:

> Thanks for your answer.
>
> Does SetaPDF have any critical dependencies? Which libraries do they use?
>
> I cannot imagine that there is no other open source tool like pdftk to fill out pdf forms from the command line. But it seems that this is the case. Have I overlooked something?

knowtheory commented 9 years ago

just a heads up, we're slowly replacing pdftk's feature set w/ PDFium which we've wrapped up into PDFShaver.

At the moment tho we're just using PDFium + FreeImage to generate snapshots of pages.

jpmckinney commented 8 years ago

@knowtheory Would you recommend using PDFShaver over GraphicsMagick for generating the images that Tesseract performs OCR on?

robert-scheck commented 8 years ago

Given this issue is still open, I would like to point out that there is now a Yum repository at https://copr.fedoraproject.org/coprs/robert/pdftk/ serving a pdftk RPM package for RHEL/CentOS 7, because I needed PDFtk myself. However, in the long term, a switch to an alternative (as already mentioned above) might be wiser than depending on retired software projects.

fulldecent commented 8 years ago

Hello all. pdftk / CentOS 7 compatibility is a big problem for me. Also the copr solution is not supported by Rackspace, my sysadmin. pdftk is clearly the best solution but it is not actively maintained and it has legacy which has gone stale. Of course the solution is simple -- fork it!

My company will contribute a bounty of $1,000 to "fix" this issue, which will of course require a LOT of effort and rewriting. We may increase that further, and I invite others to add to that bounty if you can. I will use Bountysource. I will solicit to others that use pdftk (see https://github.com/search?utf8=%E2%9C%93&q=pdftk). I might even get a GitHub ban / warning for this. Oh well, I break rules sometimes.

Before we can offer a bounty, I need to be sure somebody won't collect the bounty and mess everything up. Would somebody here be willing to help with adding a couple VERY simple test cases to the fork and Travis CI integration?

The fork is at https://github.com/fulldecent/pdftk and I have added this information to the README. I would appreciate your thoughts to help make this a success!

coltcox commented 8 years ago

I was able to get pdftk working on CentOS 7 by using these two repos:

https://copr.fedorainfracloud.org/coprs/robert/gcj/
https://copr.fedorainfracloud.org/coprs/robert/pdftk/

These commands will get you fully up and running:

wget https://copr.fedorainfracloud.org/coprs/robert/gcj/repo/epel-7/robert-gcj-epel-7.repo -P /etc/yum.repos.d

wget https://copr.fedorainfracloud.org/coprs/robert/pdftk/repo/epel-7/robert-pdftk-epel-7.repo -P /etc/yum.repos.d

yum install pdftk

treeandbrick commented 8 years ago

For extracting/splitting pages, ghostscript works great.

Also good: poppler. It provides pdfseparate and pdfunite.
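For anyone who only needs splitting and merging, the tools named above can be driven roughly as follows. This is a sketch under the assumption that poppler's utilities and Ghostscript are installed; the three wrapper names are hypothetical, only `pdfseparate`, `pdfunite` and `gs` are real commands.

```shell
# split_pages IN PATTERN: write one file per page. PATTERN must contain
# %d, which pdfseparate replaces with the page number (e.g. page-%d.pdf).
split_pages() {
    pdfseparate "$1" "$2"
}

# merge_pdfs OUT IN1 IN2 ...: pdfunite expects the output file last.
merge_pdfs() {
    out=$1
    shift
    pdfunite "$@" "$out"
}

# extract_range IN OUT FIRST LAST: copy a page range with Ghostscript.
extract_range() {
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
       -dFirstPage="$3" -dLastPage="$4" \
       -sOutputFile="$2" "$1"
}
```

For instance, `extract_range report.pdf excerpt.pdf 2 5` would write pages 2 through 5 of a (hypothetical) `report.pdf` to `excerpt.pdf`.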

RoyHelgeRasmussen commented 8 years ago

@coltcox's script works like a charm. All hail Robert for providing this solution!

bhushangahire commented 8 years ago

I have installed PDFtk using Robert's repo. It installed correctly, but I am using it for form filling, which doesn't work.

Grigsby2 commented 8 years ago

cpdf (Coherent PDF Command Line Tools) does everything that pdftk can do, and a lot more, except for filling PDF form fields. It's freely available (under a not-for-commercial-use license) from GitHub, and its homepage is at http://community.coherentpdf.com. Because of the issues discussed in this thread, I switched over to it from pdftk around six months ago and have been a very happy user. Check out the user manual at that link for the full list of features.

vielhuber commented 8 years ago

I think filling out forms is the killer feature why we all use pdftk.
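For readers unfamiliar with the form-filling workflow being discussed, here is a minimal sketch of it. pdftk's `fill_form` operation reads field values from an FDF file; the helper below (a hypothetical name) generates a bare-bones FDF from name/value pairs. It assumes the values contain no parentheses or backslashes, which would need escaping in real use.

```shell
# make_fdf NAME VALUE [NAME VALUE ...] (hypothetical helper):
# print a minimal FDF document with one entry per form field.
make_fdf() {
    printf '%%FDF-1.2\n1 0 obj\n<< /FDF << /Fields [\n'
    while [ "$#" -ge 2 ]; do
        # /T is the field name, /V the value to fill in
        printf '<< /T (%s) /V (%s) >>\n' "$1" "$2"
        shift 2
    done
    printf '] >> >>\nendobj\ntrailer\n<< /Root 1 0 R >>\n%%%%EOF\n'
}
```

Typical use would be `make_fdf name 'Jane Doe' > data.fdf` followed by `pdftk form.pdf fill_form data.fdf output filled.pdf flatten`; the field names of a given form can be listed with `pdftk form.pdf dump_data_fields`. The file names here are placeholders.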

Grigsby2 commented 8 years ago

Some comments above mention things like splitting, merging, and encryption, so if those are what someone is looking for, and comes across this thread, I thought a mention of cpdf could help them. True enough, it doesn't fill forms, which others need.

chebee7i commented 8 years ago

cpdf looks nice, but it's closed :(

scarlet0 commented 8 years ago

Unfortunately these COPR repo links give Error 500: Internal Server Error. Does anybody have these two repos? It's really a pain to get pdftk working on CentOS 7.

robert-scheck commented 8 years ago

> Unfortunately these links give Error 500: Internal Server Error. Does anybody have these two repos? It's really a pain to get pdftk working on CentOS 7.

It seems this was only a temporary issue: https://fedorahosted.org/fedora-infrastructure/ticket/5376

bridgeport commented 8 years ago

Just throwing this out there (for the sake of future-proofing your setups). If you don't absolutely have to stick with CentOS, you can switch to another Linux server operating system, such as Ubuntu, which still supports PDFTK and its dependencies.

For instance, Ubuntu 16.04 was released Apr 21, 2016 and the current PDFTK works fine on it. Here's how to install it: http://installion.co.uk/ubuntu/xenial/universe/p/pdftk/install/index.html

If you're on a cPanel server and must stick with CentOS v6, just to have cPanel, this may be out of the question. But if you're able and willing to migrate, you can set up a VPS with a provider such as DigitalOcean, Vultr, or Linode, and use a control panel such as ServerPilot or Laravel Forge to help you manage your server.

robert-scheck commented 8 years ago

From my point of view, recommending (or trying to push) a random Linux distribution that still ships pdftk is a very bad idea. pdftk relies on GCJ, which has been in deep maintenance mode since 2013; see also: https://gcc.gnu.org/ml/gcc/2013-11/msg00153.html

riker1 commented 8 years ago

Community,

The real issue is that gcc-java, libgcj, and libgcj-devel are essentially EOL’d, dead, buried, over, done, baked, put out to pasture.

Not to mention with the mess Oracle has made of Java (EE mainly) I doubt that the libgcj will ever come out of hibernation mode.

Unless the folks who have developed PDFtk rewrite/rethink their tool to work on modern EL/OS without resorting to outdated libraries, unsupported distros, and MIA repos of said libraries… I’ll be using a different tool. PDFShaver and pdftailor mostly do what I need for DocumentCloud.

I’m not an Ubuntu user per se. IIRC one of the main reasons for removing GCJ support was that vulnerabilities weren’t being patched. Perusing Launchpad, none of the versions of the libraries are tracking anything upstream. Less and less software uses these libraries (Tomcat, for example, hasn’t used them since version 7).

I think it's just time to move along.

Cheers!

Eric


jamieburchell commented 8 years ago

> I was able to get pdftk working on CentOS 7 by using these two repos.

Does anybody know what the implications of installing this repo are in terms of the dead dependencies that it presumably brings with it? Is it easy enough to uninstall it and its deps?

The latest version of pdftk has an issue where it won't accept data from stdin when merging forms, so I'm happy to see that "Robert" has included a 1.45 build!

If Robert's repo should disappear, is there a way I can store it locally?

robert-scheck commented 8 years ago

I do not have any plans to let my PDFtk-related repositories die. In case Fedora infrastructure ends the COPR service, this repository will definitely come up somewhere else (unless there are legal reasons against it).

All packages in the repository are built to avoid overlap or conflict with any other package and to avoid breaking any other dependency. In theory, no other package should depend on the packages provided in my repositories, so these few packages can be easily uninstalled. No guarantee for anything though ;-)

In case you see any need to mirror my repositories, you could mirror the relevant subdirectories of

locally. Finally, you need to create your own *.repo files for yum or dnf.
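As a rough sketch of the last step, a `.repo` file for a local mirror could be generated like this. Everything specific here is an assumption: the helper name, the repo id and the `file://` path are hypothetical, and `gpgcheck=0` is chosen only to keep the example short. The mirrored directory also needs repository metadata (for example, generated with `createrepo`).

```shell
# write_repo_file DIR ID BASEURL (hypothetical helper):
# create DIR/ID.repo pointing yum or dnf at a mirrored repository.
write_repo_file() {
    cat > "$1/$2.repo" <<EOF
[$2]
name=Local mirror of $2
baseurl=$3
enabled=1
gpgcheck=0
EOF
}
```

A call might look like `write_repo_file /etc/yum.repos.d local-pdftk file:///srv/mirror/pdftk/`, where the mirror path is a placeholder for wherever the copied subdirectories live.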

I am not sure whether it is clever to hijack this docsplit issue, so if somebody would like to follow up PDFtk on RHEL or CentOS, please send me a message or e-mail directly.

marcofalzone commented 8 years ago

@robert-scheck Robert, I have the same issue as you: a CentOS 6 server running fine with some self-made scripts calling PDFtk. Now I'm building a new CentOS 7 server for a similar purpose and I'm stuck. Could you please help me? I have no idea how to mirror your directories and create a repo file (I have always installed via yum). I'm pretty new to Linux. Thank you.

robert-scheck commented 8 years ago

Folks, if you are looking for help related to my repository, please send me an e-mail rather than adding yet another comment to this issue. Please! While I still do not see any need to mirror my repository (and if you don't know how to mirror a repository yourself, you likely shouldn't mirror it, but simply use it), read e.g. http://yum.baseurl.org/wiki/RepoCreate for the basics.

ewheelerinc commented 8 years ago

@riker1, We just ran into the same issue. You can certainly use the libgcj repository along with the package @robert-scheck provides. It turns out that libgcj.so.10 from el6 is compatible with el7's shared library bindings for PDFtk. We built some RPMs that include the library from CentOS6, so if you would like a 1-line install then see here depending on your architecture: https://www.globallinuxsecurity.pro/pdftk-works-on-centos-7/

@jamieburchell, Since we packaged the official el6 library, it should be compatible for some time to come.

@bhushangahire, would you please test and see if this one works with form filling?

@vielhuber, You might try to copy libgcj.so.10* into Fedora 22 and see if it works. I'm not sure if our package is Fedora 22 compatible or not, but it will certainly work with Fedora 19.
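Whether a given pdftk binary can actually resolve libgcj on a particular system can be checked with `ldd` before relying on it. A minimal sketch, assuming a dynamically linked pdftk binary; `unresolved_libs` is a hypothetical helper name.

```shell
# unresolved_libs BINARY (hypothetical helper): print the shared
# libraries that the dynamic linker cannot find for BINARY.
# Exits non-zero when nothing is missing (grep finds no match).
unresolved_libs() {
    ldd "$1" | grep 'not found'
}
```

Running `unresolved_libs "$(command -v pdftk)"` would print a line such as `libgcj.so.10 => not found` if the library is still missing, and nothing if every dependency resolves.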

Eric Wheeler

jsosic commented 7 years ago

@ewheelerinc can you provide spec file?

bravadomizzou commented 7 years ago

@ewheelerinc Both RPM files on that page return 404. Also, I don't feel comfortable installing server software from a source other than its creator/distributor.

ewheelerinc commented 7 years ago

@jsosic, @bravadomizzou, here is the spec. Also, we fixed the 404.

Really all we are doing is repacking libgcj.so.10* which we pulled out of CentOS 6 libgcj-4.4.7-17.el6. PDFtk was downloaded as an RPM from their site unmodified except that we converted it to a tar and added libgcj. You may need to edit the spec to make it build on your system, but it works in our build environment: https://www.linuxglobal.com/static/blog/pdftk.spec

jsosic commented 7 years ago

@ewheelerinc thank you very much! :1st_place_medal:

TOPSTech commented 7 years ago

@coltcox Thank you so much your solution worked for me and saved my day.

gauravscorpsgit commented 5 years ago

> I was able to get pdftk working on CentOS 7 by using these two repos.
>
> These commands will get you fully up and running.
>
> wget https://copr.fedorainfracloud.org/coprs/robert/gcj/repo/epel-7/robert-gcj-epel-7.repo -P /etc/yum.repos.d
>
> wget https://copr.fedorainfracloud.org/coprs/robert/pdftk/repo/epel-7/robert-pdftk-epel-7.repo -P /etc/yum.repos.d
>
> yum install pdftk

Unfortunately I am getting the issue below:

Failed to set locale, defaulting to C
Loaded plugins: priorities, update-motd, upgrade-helper
Resolving Dependencies
--> Running transaction check
---> Package pdftk.x86_64 0:2.02-1.el7 will be installed
--> Processing Dependency: libgcj.so.14()(64bit) for package: pdftk-2.02-1.el7.x86_64
Package libgcj-4.8.5-4.el7.x86_64 is obsoleted by libgcc72-7.2.1-2.59.amzn1.x86_64 which is already installed
--> Finished Dependency Resolution
Error: Package: pdftk-2.02-1.el7.x86_64 (copr:copr.fedorainfracloud.org:robert:pdftk)
    Requires: libgcj.so.14()(64bit)
    Available: libgcj-4.8.5-4.el7.x86_64 (copr:copr.fedorainfracloud.org:robert:gcj)
        libgcj.so.14()(64bit)
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

gauravscorpsgit commented 5 years ago

@ewheelerinc thank you so much sir! :D

robert-scheck commented 3 years ago

As this issue is still open 5 years after my initial post, I would like to add that my COPR pdftk repository for RHEL/CentOS 7 was deprecated in October 2021 in favor of the new pdftk-java port (which is GCJ-free). It is available for RHEL/CentOS 7/8 via EPEL: run yum install epel-release (if not already done) and then yum install pdftk-java. Existing users of my COPR pdftk repository will be auto-migrated to pdftk-java during the next run of yum update.

jamieburchell commented 3 years ago

As I have your COPR repo and pdftk running in production (CentOS 7) are there any incompatibilities or missing features I should be aware of, or is this a like-for-like replacement?

jamieburchell commented 3 years ago

I was recently looking for pdftk support on CentOS 8, noted that the COPR repo didn't have a CentOS 8 version (I did reach out to ask if that would be a thing) and in the absence of a solution stumbled upon this post which might help someone if the pdftk-java version is no use

robert-scheck commented 3 years ago

> As I have your COPR repo and pdftk running in production (CentOS 7) are there any incompatibilities or missing features I should be aware of, or is this a like-for-like replacement?

As pdftk-java is a port of old GCJ-based pdftk to Java, it's intended to be a drop-in ("like-for-like") replacement. However some old pdftk bugs (from the GCJ variant) have been fixed already, and pdftk-java upstream is trying to get rid of further old issues, which should make it superior.

Given that the ancient original GCJ-based pdftk gets more painful with each new distribution release (because it requires unmaintained software and additionally, the GCJ-based pdftk development can be considered dead as well), the pdftk-java package in Fedora 33+ and EPEL 7+ will silently replace existing installed pdftk RPM packages.

jamieburchell commented 3 years ago

Just checked in on the aforementioned production server and see it has been replaced, and thank goodness everything is still working as expected. Thank you for providing your repo all these years.

KJ7LNW commented 1 year ago

FYI, the globallinuxsecurity.pro link above is old, here is the authoritative link: https://www.linuxglobal.com/pdftk-works-on-centos-7/

robert-scheck commented 1 year ago

Currently, the best way for a well maintained pdftk on CentOS/RHEL/Rocky Linux 7, 8 and 9 still is:

  1. yum install -y epel-release
  2. yum install -y pdftk

There is absolutely no need for strange third-party howtos suggesting unverifiable/untrusted packages.
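The two steps above can be wrapped into one function that also confirms the result. This is only a sketch: the function name is hypothetical, it assumes it is run as root on an EL system with network access, and it stops at the first failing step.

```shell
# install_pdftk_from_epel (hypothetical helper): enable EPEL,
# install pdftk from it, then print the installed version.
install_pdftk_from_epel() {
    yum install -y epel-release &&
    yum install -y pdftk &&
    pdftk --version
}
```

If the final `pdftk --version` prints version information, the package is installed and on the PATH.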

KJ7LNW commented 1 year ago

@robert-scheck, is pdftk-java a 100% compatible version with the old pdftk/GCJ version?

(I understand where you are coming from about repo trust, I'm the same way. But FYI, linuxglobal.com is where I work, we released the package, and they use it in production. It's just a rebuild of the one from el6 with the appropriate el6 deps. At the time I wrote the article, pdftk-java wasn't available for el7.)

robert-scheck commented 1 year ago

> @robert-scheck, is pdftk-java a 100% compatible version with the old pdftk/GCJ version?

In all setups in which I was involved, pdftk-java has been usable as a drop-in replacement so far. Aside from this, many CentOS setups out there have EPEL enabled, so they have already been auto-migrated from pdftk to pdftk-java over the last nearly two years, and I have received zero bug reports. Citing from pdftk-java upstream:

> The current goals are to keep functionality as compatible with the original as it is reasonable, to fix any issues present in the original (correctness takes precedence over compatibility, see the differences), and to clean up the code. New functionality may be added, but it is not a priority.

While this still leaves some risks for bugs, pdftk-java is well maintained by an active upstream – and also covers modern non-x86 cloud systems (such as ARM64), so I personally always would give pdftk-java a try.
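One way to gain some confidence before switching is to compare what both builds report for the same document. A minimal sketch, assuming the GCJ-based and the Java-based binaries are both still reachable under separate paths (the paths and the helper name are hypothetical); it compares only the `dump_data_fields` output, not full behavioral equivalence.

```shell
# same_field_dump OLD_BIN NEW_BIN PDF (hypothetical helper):
# succeed if both pdftk builds report identical form-field data.
same_field_dump() {
    old_out=$("$1" "$3" dump_data_fields) || return 2
    new_out=$("$2" "$3" dump_data_fields) || return 2
    [ "$old_out" = "$new_out" ]
}
```

For example, `same_field_dump /opt/old/pdftk /usr/bin/pdftk form.pdf && echo identical` prints `identical` only when the two dumps match exactly.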