UCL-RITS / rcps-buildscripts

Scripts to automate package builds on RC Platforms
MIT License

Install Request: poppler as a module #488

Open themkots opened 2 years ago

themkots commented 2 years ago

Application: poppler compiled with gcc 9.2.0

Link: https://poppler.freedesktop.org/

Cluster: Myriad (initially for econ-myriad users)

Description:

License:

Special versions or variants:

Ticket number: IN:05333938

themkots commented 2 years ago

As things stand, running install.packages("tesseract") from the econ-myriad RStudio Server requires poppler. Although the devel RPMs for poppler are installed and available, there are linker errors when R tries to build against poppler. Making a module for poppler built with gcc 9.2.0, as Brian suggested, seems the right way to let econ-myriad users build tesseract locally with the above command.

larsnesheim commented 2 years ago

I am still unable to use this from the RStudio Server. I am unable to use the package pdftools because installation fails. I believe that there is still a problem with poppler. Is there an update on this?

larsnesheim commented 1 year ago

Is there any update on this? It seems nothing has happened since June 10??

balston commented 1 year ago

I've taken over looking into this from my colleague who has left the team.

balston commented 1 year ago

Latest version of Poppler is 22.10.0 so will start with this version. We will need both:

balston commented 1 year ago

I have a build script ready for testing.

balston commented 1 year ago

It didn't work the first time because it was picking up an older CMake version. Fixed, and it is now building using:

module -f unload compilers mpi gcc-libs
module load beta-modules
./Poppler-22.10.0_gnu-9.2.0_install 2>&1 | tee ~/Software/Poppler/Poppler-22.10.0_gnu-9.2.0_install.log-31102022-1

balston commented 1 year ago

The build has failed with:

[ 25%] Building CXX object CMakeFiles/poppler.dir/poppler/PSTokenizer.cc.o
[ 25%] Building CXX object CMakeFiles/poppler.dir/poppler/SignatureInfo.cc.o
In file included from /dev/shm/tmp.DUVrfdxtul/poppler-22.10.0/poppler/SignatureInfo.cc:28:
/usr/include/nss3/hasht.h:48:29: error: ‘PRBool’ has not been declared
   48 |     void (*destroy)(void *, PRBool);
      |                             ^~~~~~
make[2]: *** [CMakeFiles/poppler.dir/poppler/SignatureInfo.cc.o] Error 1
make[1]: *** [CMakeFiles/poppler.dir/all] Error 2
make: *** [all] Error 2

Investigating ...

balston commented 1 year ago

Copies of the CMAKE build logs are in:

~ccspapp//Software/Poppler/CMakeError.log
~ccspapp//Software/Poppler/CMakeOutput.log

balston commented 1 year ago

This may be relevant:

https://bugs.freedesktop.org/show_bug.cgi?id=106388

balston commented 1 year ago

The patch described in the above link is needed on RedHat 7.x systems like ours. The build now gets further but fails with the following error:

[ 39%] Building CXX object CMakeFiles/poppler.dir/poppler/CurlCachedFile.cc.o
In file included from /dev/shm/tmp.sajL4mvfDW/poppler-22.10.0/poppler/CurlCachedFile.h:18,
                 from /dev/shm/tmp.sajL4mvfDW/poppler-22.10.0/poppler/CurlCachedFile.cc:15:
/dev/shm/tmp.sajL4mvfDW/poppler-22.10.0/poppler/CurlCachedFile.cc: In member function ‘virtual size_t CurlCachedFileLoader::init(CachedFile*)’:
/dev/shm/tmp.sajL4mvfDW/poppler-22.10.0/poppler/CurlCachedFile.cc:53:33: error: ‘CURLINFO_CONTENT_LENGTH_DOWNLOAD_T’ was not declared in this scope; did you mean ‘CURLINFO_CONTENT_LENGTH_DOWNLOAD’?
   53 |         curl_easy_getinfo(curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD_T, &contentLength);
      |                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
make[2]: *** [CMakeFiles/poppler.dir/poppler/CurlCachedFile.cc.o] Error 1
make[1]: *** [CMakeFiles/poppler.dir/all] Error 2
make: *** [all] Error 2

Possibly we need to load the Curl module during the build to get a newer version?

balston commented 1 year ago

We could switch libcurl support off using:

-DENABLE_LIBCURL=OFF

in the build script, but we should be able to get CMake to pick up the correct includes and library for curl/7.47.1/gnu-4.9.2.
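The two options mentioned here could be expressed roughly as below. This is only a sketch: `poppler_cmake_args` is a hypothetical helper and the curl prefix is a placeholder, not the path used in the actual build script. `ENABLE_LIBCURL` is the Poppler option quoted above, and `CMAKE_PREFIX_PATH` is the standard CMake variable for pointing `find_package()` at a non-system install.

```shell
#!/usr/bin/env bash
# Print the CMake arguments for the Poppler configure step, one per line.
# $1: "off" to disable libcurl support, anything else to keep it.
# $2: install prefix of the curl module (placeholder; only used when curl is kept).
poppler_cmake_args() {
    local use_curl="$1" curl_prefix="$2"
    if [ "$use_curl" = "off" ]; then
        printf '%s\n' "-DENABLE_LIBCURL=OFF"
    else
        printf '%s\n' "-DENABLE_LIBCURL=ON"
        # Ask CMake to search this prefix before the system locations.
        printf '%s\n' "-DCMAKE_PREFIX_PATH=${curl_prefix}"
    fi
}

# Show the arguments only; nothing is configured or built here.
poppler_cmake_args on /path/to/curl/module
```

The output could be passed on to `cmake` in the build script; switching the first argument to `off` gives the fallback of building without libcurl.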

balston commented 1 year ago

I switched on verbose output in the CMAKE config and the failing compile is using the correct include paths for the CURL module. The problem is that we have Curl 7.47.1 and Poppler requires at least 7.55.0.

Need to build a new Curl before continuing with the Poppler build. The latest Curl is 7.86.0, so will start with that version.
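A quick way to check this kind of version requirement before starting a long build is sketched below. It assumes GNU `sort` (for `-V`) and, for the live check, that `curl-config` from the libcurl development files is on the PATH; `version_ge` is a hypothetical helper name, and 7.55.0 is the minimum stated above.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed if dotted version A is greater than or equal to B.
# Uses GNU sort's version ordering (-V): if B sorts first, A >= B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="7.55.0"   # minimum libcurl version stated in this thread

# Only run the live check when curl-config is available.
if command -v curl-config >/dev/null 2>&1; then
    # curl-config --version prints e.g. "libcurl 7.86.0"; take the second field.
    found="$(curl-config --version | awk '{print $2}')"
    if version_ge "$found" "$required"; then
        echo "libcurl $found is new enough"
    else
        echo "libcurl $found is older than $required"
    fi
fi
```

With the versions from this thread, `version_ge 7.47.1 7.55.0` fails and `version_ge 7.86.0 7.55.0` succeeds.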

balston commented 1 year ago

Curl 7.86.0 now installed. Note: while Myriad is down, development is being done on Kathleen, so this will need to be redone on Myriad.

balston commented 1 year ago

Poppler build script updated to use new Curl. Building again...

balston commented 1 year ago

The build has got a lot further this time before failing - up to 63%. It now fails with:

[ 63%] Building CXX object utils/CMakeFiles/pdfsig.dir/pdfsig.cc.o
cd /dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/build/utils && /shared/ucl/apps/gcc/9.2.0/bin/g++  -I/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0 -I/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/fofi -I/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/goo -I/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/poppler -I/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/build -I/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/build/poppler -I/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/utils -I/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/build/utils -isystem /usr/include/cairo -isystem /usr/include/nss3 -isystem /usr/include/nspr4 -Wall -Wextra -Wpedantic -Wno-unused-parameter -Wcast-align -Wformat-security -Wframe-larger-than=65536 -Wlogical-op -Wmissing-format-attribute -Wnon-virtual-dtor -Woverloaded-virtual -Wmissing-declarations -Wundef -Wzero-as-null-pointer-constant -Wshadow -Wsuggest-override -fno-exceptions -fno-check-new -fno-common -fno-operator-names -D_DEFAULT_SOURCE -O2 -g  -fvisibility=hidden -fvisibility-inlines-hidden -std=c++17 -MD -MT utils/CMakeFiles/pdfsig.dir/pdfsig.cc.o -MF CMakeFiles/pdfsig.dir/pdfsig.cc.o.d -o CMakeFiles/pdfsig.dir/pdfsig.cc.o -c /dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/utils/pdfsig.cc
In file included from /dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/utils/pdfsig.cc:31:
/usr/include/nss3/hasht.h:48:29: error: ‘PRBool’ has not been declared
   48 |     void (*destroy)(void *, PRBool);
      |                             ^~~~~~
make[2]: *** [utils/CMakeFiles/pdfsig.dir/pdfsig.cc.o] Error 1
make[2]: Leaving directory `/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/build'
make[1]: *** [utils/CMakeFiles/pdfsig.dir/all] Error 2
make[1]: Leaving directory `/dev/shm/tmp.mayDDV3Gr4/poppler-22.10.0/build'
make: *** [all] Error 2

This is the same as the first error so I must have missed one of the source files that needs to be patched for RedHat.
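To avoid missing a file next time, the sources that pull in the NSS header could be listed up front. The following is a minimal sketch, assuming GNU `grep` and that it is run from the top of the unpacked Poppler source tree; `files_including` is a hypothetical helper, and the two directories are the ones that appear in the errors above.

```shell
#!/usr/bin/env bash
# files_including HEADER DIR...: list source files under the given directories
# that #include HEADER, with or without a leading path such as nss3/.
files_including() {
    # Escape dots so "hasht.h" is matched literally in the regular expression.
    local header="${1//./\\.}"
    shift
    grep -rlE --include='*.cc' --include='*.c' --include='*.h' \
        "^[[:space:]]*#[[:space:]]*include[[:space:]]*[<\"]([^>\"]*/)?${header}[>\"]" "$@" | sort
}

# Only search when run from a tree that has these directories.
if [ -d poppler ] && [ -d utils ]; then
    files_including hasht.h poppler utils
fi
```

Every file printed is a candidate for the RedHat patch.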

balston commented 1 year ago

I had tried to patch utils/pdfsig.cc but had made an error. Now fixed. Building again ...

larsnesheim commented 1 year ago

Thank you for all this work. I am sorry that it is so complicated.

balston commented 1 year ago

Further progress through the build, up to 82%, and then:

[ 82%] Building C object glib/tests/CMakeFiles/pdfdrawbb.dir/pdfdrawbb.c.o
cd /dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/build/glib/tests && /shared/ucl/apps/gcc/9.2.0/bin/gcc -DG_LOG_DOMAIN=\"Poppler\" -DTESTDATADIR=\"/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/../test\" -I/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0 -I/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/fofi -I/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/goo -I/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/poppler -I/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/build -I/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/build/poppler -I/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/glib -I/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/build/glib -isystem /usr/include/glib-2.0 -isystem /usr/lib64/glib-2.0/include -isystem /usr/include/cairo -isystem /usr/include/freetype2 -Wall -std=c99 -D_DEFAULT_SOURCE -O2 -g  -fvisibility=hidden   -pthread  -DG_DISABLE_DEPRECATED  -DG_DISABLE_SINGLE_INCLUDES -pthread -std=c11 -MD -MT glib/tests/CMakeFiles/pdfdrawbb.dir/pdfdrawbb.c.o -MF CMakeFiles/pdfdrawbb.dir/pdfdrawbb.c.o.d -o CMakeFiles/pdfdrawbb.dir/pdfdrawbb.c.o -c /dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/glib/tests/pdfdrawbb.c
/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/glib/tests/pdfdrawbb.c: In function ‘main’:
/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/glib/tests/pdfdrawbb.c:65:19: warning: implicit declaration of function ‘getopt’ [-Wimplicit-function-declaration]
   65 |     while ((opt = getopt(argc, argv, "h")) != -1) {
      |                   ^~~~~~
/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/glib/tests/pdfdrawbb.c:73:30: error: ‘optind’ undeclared (first use in this function)
   73 |     if (!usage && argc - 1 < optind) {
      |                              ^~~~~~
/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/glib/tests/pdfdrawbb.c:73:30: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [glib/tests/CMakeFiles/pdfdrawbb.dir/pdfdrawbb.c.o] Error 1
make[2]: Leaving directory `/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/build'
make[1]: *** [glib/tests/CMakeFiles/pdfdrawbb.dir/all] Error 2
make[1]: Leaving directory `/dev/shm/tmp.0bBm22GGlO/poppler-22.10.0/build'
make: *** [all] Error 2

I will investigate this one tomorrow.

balston commented 1 year ago

Patched glib/tests/pdfdrawbb.c to add getarg.h to the list of includes. This time the build has completed without errors.

Next step is to produce a module file.

balston commented 1 year ago

Built on Myriad as well.

balston commented 1 year ago

Module file done and pulled to Myriad. Need to use the following module commands to access it:

module -f unload compilers mpi gcc-libs
module load beta-modules
module load gcc-libs/9.2.0
module load boost/1.75.0/gnu-4.9.2
module load curl/7.86.0/gnu-4.9.2
module load poppler/22.10.0/gnu-9.2.0

balston commented 1 year ago

User informed.

balston commented 1 year ago

Ran some simple tests on Myriad. For example:

pdfinfo ./CUDA/samples/NVIDIA_CUDA-11.3_Samples/NVIDIA_CUDA-11.3_Samples/3_Imaging/dct8x8/doc/dct8x8.pdf
Title:           App Note Template
Author:          Anton
Creator:         Microsoft® Word 2010
Producer:        Microsoft® Word 2010
CreationDate:    Tue Sep  3 23:54:34 2013 BST
ModDate:         Tue Sep  3 23:54:34 2013 BST
Custom Metadata: no
Metadata Stream: no
Tagged:          yes
UserProperties:  no
Suspects:        no
Form:            none
JavaScript:      no
Pages:           15
Encrypted:       no
Page size:       612 x 792 pts (letter)
Page rot:        0
File size:       764608 bytes
Optimized:       no
PDF version:     1.5
[ccaabaa@login12 Software]$ which pdfinfo
/shared/ucl/apps/Poppler/22.10.0/gnu-9.2.0/bin/pdfinfo
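The `which pdfinfo` check above could be scripted so it can be repeated after any module change. This is a sketch only: `check_from_prefix` is a hypothetical helper, and the prefix is taken from the path printed above.

```shell
#!/usr/bin/env bash
# check_from_prefix CMD PREFIX: succeed if CMD is found on the PATH
# and resolves to a location underneath PREFIX.
check_from_prefix() {
    local path
    path="$(command -v "$1")" || return 1
    case "$path" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Install prefix as reported by `which pdfinfo` in the test above.
prefix="/shared/ucl/apps/Poppler/22.10.0/gnu-9.2.0"
if check_from_prefix pdfinfo "$prefix"; then
    echo "pdfinfo comes from the Poppler module"
else
    echo "pdfinfo is missing or comes from somewhere else"
fi
```

This catches the case where a system copy of a tool is picked up ahead of the module's copy.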