Grayda / gitlab-doc-builder

A GitLab-CI job to build PDF and DOCX files from markdown

pandoc -> pdf failed #1

Closed depdol closed 3 weeks ago

depdol commented 6 years ago

Hello. I tried to use your YAML file in my project. Pandoc and Kramdown are installed, but building the documents takes a very long time and ends with an error. I read the GitLab documentation and found that it's almost impossible to enable logs for CI. Can you help? My YAML config looks like this:

build_pdf:
  before_script:
    # Download and install pandoc and kramdown before we begin
    # pandoc does PDF, but requires pdflatex, which can be a ~500mb download
    # so we go for kramdown, which handles PDF, but does not handle DOCX
    - yum update -y
    - yum install -y pandoc
    - gem install kramdown
    - gem install prawn
    - gem install prawn-table
  script:
    # Runs pandoc on all .md files in the repo and outputs them as PDF and DOCX
    # - find . -name '*.md' -exec sh -c 'pandoc $0 -f markdown -t docx -o $0.docx' {} \;
    - find . -name '*.md' -exec sh -c 'kramdown $0 --output pdf > $0.pdf' {} \;
  artifacts:
    # Attach all untracked files (e.g. files that were recently created and not yet committed to git) as artifacts.
    # These are the files you then download after the job has finished.
    untracked: true
  only:
    # Only run on the master branch
    - master

When I run the commands from the YAML by hand in bash, they all work.

Grayda commented 6 years ago

Can you post the output from CI? You should be able to view the job and see the results of the commands being run.

I suspect it's something about your files that is causing it to fail, such as odd characters in a filename, or some markdown that breaks pandoc or kramdown.
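
One way a filename could matter: the job hands each path to `sh -c` as an unquoted `$0`, so a name containing a space would be split into several arguments. Below is a minimal sketch of a quoted variant. It uses `printf` as a stand-in for the real pandoc/kramdown call so it can be run anywhere, and the sample directory and file names are made up purely for illustration.

```shell
# Hypothetical sample tree, created only for this demonstration.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.md" "$demo_dir/my notes.md"

# Pass the path as a quoted "$1" and give sh an explicit $0 placeholder.
# printf stands in for the converter and only prints the input and output
# names that a real command would receive.
listing=$(find "$demo_dir" -name '*.md' -exec sh -c \
  'printf "%s -> %s\n" "$1" "${1%.md}.docx"' sh {} \; | sort)

printf '%s\n' "$listing"
```

The `${1%.md}` expansion strips the `.md` suffix, so the output would be named `file.docx` rather than `file.md.docx`; whether that naming is wanted is a matter of taste.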

MikeDabrowski commented 6 years ago

My build also didn't work: the PDF cannot be opened. Here's the output:

Running with gitlab-runner 10.4.0-rc1 (fb4078b3)
  on docker-auto-scale (72989761)
Using Docker executor with image ruby:2.1 ...
Using docker image sha256:beeca2a61a54d771ddb67190dd90883baa5c2e93aa7e97488803c024231cab56 for predefined container...
Pulling docker image ruby:2.1 ...
Using docker image ruby:2.1 ID=sha256:223d1eaa9523fa64e78f5a92b701c9c11cbc507f0ff62246dbbacdae395ffea3 for build container...
Running on runner-72989761-project-3812105-concurrent-0 via runner-72989761-srm-1516057052-51bd63c6...
Cloning repository...
Cloning into '/builds/MikeDabrowski/what-to-pack'...
Checking out 5263bb3c as master...
Skipping Git submodules setup
$ apt-get update -y
Get:1 http://security.debian.org jessie/updates InRelease [63.1 kB]
Get:2 http://security.debian.org jessie/updates/main amd64 Packages [607 kB]
Ign http://deb.debian.org jessie InRelease
Get:3 http://deb.debian.org jessie-updates InRelease [145 kB]
Get:4 http://deb.debian.org jessie Release.gpg [2434 B]
Get:5 http://deb.debian.org jessie Release [148 kB]
Get:6 http://deb.debian.org jessie-updates/main amd64 Packages [23.1 kB]
Get:7 http://deb.debian.org jessie/main amd64 Packages [9064 kB]
Fetched 10.1 MB in 4s (2491 kB/s)
Reading package lists...
$ apt-get install -y pandoc
Reading package lists...
Building dependency tree...
Reading state information...
The following extra packages will be installed:
  liblua5.1-0 pandoc-data
Suggested packages:
  texlive-latex-recommended texlive-xetex texlive-luatex pandoc-citeproc
  etoolbox
The following NEW packages will be installed:
  liblua5.1-0 pandoc pandoc-data
0 upgraded, 3 newly installed, 0 to remove and 87 not upgraded.
Need to get 4764 kB of archives.
After this operation, 38.9 MB of additional disk space will be used.
Get:1 http://deb.debian.org/debian/ jessie/main liblua5.1-0 amd64 5.1.5-7.1 [108 kB]
Get:2 http://deb.debian.org/debian/ jessie/main pandoc-data all 1.12.4.2~dfsg-1 [202 kB]
Get:3 http://deb.debian.org/debian/ jessie/main pandoc amd64 1.12.4.2~dfsg-1+b14 [4453 kB]
debconf: delaying package configuration, since apt-utils is not installed
Fetched 4764 kB in 0s (5164 kB/s)
Selecting previously unselected package liblua5.1-0:amd64.
(Reading database ... 21168 files and directories currently installed.)
Preparing to unpack .../liblua5.1-0_5.1.5-7.1_amd64.deb ...
Unpacking liblua5.1-0:amd64 (5.1.5-7.1) ...
Selecting previously unselected package pandoc-data.
Preparing to unpack .../pandoc-data_1.12.4.2~dfsg-1_all.deb ...
Unpacking pandoc-data (1.12.4.2~dfsg-1) ...
Selecting previously unselected package pandoc.
Preparing to unpack .../pandoc_1.12.4.2~dfsg-1+b14_amd64.deb ...
Unpacking pandoc (1.12.4.2~dfsg-1+b14) ...
Setting up liblua5.1-0:amd64 (5.1.5-7.1) ...
Setting up pandoc-data (1.12.4.2~dfsg-1) ...
Setting up pandoc (1.12.4.2~dfsg-1+b14) ...
Processing triggers for libc-bin (2.19-18+deb8u10) ...
$ gem install kramdown
Successfully installed kramdown-1.16.2
1 gem installed
$ gem install prawn
Successfully installed ttfunk-1.5.1
Successfully installed pdf-core-0.7.0
Successfully installed prawn-2.2.2
3 gems installed
$ gem install prawn-table
Successfully installed prawn-table-0.2.2
1 gem installed
$ find . -name '*.md' -exec sh -c 'pandoc $0 -f markdown -t docx -o $0.docx' {} \;
$ find . -name '*.md' -exec sh -c 'kramdown $0 --output pdf > $0.pdf' {} \;
/usr/local/bundle/gems/kramdown-1.16.2/lib/kramdown/parser/base.rb:93:in `adapt_source': The source text contains invalid characters for the used encoding US-ASCII (RuntimeError)
    from /usr/local/bundle/gems/kramdown-1.16.2/lib/kramdown/parser/kramdown.rb:89:in `parse'
    from /usr/local/bundle/gems/kramdown-1.16.2/lib/kramdown/parser/base.rb:69:in `parse'
    from /usr/local/bundle/gems/kramdown-1.16.2/lib/kramdown/document.rb:104:in `initialize'
    from /usr/local/bundle/gems/kramdown-1.16.2/bin/kramdown:82:in `new'
    from /usr/local/bundle/gems/kramdown-1.16.2/bin/kramdown:82:in `<top (required)>'
    from /usr/local/bundle/bin/kramdown:23:in `load'
    from /usr/local/bundle/bin/kramdown:23:in `<main>'
Uploading artifacts...
untracked: found 2 files                           
Uploading artifacts to coordinator... ok            id=48062542 responseStatus=201 Created token=kF7n1soY
Job succeeded

Grayda commented 6 years ago

Looks like Kramdown is having trouble with non-ASCII characters: "The source text contains invalid characters for the used encoding US-ASCII (RuntimeError)"

My suggestion is, if possible, change all text to ASCII, or remove the Kramdown line from the yaml script, then take the DOCX and make your own PDF out of that. Or you might be able to change the encoding type that Kramdown accepts using the command line. Try googling "kramdown encoding" and see if that nets you anything.
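
A possible workaround, sketched here as an assumption rather than a confirmed fix: Ruby takes its default external encoding from the locale, and the error suggests the container runs without a UTF-8 locale. Exporting one ahead of the kramdown line in the job's script may be enough. `C.UTF-8` is assumed to exist in the image (Debian-based images normally ship it), and the quoted `"$1"` form is a small change from the original command.

```shell
# Assumption: the image provides the C.UTF-8 locale (Debian-based images do).
export LANG=C.UTF-8
export LC_ALL=C.UTF-8

# With a UTF-8 locale, Ruby should read the Markdown sources as UTF-8
# instead of US-ASCII. The guard skips the step where kramdown is absent.
if command -v kramdown >/dev/null 2>&1; then
  find . -name '*.md' -exec sh -c 'kramdown "$1" --output pdf > "$1.pdf"' sh {} \;
fi
```

If the source files are in some other encoding (not UTF-8), this would not help and they would need converting first.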

Sorry if that's not entirely helpful, but I've just returned home from a 19+ hour international flight and my mind isn't totally with it at the moment :)

MikeDabrowski commented 6 years ago

Thanks, but I'd rather not use it at all. I've only got a couple of days to write my thesis, so I'd rather not waste time fixing this right now. Also, removing non-ASCII characters might be troublesome because I write in my native language and don't know which characters are ASCII and which aren't. And I'm not entirely happy with the output formatting of the file, so I'll try this with my next paper.

My mind is also off track due to a recent lack of sleep, so I can relate :D

Nevertheless, I really appreciate your idea. Installing the whole of LaTeX just to occasionally write a paper is painful... it always takes at least 3 GB of space :(

ayeks commented 6 years ago

To check for UTF-8 errors, you can add the following job to the gitlab-ci.yml:

utf8_check:
  # checks that the content of every md file is valid UTF-8
  stage: test
  script:
    - find . -name '*.md' -exec sh -c 'echo $0 && iconv -f UTF-8 $0;' {} \;
    # TODO: throw exit 1 here so that buildpdf will not be executed!

Unfortunately it does not return with an error if the validation fails. Maybe you can improve that.
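
The likely reason is that `find ... -exec ... \;` does not pass the command's exit status on to the job. One way around that, as a rough sketch: collect the names of the files that `iconv` rejects and fail explicitly if there are any. The function name `check_utf8` is made up for this example; only `find` and `iconv` come from the job above.

```shell
# check_utf8 DIR: list every .md file under DIR that is not valid UTF-8
# and return non-zero if at least one was found.
check_utf8() {
  # iconv exits non-zero on an invalid byte sequence; its converted output
  # is discarded, and only the names of failing files are collected.
  bad=$(find "$1" -name '*.md' -exec sh -c \
    'iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1 || printf "%s\n" "$1"' sh {} \;)
  if [ -n "$bad" ]; then
    printf 'Not valid UTF-8:\n%s\n' "$bad" >&2
    return 1
  fi
  return 0
}
```

Calling `check_utf8 .` as the last command of the script should then make the job fail when a file is rejected, so that later stages are not run.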