cleong110 / sign-language-processing.github.io

Documentation and background of sign language processing

Output to PDF in IEEE format #26

Open cleong110 opened 2 weeks ago

cleong110 commented 2 weeks ago

Following on from #25, let's try to output a PDF using the IEEE journal format.

Using https://template-selector.ieee.org/secure/templateSelector/publicationType, I picked Journal, then IEEE Transactions on Emerging Topics in Computational Intelligence.

cleong110 commented 2 weeks ago

IEEE-Transactions-LaTeX2e-templates-and-instructions.zip

Here are the official IEEE instructions and templates.

cleong110 commented 2 weeks ago

And here's the Makefile in the project:

markdown: dst dst/index.html dst/style.css

server: dst dst/style.css dst/index.md dst/sitemap.xml

dst/index.html: dst/index.md src/references.bib src/template/index.html dst/style.css
    pandoc dst/index.md --template src/template/index.html -s --table-of-contents --bibliography=src/references.bib --citeproc --columns 1000 -H src/header.html -V lang=en -o $@

dst/index_shortcode.md: dst/index.md
    node addons/emoji-to-shortcode/main.js dst/index.md > $@

dst/index.pdf: dst/index_shortcode.md src/references.bib
    cd dst && pandoc -f markdown+emoji -L../addons/latex-emoji.lua index_shortcode.md -s -N --pdf-engine=lualatex --shift-heading-level-by=-1 --bibliography=../src/references.bib --citeproc -o index.pdf

dst/thesis.pdf: dst/index_shortcode.md src/references.bib
    #pandoc -f markdown+emoji -L addons/latex-emoji.lua src/thesis/main.tex -s -N --pdf-engine=lualatex --shift-heading-level-by=-1 --bibliography=src/references.bib --citeproc -o index.pdf
    cd src/thesis && pandoc main.tex -s -N --natbib --pdf-engine=xelatex -o index.pdf

dst/index.md: src/index.md src/markdown_fix.sh src/formats.md dst tmp/datasets.md dst/assets
    cat src/index.md > $@
    bash src/markdown_fix.sh $@

dst/style.css: dst src/styles/custom.css
    cat src/styles/custom.css > $@

dst/sitemap.xml: dst src/sitemap.js
    node src/sitemap.js > $@

# TODO make this depend on all asset files
dst/assets: src/assets/tasks/tasks.svg
    mkdir -p $@
    cp -r src/assets/* $@

dst:
    mkdir $@

# Temporary files
tmp:
    mkdir $@

# TODO make this depend on all dataset json files
tmp/datasets.md: src/datasets.js tmp
    node src/datasets.js > $@

dst/sections: dst/index.tex
    python src/split_sections.py

overleaf: dst/sections tmp
    rm -rf tmp/overleaf
    git clone https://git.overleaf.com/611a535f64617c334d122e31 tmp/overleaf
    mkdir -p tmp/overleaf/parts/background
    cp -r dst/sections tmp/overleaf/parts/background
    rm -f tmp/overleaf/parts/background/sections/.DS_Store
    cp -r dst/assets tmp/overleaf/parts/background
    cp src/references.bib tmp/overleaf/parts/background
    rm -f tmp/overleaf/parts/background/assets/.DS_Store
    cd tmp/overleaf && \
        git add -A && \
        git commit -am "automatic sections upload" && \
        git push

#
#latex: tex/main.tex
#
#tex:
#   mkdir $@
#
dst/index_emoji.tex: dst dst/index_shortcode.md src/references.bib
    pandoc -f markdown+emoji -L addons/latex-emoji.lua dst/index_shortcode.md --shift-heading-level-by=-1 -s -N --natbib -o $@

dst/index.tex: dst/index_emoji.tex src/replace_gifs.py
    python src/replace_gifs.py dst/index_emoji.tex $@

#
#
#tex/references.bib: src/references.bib tex
#   cp src/references.bib $@
cleong110 commented 2 weeks ago

Not really sure which target to base this on, but maybe the overleaf one?

cleong110 commented 2 weeks ago

OK, now how do I run this? I need Python, I need make, and I need it all to run on Windows. Alternatively, I could give up and do this tomorrow on my Linux workstation. But for now...

# manually replace all `cat` with `type`... no wait get-content? https://superuser.com/questions/434870/what-is-the-windows-equivalent-of-the-unix-command-cat
# run in Anaconda Powershell prompt, so I have both make and Python
conda create -n slp_to_pdf
# activate the env first, otherwise the installs below go into base
conda activate slp_to_pdf
conda install python
conda install -c conda-forge imageio
conda install pip
python -m pip install regex

And it fails with an error.
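The first Windows blocker in the setup above is cat. One option is to avoid the shell for plain concatenation entirely. Below is a minimal sketch, assuming a hypothetical helper script (called concat.py here; it is not in the repo) that the Makefile would call as python concat.py $@ src/index.md in place of cat src/index.md > $@. It only replaces cat; markdown_fix.sh would still need bash.

```python
"""Portable stand-in for `cat INPUT... > OUTPUT` (hypothetical helper, not in the repo)."""
import shutil
import sys


def concat(inputs, output):
    """Write the bytes of each input file, in order, to output (like cat a b > out)."""
    with open(output, "wb") as out:
        for name in inputs:
            with open(name, "rb") as src:
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    # Usage: python concat.py OUTPUT INPUT [INPUT ...]
    if len(sys.argv) >= 3:
        concat(sys.argv[2:], sys.argv[1])
    else:
        print("usage: python concat.py OUTPUT INPUT [INPUT ...]", file=sys.stderr)
```

Copying bytes rather than text keeps line endings exactly as they are in the inputs, which matters on Windows.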

cleong110 commented 2 weeks ago

Here's my ieee target

ieee: dst/sections tmp
    rm -rf tmp/overleaf
    git clone https://git.overleaf.com/611a535f64617c334d122e31 tmp/overleaf
    mkdir -p tmp/overleaf/parts/background
    cp -r dst/sections tmp/overleaf/parts/background
    rm -f tmp/overleaf/parts/background/sections/.DS_Store
    cp -r dst/assets tmp/overleaf/parts/background
    cp src/references.bib tmp/overleaf/parts/background
    rm -f tmp/overleaf/parts/background/assets/.DS_Store
cleong110 commented 2 weeks ago

Tried a number of things and kept running into various errors. I think I will need to run it on Linux.

cleong110 commented 2 weeks ago

Giving Colab a try. I had to manually install Pandoc from the .deb to get around an "unrecognized option" error for --citeproc (presumably the apt-packaged pandoc predates 2.11, the release that added --citeproc).

Then I ran into an issue where it didn't have lualatex, so I'm giving apt install texlive-full a go. Problem is, it takes forever.

cleong110 commented 2 weeks ago

It finally worked! I got an index.pdf!

So make dst/index.pdf is the target I want to build on.

cleong110 commented 2 weeks ago

https://pandoc.org/MANUAL.html#citation-rendering-1 is relevant

cleong110 commented 1 week ago

https://github.com/stsewd/ieee-pandoc-template could give some insights into how this can work. Also https://github.com/stsewd/ieee-pandoc-template/blob/master/template.latex in particular might be useful.

cleong110 commented 1 week ago

I note with interest that https://github.com/sign-language-processing/sign-language-processing.github.io/blob/master/.github/workflows/deploy.yml uses a docker container.

https://hub.docker.com/r/pandoc/latex is a possibility that might work.

cleong110 commented 1 week ago

https://github.com/pandoc/dockerfiles

cleong110 commented 1 week ago

Today I am pursuing two avenues:

  1. Install texlive-full, pandoc, etc. on my Linux laptop.
  2. "pandock", i.e. the pandoc/extra Docker container, which should have everything needed (fonts, etc.).
cleong110 commented 1 week ago

BUILDING NATIVELY WITH TEXLIVE-FULL, PANDOC ON UBUNTU

Option 1, installing texlive-full, is taking a VERY long time; it has been stuck for hours on the "Pregenerating ConTeXt MarkIV format. This may take some time..." step.

Meanwhile I'm pursuing pandock (see below)

After several hours, I finally Googled it, finding this: https://askubuntu.com/questions/956006/pregenerating-context-markiv-format-this-may-take-some-time-takes-forever

Turns out you need to press enter???

Yup that fixed it!


What's up with this?

[WARNING] [makePDF] LaTeX Warning: Citation `yin-etal-2021-including' on page 5
  undefined on input line 460.
cleong110 commented 1 week ago

BUILDING WITH PANDOCK (stalled)

OK, so we're starting with this as a basis

docker run --rm \
       --volume "$(pwd):/data" \
       --user $(id -u):$(id -g) \
       pandoc/extra README.md -o outfile.epub

And trying to combine that with

dst/index.pdf: dst/index_shortcode.md src/references.bib
    cd dst && pandoc -f markdown+emoji -L../addons/latex-emoji.lua index_shortcode.md -s -N --pdf-engine=lualatex --shift-heading-level-by=-1 --bibliography=../src/references.bib --citeproc -o index.pdf

Let's just try replacing pandoc with docker run --rm --volume "$(pwd):/data" --user $(id -u):$(id -g) pandoc/extra first, replicating the original result

dst/index_pandock.pdf: dst/index_shortcode.md src/references.bib
    cd dst && docker run --rm --volume "$(pwd):/data" --user $(id -u):$(id -g) pandoc/extra -f markdown+emoji -L../addons/latex-emoji.lua index_shortcode.md -s -N --pdf-engine=lualatex --shift-heading-level-by=-1 --bibliography=../src/references.bib --citeproc -o index.pdf

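OK, "$(pwd)" comes out blank. That's because Make treats $(pwd) as a reference to a Make variable named pwd, which is undefined and so expands to nothing; it never reaches the shell. (Writing $$(pwd) would pass a literal $(pwd) through to the shell.)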

Using $(shell pwd) instead...

dst/index_pandock.pdf: dst/index_shortcode.md src/references.bib
    cd dst && docker run --rm --volume "$(shell pwd):/data" --user $(shell id -u):$(shell id -g) pandoc/extra -f markdown+emoji -L../addons/latex-emoji.lua index_shortcode.md -s -N --pdf-engine=lualatex --shift-heading-level-by=-1 --bibliography=../src/references.bib --citeproc -o index.pdf

OK, now I get an error about index_shortcode.md not existing:

pandoc: index_shortcode.md: withBinaryFile: does not exist (No such file or directory)

Let's log into the container:

# in dst folder
# docker flags: remove afterwards, interactive, mount volume, user, entrypoint (no bash installed)
$ docker run --rm -it --volume "$(pwd):/data" --user $(id -u):$(id -g) --entrypoint /bin/sh pandoc/extra
/data $ ls
assets              index.html          index.md            index.pdf           index_shortcode.md  style.css

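Why? $(shell pwd) is expanded by Make in the directory make runs from (the project root), before the recipe's cd dst executes, so /data is the project root and the relative index_shortcode.md path doesn't exist there. I edited the command to use absolute paths inside the container: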

dst/index_pandock.pdf: dst/index_shortcode.md src/references.bib
    docker run --rm --volume "$(shell pwd):/data" --user $(shell id -u):$(shell id -g) pandoc/extra -f markdown+emoji -L/data/addons/latex-emoji.lua /data/dst/index_shortcode.md -s -N --pdf-engine=lualatex --shift-heading-level-by=-1 --bibliography=/data/src/references.bib --citeproc -o index.pdf
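The fix boils down to translating host paths into paths under the /data mount. A minimal sketch of that mapping, assuming the project root is what gets mounted (as with $(shell pwd) above); to_container_path is a hypothetical helper, not part of pandoc or Docker. One more thing worth checking: since the container's working directory is /data (see the /data $ prompt above), -o index.pdf should land in the project root rather than in dst, so the dst/index_pandock.pdf target name doesn't match the file actually written.

```python
import os
from pathlib import Path, PurePosixPath

# Mount point used by: docker run --volume "<project root>:/data" ...
CONTAINER_ROOT = PurePosixPath("/data")


def to_container_path(host_path, project_root):
    """Translate a host path inside project_root into the same file's path in the container.

    Raises ValueError when host_path is outside project_root, since such files
    are not visible inside the container at all.
    """
    root = Path(os.path.abspath(project_root))
    rel = Path(os.path.abspath(host_path)).relative_to(root)
    return str(CONTAINER_ROOT.joinpath(*rel.parts))


# The three inputs the pandock target needs, relative to the project root.
for p in ["dst/index_shortcode.md", "addons/latex-emoji.lua", "src/references.bib"]:
    print(to_container_path(p, "."))
```

Run from the project root, this prints the same three /data/... paths used in the target above.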

That made progress; now we're seeing this: ! LaTeX Error: File 'pdflscape.sty' not found.

It seems pdflscape isn't installed in the container!

Making a Dockerfile? https://github.com/pandoc/dockerfiles/issues/135 says to use tlmgr, and tlmgr does exist in the image.

Dockerfile

FROM pandoc/extra

RUN tlmgr install pdflscape

And then edit the Makefile

# set up custom docker image based on https://hub.docker.com/r/pandoc/extra
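# docker/ is also a directory, so mark this target phony or make will consider it up to date
.PHONY: docker
docker: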
    cd docker && docker build . -t pandoc/extra/slp

dst/index_pandock.pdf: dst/index_shortcode.md src/references.bib docker
    docker run --rm --volume "$(shell pwd):/data" --user $(shell id -u):$(shell id -g) pandoc/extra/slp -f markdown+emoji -L/data/addons/latex-emoji.lua /data/dst/index_shortcode.md -s -N --pdf-engine=lualatex --shift-heading-level-by=-1 --bibliography=/data/src/references.bib --citeproc -o index.pdf

gives us a new error.

Perhaps it's referring to this package? https://ctan.org/tex-archive/fonts/twemoji-colr

In which case we can add a tlmgr command to install it in the Dockerfile.

Yup, that worked. Now we get:

! Paragraph ended before \Gin@iii was complete.

Which, according to https://tex.stackexchange.com/questions/37650/paragraph-ended-before-giniii-was-complete-while-inserting-image-with-inclu means we need to add

\usepackage{graphicx}

I added that to the top of index.md, and now we get:

! Package luatex.def Error: File `assets/representation/continuous.pdf' not found: using draft setting.

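I fixed that with a find-and-replace (Ctrl+R) in index.md, changing "assets" to "src/assets", but then the normal make dst/index.pdf build breaks, since that target runs pandoc from inside dst, where the assets are copied to dst/assets.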

Let's try and get in? https://stackoverflow.com/questions/45356985/how-to-run-an-existing-stopped-container-and-get-inside-the-bash

cleong110 commented 1 week ago

You can't use \cite{} in raw LaTeX with pandoc's --citeproc.

Giving up on "pandock" in favor of running natively, we are left with this puzzler:

[WARNING] [makePDF] LaTeX Warning: Citation `yin-etal-2021-including' on page 5
  undefined on input line 460.

I believe it originates from a raw LaTeX ({=latex}) block in index.md.

When I edit this, the error message changes.

I have not got the foggiest idea how to fix this. This is LaTeX in Markdown, interpreted somehow by pandoc. Is it a mistake in the LaTeX? In pandoc? In the Markdown?

Searching for "{=latex}" on Google... Apparently this syntax tells pandoc to leave the block as raw LaTeX. https://bookdown.org/yihui/rmarkdown-cookbook/raw-latex.html

https://stackoverflow.com/questions/14288699/pandoc-not-converting-latex-style-citations-correctly suggests that Pandoc simply cannot do the citations in the raw.

https://tex.stackexchange.com/questions/686867/reference-citation-in-latex-block-in-pandoc is close to what we're experiencing

https://tex.stackexchange.com/questions/276351/how-to-use-pandoc-citeproc-in-raw-latex-block-of-a-markdown-document?rq=1 is exactly our situation: "How to use pandoc-citeproc in raw latex block of a markdown document?"

"Pandoc with --citeproc ignores \cite commands" is also helpful. tl;dr: it seems this just doesn't work; there's no good way.
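Since citeproc only resolves pandoc citations ([@key]) and passes raw LaTeX through untouched, it helps to at least know which citations are hiding in raw blocks. A rough sketch, assuming raw blocks use the ```{=latex} fence form (pandoc also accepts other fence styles and inline `...`{=latex} spans, which this ignores); cites_in_raw_latex is a hypothetical helper:

```python
import re

# Fenced raw LaTeX blocks: a line starting with ```{=latex}, up to the next line starting with ```
RAW_LATEX_BLOCK = re.compile(r"^```\{=latex\}[^\n]*\n(.*?)^```", re.MULTILINE | re.DOTALL)
# \cite, \citet, \citep, ... with optional [...] arguments; group 1 is the key list.
CITE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")


def cites_in_raw_latex(markdown):
    """Return citation keys found inside raw LaTeX blocks, in order of appearance."""
    keys = []
    for block in RAW_LATEX_BLOCK.finditer(markdown):
        for cite in CITE.finditer(block.group(1)):
            keys.extend(k.strip() for k in cite.group(1).split(",") if k.strip())
    return keys
```

Running it over dst/index.md would list the keys that are expected to show up as undefined in the direct-to-PDF build.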

cleong110 commented 1 week ago

Missing character: There is no ŷ (U+0177) in font lmmi10!

The culprit is a literal ŷ character in the source.

According to https://www.physicsread.com/latex-hat-symbol/, the proper way to typeset this is \hat{y}.
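If more of these turn up, a small pass could swap precomposed accented letters for the matching accent macro. A minimal sketch using Python's unicodedata decompositions, assuming it is only applied to text inside math mode (\hat and friends are math-mode commands); the mapping table is my own and only covers three accents:

```python
import unicodedata

# Combining mark (hex code point, as returned by unicodedata.decomposition) -> math accent macro.
COMBINING_TO_MACRO = {"0302": r"\hat", "0303": r"\tilde", "0304": r"\bar"}


def to_latex_accents(math_text):
    """Rewrite e.g. 'ŷ' (U+0177 = 'y' + U+0302) as '\\hat{y}'; other characters pass through."""
    out = []
    for ch in math_text:
        parts = unicodedata.decomposition(ch).split()
        # Canonical decompositions look like 'base combining'; compatibility ones start with '<tag>'.
        if len(parts) == 2 and parts[1] in COMBINING_TO_MACRO:
            base = chr(int(parts[0], 16))
            out.append(f"{COMBINING_TO_MACRO[parts[1]]}{{{base}}}")
        else:
            out.append(ch)
    return "".join(out)
```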

cleong110 commented 1 week ago

Output in IEEE Format

Can I simply pass in an IEEE template?

No: if I pass in IEEE's template, it crashes, of course. That file is a sample document meant to be edited by hand, not a pandoc template.

Can I find an existing pandoc IEEE template?

https://github.com/stsewd/ieee-pandoc-template could give some insights into how this can work. Also https://github.com/stsewd/ieee-pandoc-template/blob/master/template.latex in particular might be useful.

AmitMY commented 1 week ago

Hmmm, what is unclear to me: are you trying to use pandoc to export a PDF directly, or a .tex file? If it is a .tex file, it doesn't matter whether it is a raw block or not (and you then use the pdflatex command to create a PDF).

cleong110 commented 1 week ago

The "undefined citation" issue occurs with make dst/index.pdf, i.e. when exporting the PDF directly.

Here's the command

dst/index.pdf: dst/index_shortcode.md src/references.bib
    cd dst && pandoc -f markdown+emoji -L../addons/latex-emoji.lua index_shortcode.md -s -N --pdf-engine=lualatex --shift-heading-level-by=-1 --bibliography=../src/references.bib --citeproc -o index.pdf

Actually, IEEE wants a .tex anyway, so I am planning to shift to that.

cleong110 commented 1 week ago

Output to ieee format .tex file

Rather than editing the index.pdf target to produce an IEEE version directly, I'm going to try outputting the website in .tex format.

I am referencing the style guidelines from this journal https://cis.ieee.org/publications/t-emerging-topics-in-ci/tetci-manuscript-format

make dst/index.tex

Can we replicate this target?

Error in replace_gifs.py: No module named imageio.

 make dst/index.tex 
pandoc -f markdown+emoji -L addons/latex-emoji.lua dst/index_shortcode.md --shift-heading-level-by=-1 -s -N --natbib -o dst/index_emoji.tex
latex-emoji: bxcoloremoji = false
latex-emoji: emojifont = nil
latex-emoji: emoji character: U+1F3A5

... various outputs omitted ...

latex-emoji: prologue successfully inserted
python src/replace_gifs.py dst/index_emoji.tex dst/index.tex
Traceback (most recent call last):
  File "src/replace_gifs.py", line 4, in <module>
    import imageio
ModuleNotFoundError: No module named 'imageio'
make: *** [Makefile:91: dst/index.tex] Error 1

I'm going to start a requirements.txt and add imageio

Then make dst/index.tex works!

compile index.tex to PDF

pdflatex index.tex

This fails, with the following error:

! LaTeX Error: You must install a new TeX system (TeX Live 2020)
               and then use 'lualatex' engine to print emoji.

lualatex index.tex

This runs, but then gives many warnings about undefined references.

Let's try copying references.bib into the same folder... no good, still undefined references.

Here's what my makefile targets look like at this point

dst/index.tex: dst/index_emoji.tex src/replace_gifs.py
    python src/replace_gifs.py dst/index_emoji.tex $@

dst/index_tex.pdf: dst/index.tex
    cp src/references.bib dst 
    cd dst && lualatex index.tex

clean:
    rm -r dst
    rm -r tmp

Package natbib Warning: Citation `glickman2018language' on page 1 undefined

How do I fix this?

compile multiple times?

Apparently you need to compile multiple times? Really? (See "Bibtex, Latex compiling".)

Let's try:

dst/index_tex.pdf: dst/index.tex
    cp src/references.bib dst 
    cd dst && lualatex index.tex
    cd dst && bibtex index.aux
    cd dst && lualatex index.tex
    cd dst && lualatex index.tex
I found no \bibdata command---while reading file index.aux

Nope!
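As far as I can tell, "I found no \bibdata command" means index.aux has no \bibdata line, which LaTeX only writes when the document contains a \bibliography{...} command; index.tex doesn't have one yet, so bibtex has nothing to work with. Once it does, the usual sequence is the one above: the engine, bibtex on the document stem, then the engine twice more. A minimal sketch that runs that sequence from Python (latex_bibtex_steps and run_latex_bibtex are hypothetical helpers that just wrap the same commands):

```python
import subprocess


def latex_bibtex_steps(stem, engine="lualatex"):
    """Commands for the classic LaTeX + BibTeX cycle, run in the folder holding STEM.tex."""
    tex = stem + ".tex"
    return [
        [engine, tex],     # pass 1: records \citation (and \bibdata/\bibstyle, if present) in STEM.aux
        ["bibtex", stem],  # reads STEM.aux and the .bib file(s), writes STEM.bbl
        [engine, tex],     # pass 2: reads STEM.bbl so citations get labels
        [engine, tex],     # pass 3: settles references that moved during pass 2
    ]


def run_latex_bibtex(stem, folder, engine="lualatex"):
    """Run each step in order; check=True stops at the first non-zero exit code."""
    for cmd in latex_bibtex_steps(stem, engine):
        subprocess.run(cmd, cwd=folder, check=True)
```

For this project that would be run_latex_bibtex("index", "dst"), the same as the four cd dst && ... lines in the target.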

A few results I found: "No \citation, \bibdata or \bibstyle command [duplicate]" recommends adding a \usepackage command for biblatex, and also using biber instead of bibtex.

"LaTeX Warning: Citation undefined" mentions adding:


\bibliographystyle{IEEEtran}
\bibliography{ref}

Tried manually editing index.tex to load \usepackage[backend=biber]{biblatex} instead of natbib, and to add \bibliographystyle{IEEEtran} \bibliography{references} at the end, then running:

dst/index_tex.pdf:
    cp src/references.bib dst 
    cd dst && lualatex index.tex
    cd dst && biber index.aux
    cd dst && lualatex index.tex
    cd dst && lualatex index.tex

No luck: it gave me some sort of error about a ".bbl not created by biblatex".

OK, so I now have \usepackage[backend=biber,natbib=true]{biblatex} at the top of the .tex file, and am compiling it with

dst/index_tex.pdf:
    cp src/references.bib dst 
    cd dst && rm index.*
    cd dst && cat bak_index.tex > index.tex
    cd dst && lualatex index.tex
    cd dst && biber index.aux
    cd dst && lualatex index.tex
    cd dst && lualatex index.tex

And I get

Use of uninitialized value in quotemeta at /usr/share/perl5/Biber/Config.pm line 228.
Use of uninitialized value $tool in concatenation (.) or string at /usr/share/perl5/Biber/Config.pm line 307.
INFO - This is Biber 2.17
INFO - Logfile is 'index.aux.blg'
ERROR - Cannot find 'index.aux.bcf'!
INFO - ERRORS: 1
make: *** [Makefile:98: dst/index_tex.pdf] Error 2
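The "Cannot find 'index.aux.bcf'" line is the clue: biber treats its argument as the base name of the control file and appends .bcf, so biber index.aux looks for index.aux.bcf, while the first lualatex pass writes index.bcf. So the call should presumably be biber index. (Separately, if I read the biblatex docs right, biblatex prints the bibliography with \printbibliography and rejects \bibliographystyle, so mixing it with the natbib-style commands added earlier is likely to cause trouble too.) A tiny sketch of the argument fix; biber_argument is a hypothetical helper:

```python
from pathlib import PurePath

# Suffixes commonly passed to biber; .aux and .tex are mistakes, .bcf is harmless.
_STRIP = {".aux", ".tex", ".bcf"}


def biber_argument(name):
    """Return the control-file base name biber expects, e.g. 'index' for 'index.aux'.

    biber appends '.bcf' to an argument without that suffix, which is why
    'biber index.aux' went looking for 'index.aux.bcf'.
    """
    p = PurePath(name)
    return str(p.with_suffix("")) if p.suffix in _STRIP else name
```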
AmitMY commented 1 week ago

I think converting the md to tex would be the correct approach. The HTML might convert the references poorly

cleong110 commented 1 week ago

Based on what I read, getting pandoc to properly deal with the raw latex in the .md is actually not possible at all, or at least there's "no good way", so yeah the .md->.tex->.pdf path seems more promising.

cleong110 commented 1 week ago

Got badly stuck trying to get the references to stop being undefined. I have quite a complicated .tex file and am not sure how to get it to compile with references.bib. Tried various things (see above). Going to work on this more tomorrow.

AmitMY commented 6 days ago

You can always do it in steps. Make a PR with what you have, etc., then I can see if I can help with what you don't.

cleong110 commented 6 days ago

I would really appreciate that. Made a draft PR.

cleong110 commented 3 days ago

OK, after a bit of cleanup, taking stock:

I am trying to get proper .tex output working.

Original makefile had this target, which creates index.tex

dst/index.tex: dst/index_emoji.tex src/replace_gifs.py
    python src/replace_gifs.py dst/index_emoji.tex $@

I am now trying to compile that to pdf.

dst/index_tex.pdf: dst/index.tex
    cp src/references.bib dst 
    cd dst && lualatex index.tex
    cd dst && biber index.aux
    cd dst && lualatex index.tex
    cd dst && lualatex index.tex

This gives me many many "undefined citation" errors.

Running it again to be sure... Actually no wait, now I'm getting "undefined control sequence", that's new.

I get a lot of output, and it ends with:

! Undefined control sequence.
l.3 \abx@aux@refcontext
                     {nty/global//global/global}

Edit: https://tex.stackexchange.com/a/328893 says this happens when things change between runs and the old auxiliary files get reused. In other words, I don't have a fresh dst folder. Time to make clean and try again.
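For the record, \abx@aux@refcontext is a line biblatex writes into the .aux file, so after switching away from biblatex the stale index.aux gets read by a document that no longer defines that command. make clean fixes this by deleting all of dst; a narrower sketch that only deletes LaTeX by-products for one document (clean_latex_aux is hypothetical, and the suffix list is my guess at the usual suspects, not exhaustive):

```python
from pathlib import Path

# By-products of lualatex/bibtex/biber runs (an assumed, non-exhaustive list).
AUX_SUFFIXES = (".aux", ".bbl", ".bcf", ".blg", ".run.xml", ".log", ".out", ".toc")


def clean_latex_aux(folder, stem):
    """Delete STEM.<suffix> by-products in folder, keeping the .tex; return removed names."""
    removed = []
    for suffix in AUX_SUFFIXES:
        path = Path(folder) / (stem + suffix)
        if path.exists():
            path.unlink()
            removed.append(path.name)
    return removed
```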

OK, cleaned and reran, and I get the expected errors. For example:

Package natbib Warning: Citation `patrie2011fingerspelled' on page 21 undefined
 on input line 1666.
Package natbib Warning: Citation `dataset:ebling2018smile' on page 28 undefined
 on input line 2374.

If I add \bibliography{references} to the end of dst/index.tex by hand, it gets overwritten, because make regenerates index.tex from upstream files.

cleong110 commented 3 days ago

OK, went upstream to latex.md and added it there:

## References
```{=latex}
\bibliography{references}
```

Also had to fix a rogue ampersand in the .bib file ("Pose & Gesture"), which caused a "Misplaced alignment tab character &" error when citing that entry; see https://tex.stackexchange.com/questions/174030/misplaced-alignment-tab-character-error-when-citing-a-particular-entry

Then this finally creates a beautiful PDF:

dst/index_tex.pdf: dst/index.tex
    cp src/references.bib dst
    cd dst && lualatex index
    cd dst && bibtex index
    cd dst && lualatex index
    cd dst && lualatex index
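If more bare ampersands like "Pose & Gesture" turn up in references.bib, a blunt automated fix is to escape every & that isn't already escaped. A sketch, assuming it's fine to touch every field; it would also rewrite & inside url fields, where \& may not be wanted, so the diff is worth a look before committing (escape_bib_ampersands is a hypothetical helper):

```python
import re

# An & that is not already preceded by a backslash.
BARE_AMP = re.compile(r"(?<!\\)&")


def escape_bib_ampersands(bib_text):
    """Replace each bare & with \\& so LaTeX doesn't read it as an alignment tab."""
    return BARE_AMP.sub(r"\\&", bib_text)
```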

cleong110 commented 3 days ago

Perhaps there's a better method? I'm not sure whether adding a raw LaTeX \bibliography at the end will interfere with the normal website.

cleong110 commented 3 days ago

A quick test with

make
npm i -g http-server
http-server dst

It seems adding it caused no issues with normal operation of the website, so that's good.
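If npm isn't around, Python's standard library can serve dst just as well for this kind of check (python -m http.server --directory dst does the same from the command line). A minimal sketch; make_server is a hypothetical helper and binds to localhost only:

```python
import functools
import http.server


def make_server(directory, port=8080):
    """HTTP server for files under `directory` on 127.0.0.1 (port=0 picks a free port)."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Usage: with make_server("dst") as server: server.serve_forever(), then open http://127.0.0.1:8080/.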

cleong110 commented 3 days ago

Now that that's out of the way: how can we control what the output looks like? We want it to look like these:

bare_jrnl_new_sample4.pdf New_IEEEtran_how-to.pdf

But there are quite a few differences. Just look at the header of their example .tex:

\documentclass[lettersize,journal]{IEEEtran}
\usepackage{amsmath,amsfonts}
\usepackage{algorithmic}
\usepackage{algorithm}
\usepackage{array}
\usepackage[caption=false,font=normalsize,labelfont=sf,textfont=sf]{subfig}
\usepackage{textcomp}
\usepackage{stfloats}
\usepackage{url}
\usepackage{verbatim}
\usepackage{graphicx}
\usepackage{cite}
\hyphenation{op-tical net-works semi-conduc-tor IEEE-Xplore}
% updated with editorial comments 8/9/2021

\begin{document}

Do I need to add these into index.md somehow?

For starters, let's see if we can get the documentclass to change. Currently our index.tex has

\documentclass[
]{article}

at the top.

cleong110 commented 3 days ago

Hang on, wait a sec.

Reading the pandoc manual I saw this section:

https://pandoc.org/MANUAL.html#citation-rendering

And it turns out that the dst/index_emoji.tex target listed src/references.bib as a prerequisite but never actually passed it to pandoc:

dst/index_emoji.tex: dst dst/index_shortcode.md src/references.bib
    pandoc -f markdown+emoji -L addons/latex-emoji.lua dst/index_shortcode.md --shift-heading-level-by=-1 -s -N --natbib -o $@

So if I add it (and remove the raw latex from the index.md)

dst/index_emoji.tex: dst dst/index_shortcode.md src/references.bib
    pandoc -f markdown+emoji -L addons/latex-emoji.lua dst/index_shortcode.md --shift-heading-level-by=-1 -s -N --natbib --bibliography=references.bib -o $@

then the citations work.
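My understanding of why this works: with --natbib, pandoc doesn't resolve citations itself; it writes \citep/\citet commands and, when a bibliography is given, a \bibliography{...} line at the end of the standalone document. That line is what puts \bibdata into the .aux for bibtex, and it names references, which is why the copy of references.bib in dst gets picked up. A quick sanity check for the generated .tex (bibliography_names is a hypothetical helper):

```python
import re

# Requiring "{" right after the name keeps \bibliographystyle{...} from matching.
BIBLIOGRAPHY = re.compile(r"\\bibliography\{([^}]*)\}")


def bibliography_names(tex):
    """Return the database names listed in \\bibliography{...} commands, in order."""
    names = []
    for match in BIBLIOGRAPHY.finditer(tex):
        names.extend(n.strip() for n in match.group(1).split(",") if n.strip())
    return names
```

An empty result for dst/index.tex would predict the "I found no \bibdata command" error before bibtex even runs.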

Well, out of time for today. Tomorrow I guess I'll read through the pandoc manual some more and see if I can control things like the documentclass

See also https://github.com/stsewd/ieee-pandoc-template/blob/e4e7f1aa6bf1aa072bc100fe6167d8932a1ea097/makefile

cleong110 commented 1 day ago

Since I've got some time, today I'm going to try and understand, not just "get something working".

IEEE

Overleaf/IEEE

IEEE, it turns out, has a page providing author support and official templates that work with Overleaf.

LaTeX quick guide: https://www.overleaf.com/latex/templates/a-quick-guide-to-latex/fghqpfgnxggz.pdf

IEEEtran.cls version: 1.8b

Apparently this is critical somehow. Gotta use this. https://www.ctan.org/pkg/ieeetran is the official download source.

IEEE LaTeX analyzer

https://latexqc.ieee.org/ lets you upload a .zip file and it'll check it.

Pandoc

https://bookdown.org/yihui/rmarkdown-cookbook/latex-variables.html talks about how to set various variables including documentclass
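Pandoc's default LaTeX template reads the documentclass and classoption variables, so in principle the class can be switched from the command line with -V. A sketch of the existing dst/index_emoji.tex command with those two variables added, written as a Python argument list so each flag is easy to see; the IEEEtran options are copied from IEEE's sample header, and whether pandoc's default template (which loads its own packages) then compiles cleanly under IEEEtran is untested. ieee_tex_command is a hypothetical helper:

```python
def ieee_tex_command(src="dst/index_shortcode.md", out="dst/index_emoji.tex"):
    """The existing dst/index_emoji.tex pandoc call, plus IEEEtran class variables."""
    return [
        "pandoc", "-f", "markdown+emoji", "-L", "addons/latex-emoji.lua", src,
        "--shift-heading-level-by=-1", "-s", "-N", "--natbib",
        "--bibliography=references.bib",
        # The default template emits roughly \documentclass[<classoption>]{<documentclass>}.
        "-V", "documentclass=IEEEtran",
        "-V", "classoption=lettersize,journal",
        "-o", out,
    ]
```

With pandoc on PATH, subprocess.run(ieee_tex_command(), check=True) runs it; the top of the resulting .tex should then show IEEEtran instead of article.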

dst/index_emoji.tex

Existing target, which I will presumably need to modify to get IEEE-format .tex output:

pandoc -f markdown+emoji -L addons/latex-emoji.lua dst/index_shortcode.md --shift-heading-level-by=-1 -s -N --natbib --bibliography=references.bib -o $@

What is this even doing? Let's investigate:

stsewd pandoc

This project has a makefile with a pandoc command, and it can output to IEEE formatted. How do they do it? I added a new target to output to .pdf for testing. My findings:

So what should we do?

I think for starters, we should make our own template. There's a lot of things in this one that are not necessary. We would also like to make sure the list of packages matches what IEEE wants, etc.

Probably the easiest path forward will be to edit stsewd's template.latex and makefile until it's outputting as we desire. THEN try to get our index.md content into $body$.

Alternatively, we can start with one of the template .tex files provided by IEEE and basically delete a bunch of sections, and put $body$ there.
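The second option might look roughly like this: keep the IEEE sample's preamble, throw away everything between \begin{document} and \end{document}, and put $body$ there. A sketch under a few assumptions: the sample has exactly one \begin{document}/\end{document} pair, and literal $ signs in the kept parts must be doubled because pandoc templates treat $ as a delimiter. Pandoc's body output also relies on a few macros its default template defines (\tightlist, for example), so some of those definitions would probably need copying into the preamble. ieee_sample_to_pandoc_template is a hypothetical helper:

```python
BEGIN = r"\begin{document}"
END = r"\end{document}"


def ieee_sample_to_pandoc_template(sample_tex):
    """Keep the sample's preamble and closing line; replace the document body with $body$.

    Raises ValueError if either \\begin{document} or \\end{document} is missing.
    """
    head_end = sample_tex.index(BEGIN) + len(BEGIN)
    tail_start = sample_tex.index(END)
    head, tail = sample_tex[:head_end], sample_tex[tail_start:]
    # In pandoc templates a literal $ must be written as $$.
    head, tail = head.replace("$", "$$"), tail.replace("$", "$$")
    return head + "\n$body$\n" + tail
```

The result could then be saved next to the Makefile and passed to pandoc with --template.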