C2DH / journal-of-digital-history-backend

Backend for our Journal of Digital History
MIT License

Make it possible to generate the PDF version of non-published articles #209

Closed eliselavy closed 6 months ago

eliselavy commented 10 months ago

Until now, it has not been possible to use the PDF generation proposed by:

because the notebooks use cite2c.

Idea: use citeproc-py to transform the citations in the notebook, then run nbconvert.

Based on the method used for the De Gruyter PDF generation: https://github.com/C2DH/journal-of-digital-history-backend/blob/develop/jdhseo/utils.py#L53
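A minimal sketch of the citation step, assuming cite2c stores CSL-JSON items under `metadata.cite2c.citations` and marks each citation in a markdown cell as `<cite data-cite="ID"></cite>`. It does not use citeproc-py: a hand-rolled "(Family Year)" formatter stands in for a real CSL style, and all function names are hypothetical.

```python
import json
import re

# Assumed cite2c markup in markdown cells: <cite data-cite="ITEM_ID"></cite>
CITE_RE = re.compile(r'<cite data-cite="([^"]+)"></cite>')


def format_author_date(item):
    """Tiny author-date formatter standing in for a real CSL style."""
    authors = item.get("author", [])
    name = authors[0].get("family", "Anon.") if authors else "Anon."
    if len(authors) > 1:
        name += " et al."
    parts = item.get("issued", {}).get("date-parts", [[]])
    year = parts[0][0] if parts and parts[0] else "n.d."
    return f"({name} {year})"


def inline_citations(nb):
    """Replace cite2c tags in markdown cells with plain-text citations."""
    citations = nb.get("metadata", {}).get("cite2c", {}).get("citations", {})

    def replace(match):
        item = citations.get(match.group(1))
        # Unknown IDs are left as they are, so they stay visible in the output.
        return format_author_date(item) if item else match.group(0)

    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "markdown":
            source = cell.get("source", "")
            text = "".join(source) if isinstance(source, list) else source
            cell["source"] = CITE_RE.sub(replace, text)
    return nb


def inline_citations_file(src, dst):
    """Read a notebook, inline its citations and write the result to dst."""
    with open(src, encoding="utf-8") as f:
        nb = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(inline_citations(nb), f, indent=1, ensure_ascii=False)
```

A real version would replace `format_author_date` with citeproc-py output and write the notebook that is then passed to nbconvert (for example `notebook_with_ref.ipynb`).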

eliselavy commented 9 months ago

The notebook with the cite2c citations rendered as markdown is now generated via a Celery task. Next, the PDF generation needs to be integrated:

jupyter nbconvert --to pdf MyNotebook.ipynb --TagRemovePreprocessor.remove_input_tags remove_input

When cells are tagged with remove_input, their input is not rendered:

Image
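A sketch of tagging every code cell programmatically before calling nbconvert, so cells don't have to be tagged by hand. The helper names are hypothetical; the argument list simply mirrors the nbconvert command above so it can be passed to `subprocess.run`.

```python
REMOVE_INPUT_TAG = "remove_input"


def tag_code_cells(nb, tag=REMOVE_INPUT_TAG):
    """Add `tag` to metadata.tags of every code cell (safe to call twice)."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            tags = cell.setdefault("metadata", {}).setdefault("tags", [])
            if tag not in tags:
                tags.append(tag)
    return nb


def nbconvert_pdf_command(notebook_path, tag=REMOVE_INPUT_TAG):
    """Argument list for the nbconvert command above (for subprocess.run)."""
    return [
        "jupyter", "nbconvert", "--to", "pdf", notebook_path,
        "--TagRemovePreprocessor.remove_input_tags", tag,
    ]
```

Markdown cells are left untouched, so text, headings and the hermeneutics paragraphs still go through the normal markdown conversion.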

eliselavy commented 9 months ago

The hermeneutics paragraph needs to be made visible.

eliselavy commented 9 months ago

Problem with the deployment in the development environment:

Screenshot 2023-12-04 at 16 10 56
eliselavy commented 9 months ago
celery_1              |     logger.error("Command output:\n", e.output)
celery_1              | [2023-12-05 10:18:22,888: WARNING/ForkPoolWorker-2] Message: 'Command output:\n'
celery_1              | Arguments: ('[NbConvertApp] Converting notebook notebook_with_ref.ipynb to pdf
[NbConvertApp] ERROR | Error while converting 'notebook_with_ref.ipynb'
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/nbconvert/nbconvertapp.py", line 435, in export_single_notebook
    output, resources = self.exporter.from_filename(notebook_filename, resources=resources)
  File "/usr/local/lib/python3.8/site-packages/nbconvert/exporters/exporter.py", line 190, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "/usr/local/lib/python3.8/site-packages/nbconvert/exporters/exporter.py", line 208, in from_file
    return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
  File "/usr/local/lib/python3.8/site-packages/nbconvert/exporters/pdf.py", line 168, in from_notebook_node
    latex, resources = super().from_notebook_node(
  File "/usr/local/lib/python3.8/site-packages/nbconvert/exporters/latex.py", line 72, in from_notebook_node
    return super().from_notebook_node(nb, resources, **kw)
  File "/usr/local/lib/python3.8/site-packages/nbconvert/exporters/templateexporter.py", line 392, in from_notebook_node
    output = self.template.render(nb=nb_copy, resources=resources)
  File "/usr/local/lib/python3.8/site-packages/jinja2/environment.py", line 1291, in render
    self.environment.handle_exception()
  File "/usr/local/lib/python3.8/site-packages/jinja2/environment.py", line 925, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "/usr/local/share/jupyter/nbconvert/templates/latex/index.tex.j2", line 8, in top-level template code
    ((* extends cell_style *))
  File "/usr/local/share/jupyter/nbconvert/templates/latex/style_jupyter.tex.j2", line 176, in top-level template code
    \prompt{(((prompt)))}{(((prompt_color)))}{(((execution_count)))}{(((extra_space)))}
  File "/usr/local/share/jupyter/nbconvert/templates/latex/base.tex.j2", line 7, in top-level template code
    ((*- extends 'document_contents.tex.j2' -*))
  File "/usr/local/share/jupyter/nbconvert/templates/latex/document_contents.tex.j2", line 51, in top-level template code
    ((*- block figure scoped -*))
  File "/usr/local/share/jupyter/nbconvert/templates/latex/display_priority.j2", line 5, in top-level template code
    ((*- extends 'null.j2' -*))
  File "/usr/local/share/jupyter/nbconvert/templates/latex/null.j2", line 30, in top-level template code
    ((*- block body -*))
  File "/usr/local/share/jupyter/nbconvert/templates/latex/base.tex.j2", line 215, in block 'body'
    ((( super() )))
  File "/usr/local/share/jupyter/nbconvert/templates/latex/null.j2", line 32, in block 'body'
    ((*- block any_cell scoped -*))
  File "/usr/local/share/jupyter/nbconvert/templates/latex/null.j2", line 85, in block 'any_cell'
    ((*- block markdowncell scoped-*)) ((*- endblock markdowncell -*))
  File "/usr/local/share/jupyter/nbconvert/templates/latex/document_contents.tex.j2", line 68, in block 'markdowncell'
    ((( cell.source | citation2latex | strip_files_prefix | convert_pandoc('markdown+tex_math_double_backslash', 'json',extra_args=[]) | resolve_references | convert_pandoc('json','latex'))))
  File "/usr/local/lib/python3.8/site-packages/nbconvert/filters/pandoc.py", line 24, in convert_pandoc
    return pandoc(source, from_format, to_format, extra_args=extra_args)
  File "/usr/local/lib/python3.8/site-packages/nbconvert/utils/pandoc.py", line 52, in pandoc
    check_pandoc_version()
  File "/usr/local/lib/python3.8/site-packages/nbconvert/utils/pandoc.py", line 100, in check_pandoc_version
    v = get_pandoc_version()
  File "/usr/local/lib/python3.8/site-packages/nbconvert/utils/pandoc.py", line 77, in get_pandoc_version
    raise PandocMissing()
nbconvert.utils.pandoc.PandocMissing: Pandoc wasn't found.
Please check that pandoc is installed:
https://pandoc.org/installing.html
',)
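Two problems show up in this log: pandoc is missing in the Celery container (`PandocMissing`), and the logging call itself is wrong. `logger.error("Command output:\n", e.output)` passes `e.output` as a %-format argument without a placeholder, so logging cannot format the message and prints the `Message: ... Arguments: ...` dump instead. A sketch that checks for the external tools first and logs the output correctly; the function names are hypothetical, and `xelatex` is checked only because it is nbconvert's default LaTeX command.

```python
import logging
import shutil
import subprocess

logger = logging.getLogger(__name__)


def missing_tools(tools=("pandoc", "xelatex"), which=shutil.which):
    """Return the external programs from `tools` that are not on PATH."""
    return [tool for tool in tools if which(tool) is None]


def run_and_log(command):
    """Run command; on failure log its combined stdout/stderr and return False."""
    try:
        subprocess.check_output(command, stderr=subprocess.STDOUT, text=True)
    except subprocess.CalledProcessError as e:
        # The %s placeholder is required: logging merges args into the message
        # lazily, and an extra argument without one causes a formatting error.
        logger.error("Command output:\n%s", e.output)
        return False
    return True


def convert_to_pdf(command):
    """Fail early with a clear message when pandoc or xelatex is missing."""
    missing = missing_tools()
    if missing:
        logger.error("Cannot generate the PDF, missing: %s", ", ".join(missing))
        return False
    return run_and_log(command)
```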
eliselavy commented 9 months ago

Problem installing pandoc: https://github.com/C2DH/journal-of-digital-history-backend/actions/runs/7099856408/job/19324737062

eliselavy commented 9 months ago

In the notebook:

Screenshot 2023-12-13 at 17 34 48

But in the PDF:

Screenshot 2023-12-13 at 17 34 36
eliselavy commented 8 months ago

Same look and feel:

Screenshot 2023-12-14 at 13 11 33

Works:

{
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "hermeneutics\n",
    "\n",
    "## Introduction\n",
    "\n",
    "end hermeneutics"
   ]
  },

Doesn't work:

"source": [
    "hermeneutics remove line\n",
    "## Introduction\n",
    "end hermeneutics remove line"
   ]
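One plausible cause, not confirmed here: nbconvert converts markdown cells with pandoc (see `convert_pandoc('markdown+tex_math_double_backslash', ...)` in the traceback from the deployment error), and pandoc markdown's `blank_before_header` extension requires a blank line before a heading. Without it, `## Introduction` stays part of the preceding paragraph. A hypothetical pre-processing sketch that inserts the missing blank line in markdown cells before running nbconvert:

```python
import re

# ATX heading: up to three spaces, 1-6 '#', then whitespace or end of line.
HEADING_RE = re.compile(r"^ {0,3}#{1,6}(?:\s|$)")
# Opening or closing line of a fenced code block.
FENCE_RE = re.compile(r"^ {0,3}(?:```|~~~)")


def ensure_blank_line_before_headings(text):
    """Insert an empty line before headings that directly follow a text line."""
    out = []
    in_fence = False  # simple toggle; does not track fence length
    for line in text.split("\n"):
        if FENCE_RE.match(line):
            in_fence = not in_fence
        elif not in_fence and HEADING_RE.match(line) and out and out[-1].strip():
            out.append("")
        out.append(line)
    return "\n".join(out)


def fix_markdown_cells(nb):
    """Apply the fix to every markdown cell of a notebook dict."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "markdown":
            source = cell.get("source", "")
            text = "".join(source) if isinstance(source, list) else source
            cell["source"] = ensure_blank_line_before_headings(text)
    return nb
```

Lines inside fenced code blocks (for example Python comments starting with `#`) are skipped so they are not mistaken for headings.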
eliselavy commented 8 months ago

Problem: LaTeX uses the text/plain output, so the styled dataframe is rendered as: <pandas.io.formats.style.Styler at 0x11ac2e150>

 {
     "data": {
      "text/html": [
       "<style  type=\"text/css\" >\n",
       "</style><table id=\"T_2b14b_\" ><caption>table 1: Some figures and their mentions in the Capuchin Annual between 1930 and 1965</caption><thead>    <tr>        <th class=\"col_heading level0 col0\" >HenryVIII</th>        <th class=\"col_heading level0 col1\" >Victoria</th>        <th class=\"col_heading level0 col2\" >WilliamOrange</th>        <th class=\"col_heading level0 col3\" >FatherMathew</th>        <th class=\"col_heading level0 col4\" >Parnell</th>        <th class=\"col_heading level0 col5\" >WolfeTone</th>        <th class=\"col_heading level0 col6\" >ElizabethI</th>        <th class=\"col_heading level0 col7\" >Cromwell</th>    </tr></thead><tbody>\n",
       "                <tr>\n",
       "                                <td id=\"T_2b14b_row0_col0\" class=\"data row0 col0\" >19</td>\n",
       "                        <td id=\"T_2b14b_row0_col1\" class=\"data row0 col1\" >20</td>\n",
       "                        <td id=\"T_2b14b_row0_col2\" class=\"data row0 col2\" >25</td>\n",
       "                        <td id=\"T_2b14b_row0_col3\" class=\"data row0 col3\" >37</td>\n",
       "                        <td id=\"T_2b14b_row0_col4\" class=\"data row0 col4\" >38</td>\n",
       "                        <td id=\"T_2b14b_row0_col5\" class=\"data row0 col5\" >45</td>\n",
       "                        <td id=\"T_2b14b_row0_col6\" class=\"data row0 col6\" >49</td>\n",
       "                        <td id=\"T_2b14b_row0_col7\" class=\"data row0 col7\" >67</td>\n",
       "            </tr>\n",
       "    </tbody></table>"
      ],
      "text/plain": [
       "<pandas.io.formats.style.Styler at 0x11ac2e150>"
      ]
     },
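nbconvert's LaTeX exporter does not render `text/html`, so for this output it falls back to `text/plain`, which for a pandas `Styler` is just its repr. A sketch, assuming the exporter prefers a `text/latex` entry when one is present: parse the HTML table already stored in the notebook and add a `text/latex` twin to the output. Only simple tables (one header row, no row or column spans) are handled, and the function names are hypothetical.

```python
from html.parser import HTMLParser

LATEX_SPECIAL = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}", "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(text):
    return "".join(LATEX_SPECIAL.get(ch, ch) for ch in text)


class TableParser(HTMLParser):
    """Collect the caption and the th/td cells of a simple HTML table."""

    def __init__(self):
        super().__init__()
        self.caption = ""
        self.rows = []
        self._cell = None  # text of the cell being read, None outside cells
        self._in_caption = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:
                self.rows.append([])
            self._cell = ""
        elif tag == "caption":
            self._in_caption = True

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append(self._cell.strip())
            self._cell = None
        elif tag == "caption":
            self._in_caption = False

    def handle_data(self, data):
        if self._cell is not None:
            self._cell += data
        elif self._in_caption:
            self.caption += data


def html_table_to_latex(html):
    """Convert a simple HTML table (first row = header) into a LaTeX table."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    rows = [row for row in parser.rows if row]
    ncols = max((len(row) for row in rows), default=0)
    lines = [r"\begin{table}[h]", r"\centering"]
    if parser.caption.strip():
        lines.append(r"\caption{" + latex_escape(parser.caption.strip()) + "}")
    lines += [r"\begin{tabular}{" + "l" * ncols + "}", r"\hline"]
    for i, row in enumerate(rows):
        lines.append(" & ".join(latex_escape(cell) for cell in row) + r" \\")
        if i == 0:
            lines.append(r"\hline")  # rule under the header row
    lines += [r"\hline", r"\end{tabular}", r"\end{table}"]
    return "\n".join(lines)


def add_latex_to_outputs(nb):
    """Add a text/latex entry next to every text/html output holding a table."""
    for cell in nb.get("cells", []):
        for output in cell.get("outputs", []):
            data = output.get("data", {})
            html = data.get("text/html")
            if html is None or "text/latex" in data:
                continue
            html = "".join(html) if isinstance(html, list) else html
            if "<table" in html:
                data["text/latex"] = html_table_to_latex(html)
    return nb
```

When the notebook code can be re-run, producing the table with pandas' own `Styler.to_latex()` (pandas 1.3 or later) would be an alternative to parsing the stored HTML.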
eliselavy commented 8 months ago

Chinese characters are not displayed for this article: http://10.240.4.179/en/article/fwpktfFtn5jm

Image

Image

eliselavy commented 8 months ago

To support Chinese characters, the ctex package needs to be installed: tlmgr install ctex

tlmgr: Local TeX Live (2020) is older than remote repository (2023)

pdflatex --version (jdh)
pdfTeX 3.14159265-2.6-1.40.21 (TeX Live 2020)

eliselavy commented 8 months ago

Chinese characters work in LaTeX, but the nbconvert template doesn't support them. Pipeline: notebook -(nbconvert)-> LaTeX -(XeLaTeX)-> PDF

Include the font and the package in base.tex.j2:

The templates are defined here: /Users/elisabeth.guerard/.pyenv/versions/anaconda3-2020.02/share/jupyter/nbconvert/templates/latex

\usepackage{ctex}
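Instead of editing the installed base.tex.j2, another option is to export to LaTeX first (`jupyter nbconvert --to latex`), patch the preamble, then compile the .tex with XeLaTeX. A minimal sketch of the patch step; the helper names are hypothetical and it assumes the exported .tex contains a single `\begin{document}`.

```python
from pathlib import Path


def add_preamble_package(tex, package="ctex"):
    r"""Insert \usepackage{<package>} right before \begin{document}, once."""
    line = r"\usepackage{" + package + "}"
    if line in tex:
        return tex  # already present: keep the source unchanged
    marker = r"\begin{document}"
    index = tex.find(marker)
    if index == -1:
        raise ValueError("no \\begin{document} in the LaTeX source")
    return tex[:index] + line + "\n" + tex[index:]


def patch_tex_file(path, package="ctex"):
    """Rewrite a .tex file produced by `jupyter nbconvert --to latex` in place."""
    p = Path(path)
    p.write_text(add_preamble_package(p.read_text(encoding="utf-8"), package), encoding="utf-8")
```

The font still has to be available to XeLaTeX on the machine or container that compiles the PDF.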

I don't know whether I also need to change jupyter_nbconvert_config.py:

## Shell command used to compile latex.
#  Default: ['xelatex', '{filename}', '-quiet']
c.PDFExporter.latex_command = ['/usr/local/texlive/2023/bin/universal-darwin/xelatex', '{filename}', '-quiet']
eliselavy commented 8 months ago

Workaround for the moment:

Generate the .ipynb with the citations included via the Celery task

Convert it in the (base) environment, where pandoc is installed

Run the conversion outside of VS Code

eliselavy commented 8 months ago

About the installation of pandoc:

Try using the pandoc Docker image: https://github.com/pandoc/dockerfiles

Problem with psycopg2-binary==2.8.6:


49.14 Failed to build lxml psycopg2-binary
49.14 ERROR: Could not build wheels for lxml, psycopg2-binary, which is required to install pyproject.toml-based projects
------

See here: https://github.com/C2DH/journal-of-digital-history-backend/actions/runs/7103988496/job/19337950336

eliselavy commented 8 months ago

First, try not to use the binary package: https://www.psycopg.org/docs/install.html

eliselavy commented 7 months ago

For the moment, generate it ON DEMAND, while waiting to work on the integration of the pandoc image.