benwbrum / fromthepage

FromThePage is a wiki-like application for crowdsourcing transcription of handwritten documents.
http://fromthepage.com
GNU Affero General Public License v3.0

latex output is too large #297

Closed saracarl closed 8 years ago

saracarl commented 8 years ago

Here are two pages that I am using as a sandbox for testing LaTeX functions:

http://beta.fromthepage.com/display/display_page?page_id=3804 http://beta.fromthepage.com/display/display_page?page_id=3805

You can see that on the first of the two pages (20), the formula is rendered in a very large font. On the second page (21), I've copied the example Ben provided for embedding a TeX expression. Once again the font is large, so this large size appears to be the default for all TeX expressions. Below that TeX expression, I've embedded a formula with a manual command to render it in a very small font (5 pt). Despite the small size I've requested, the formula is still rendered in a font larger than the default font for regular text.

My first preference would be to have a default font size for formulas that is the same as the default font size for regular text--or something that is roughly similar in size. Then, we can size up or down from there.

Several weeks ago, I was under the impression that the large size formulas ran off the screen to the right. Now I see a scroll bar at the bottom of the page which makes it possible to see the entire formula by moving things around.

Let me know what you recommend to fix the font size issue.

Over the last few days, I've spent a number of hours going through Ben's instructions for embedding formulas, and then I've gone through some introductory user guides for LaTeX. As such, I am now having better luck getting a higher percentage of attempts to render properly. I see now that it is important to refresh the page several times in order to get FromThePage to convert the LaTeX to formulas upon saving. In some cases, I am finding that it is tricky to figure out where to insert spaces, \, {, (,[ and other syntax in order to get the system to convert the commands into formulas. If you have more guidelines to share, let me know. Otherwise, I'll need to figure out what works and what doesn't by trial and error.

saracarl commented 8 years ago

quick and dirty solution: change output size

right solution: pdflatex can produce svgs

saracarl commented 8 years ago

During the request life-cycle, app/controllers/transcribe_controller saves a page. It may or may not manipulate some stuff.

app/models/XmlSourceProcessor is included in page.rb and is the module that handles all of the parsing of transcripts. In it, wiki_to_xml does most of the control flow and process_latex_snippets handles the LaTeX.

That has two outputs: the actual text to be saved in the fields (returned in the return value and passed down to other text processors by wiki_to_xml), and a side-effect of an array of tex_figures on 'self' (i.e. the page object).

So that's what the parser does at parse time. Back within the request life-cycle, if the parser doesn't barf (i.e. send a user error because of bad mark-up), then the page itself is saved.
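To make the "return value plus side-effect" shape concrete, here is a minimal plain-Ruby sketch. It is not the real XmlSourceProcessor code: the `<texFigure>…</texFigure>` input markup, the `position` attribute, and the `LatexSnippetSketch` / `PageSketch` names are all assumptions made for illustration. Only `process_latex_snippets` and `tex_figures` are names taken from the description above.

```ruby
# Hypothetical stand-in for the parsing module; not FromThePage's code.
module LatexSnippetSketch
  # Assumed markup: <texFigure> ...LaTeX source... </texFigure>
  FIGURE_PATTERN = %r{<texFigure>(.*?)</texFigure>}m

  attr_reader :tex_figures

  # Output 1 (return value): the text with each LaTeX body replaced by a
  # positional placeholder tag.
  # Output 2 (side effect): the LaTeX sources collected on self.
  def process_latex_snippets(text)
    @tex_figures = []
    text.gsub(FIGURE_PATTERN) do
      @tex_figures << Regexp.last_match(1).strip
      %(<texFigure position="#{@tex_figures.size}"/>)
    end
  end
end

# Stand-in for the page object that includes the module.
class PageSketch
  include LatexSnippetSketch
end
```

After the call, the caller holds the rewritten text and the page object holds the figure sources, which is what lets a later callback persist them.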

a Page has a 1:many relationship with TexFigures: cf https://github.com/benwbrum/fromthepage/blob/master/app/models/page.rb#L25

and after the page itself is saved (the record is created in the DB), a call-back is run to update the tex figures: https://github.com/benwbrum/fromthepage/blob/master/app/models/page.rb#L29

That just checks to see if the figure is different than before and saves it: https://github.com/benwbrum/fromthepage/blob/master/app/models/page.rb#L198-L204

(NB all this is still within the browser request where a transcriber pushed the "Save" button)
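The "only save a figure if it changed" step can be sketched in plain Ruby as follows. This is an illustration under assumptions, not the code at the links above: `FigureSketch` is a hypothetical Struct standing in for the TexFigure ActiveRecord model, and `saved_source` stands in for whatever is already stored in the database.

```ruby
# Plain-Ruby stand-in for a TexFigure record (hypothetical, no database).
FigureSketch = Struct.new(:position, :source, :saved_source) do
  # True when the LaTeX source differs from the stored copy.
  def changed?
    source != saved_source
  end

  # Pretend to persist the record by syncing the stored copy.
  def save
    self.saved_source = source
  end
end

# Saves only the figures whose source changed and returns them, so that
# unchanged figures do not need to be re-rendered.
def update_tex_figures(figures)
  figures.select(&:changed?).each(&:save)
end
```

Returning the changed figures gives the caller a natural list of what still needs to be re-rendered in the background.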

tex_figure does all the conversion stuff -- it's the location where we interact with LaTeX, whether on save, on display, or on background transformation: https://github.com/benwbrum/fromthepage/blob/master/app/models/tex_figure.rb

And it has its own logs: ./public/images/working/tex/8081/process.log ./public/images/working/tex/4633/process.log

Okay, so at that point, the database is all prepped -- there's a page record which may have multiple tex_figure child records. The controller has decided that there were no errors, so it fires off the background processes required to post-process the page artifacts: https://github.com/benwbrum/fromthepage/blob/89e3f9ba7031c5189d7c7ccf712e59ff4503ac41/app/controllers/transcribe_controller.rb#L49

That just calls the background process for the tex_figure: https://github.com/benwbrum/fromthepage/blob/585d1cdcf868d3645a55025c2a0374f5edf27f41/app/models/page.rb#L195

And that fires off a rake task: https://github.com/benwbrum/fromthepage/blob/585d1cdcf868d3645a55025c2a0374f5edf27f41/app/models/tex_figure.rb#L28

Now the transcriber sees a success message or whatever the appropriate next step is during a transcript save. End of transcriber request cycle.
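For readers who don't want to follow the links, the general "hand off to a rake task and return immediately" pattern looks roughly like this. It is a sketch only: the task name `example:process_tex_figures` and both method names are placeholders, and the real invocation in tex_figure.rb may differ.

```ruby
# Builds the argv for a background rake task. The task name is a
# placeholder assumption, not the real FromThePage task.
def background_rake_command(page_id, task: "example:process_tex_figures")
  # Integer() rejects anything that is not a plain integer id.
  ["rake", "#{task}[#{Integer(page_id)}]"]
end

# Starts the command without waiting for it, sending stdout and stderr
# to log_path, so the web request can finish while rendering continues.
def run_in_background(command, log_path)
  pid = Process.spawn(*command, out: log_path, err: [:child, :out])
  Process.detach(pid) # reap the child later without blocking
  pid
end
```

The point of detaching is that the transcriber's "Save" request ends as soon as the task is launched; the figure rendering finishes on its own schedule.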

saracarl commented 8 years ago

General approach for solution:

replace this bit of code:

convert_command = "convert -density 300 #{cropped_pdf_file_path} #{artifact_file_path}"
logger.info(convert_command)
system(convert_command)

With a call to pdf2svg that's based on this article: http://tex.stackexchange.com/questions/51757/how-can-i-use-tikz-to-make-standalone-svg-graphics

next steps:

1) install pdf2svg
2) figure out a test case/input
3) run a first test with old code
4) modify with the pdf2svg call
5) test with new code
6) compare results, play with svg to make sure it's sufficient, figure out how to display the svg so it's sized to the page appropriately.
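A rough sketch of what step 4 could look like, assuming pdf2svg is installed and on the PATH. The variable names `cropped_pdf_file_path` and `artifact_file_path` come from the snippet above; the method names `pdf2svg_command` and `convert_to_svg` are made up for this example, and the stdout logger is a stand-in for the model's own logger.

```ruby
require "logger"

# Builds the argv for pdf2svg, whose usage is:
#   pdf2svg <in.pdf> <out.svg> [<page number>]
def pdf2svg_command(cropped_pdf_file_path, artifact_file_path)
  ["pdf2svg", cropped_pdf_file_path, artifact_file_path]
end

# Logs and runs the conversion. Returns true only if pdf2svg exited
# with status 0.
def convert_to_svg(cropped_pdf_file_path, artifact_file_path,
                   logger: Logger.new($stdout))
  command = pdf2svg_command(cropped_pdf_file_path, artifact_file_path)
  logger.info(command.join(" "))
  # Passing separate arguments to system bypasses the shell, so paths
  # with spaces need no quoting. system returns nil if the binary is
  # missing and false on a non-zero exit.
  system(*command) == true
end
```

Unlike the `convert -density 300` call, there is no resolution to pick here: the SVG stays vector, so sizing becomes a display-time question rather than a conversion-time one.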

saracarl commented 8 years ago

To display:

In addition to creating the tex_figure database records, the parser has created a XML tag stored on the page. When a reader looks at the page, we need to render that as HTML. That's done in https://github.com/benwbrum/fromthepage/blob/master/app/helpers/abstract_xml_helper.rb#L63-L73

That code presumes that the tex_figure has generated something to be embedded in an image tag, and converts the texFigure element in the XML to an HTML img tag. So that's the third element -- the reader's request.
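As a simplified illustration of that conversion, the helper below builds the img tag from a figure's page id, position, and timestamp. The URL pattern is copied from the rendered markup quoted elsewhere in this thread; the method name `tex_figure_img_tag` and its signature are assumptions, and the real helper works on XML elements rather than plain integers.

```ruby
# Builds the HTML img tag for one rendered figure (hypothetical helper).
# format("%d") raises ArgumentError on non-numeric input, so nothing
# unescaped can end up inside the attribute.
def tex_figure_img_tag(page_id, position, timestamp)
  src = format("/images/working/tex/%d/figure_%d.svg?timestamp=%d",
               page_id, position, timestamp)
  "<img src='#{src}'/>"
end
```

For example, `tex_figure_img_tag(8081, 1, 1462464754)` yields `<img src='/images/working/tex/8081/figure_1.svg?timestamp=1462464754'/>`. The timestamp query string acts as a cache-buster, so the browser fetches a fresh image whenever the figure is regenerated.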

saracarl commented 8 years ago

I have the code that will create an svg when LaTeX is saved. The existing UI code will load and display the svg, but it is slow. Here's how it is doing it:

<div class="page-preview">
  <p>an analysis as any, and is the only satisfactory analysis<br/>
   that I have seen, except one substantially the same<br/>
   given by myself. In existential graphs, we have the<br/>
   following signs: <br/>
   <img src='/images/working/tex/8081/figure_1.svg?timestamp=1462464754'/></p>

  <p> <img src='/images/working/tex/8081/figure_2.svg?timestamp=1462464754'/></p>

  <p> <img src='/images/working/tex/8081/figure_3.svg?timestamp=1462464754'/></p>

  <p> <img src='/images/working/tex/8081/figure_4.svg?timestamp=1462464754'/></p>

  <p> <img src='/images/working/tex/8081/figure_5.svg?timestamp=1462464754'/></p>

In other places in the UI code we use the svg tag, like such:

I'm wondering if we should create a or that would specify how we display these particular inline formula svgs.

@kolking Could you weigh in on this? You don't need to read the whole issue, but we're trying to display svgs (~12-28 kb on my test data) in-line with transcribed text. It would be nice if they loaded faster, and if they resized with the display transcription window. (Something along these lines?? .SvgImage img{ width:80%; } )

Code isn't checked in yet, but I'll put it in ui-design and update the issue when it is.