Rapporter / pander

An R Pandoc Writer: Convert arbitrary R objects into markdown
http://rapporter.github.io/pander/
Open Software License 3.0

Weird formatting in grid tables #305

Open jenitivecase opened 7 years ago

jenitivecase commented 7 years ago

There is a 7 that is randomly getting formatted differently from all other values in my table output: image

The table appears totally normal in RStudio and in the LaTeX code: image image

I've reproduced the issue with a toy dataset. In doing this, I determined the issue is connected to the presence of column names somehow.

Here's my code for the test case:

test_table <- as.data.frame(matrix(c(11.23, 2356, 23.4, 2.1, 573, 24.5, 24.7, 35, 2.34, 7.6, 234, 6.9, 4.5, 7, 12.3, 5.6, 4123, 2.3), ncol = 3, byrow = TRUE))
colnames(test_table) <- c("PN\nFail\n(Pct)", "PN\nPass\n(Count)", "PN\nPass\n(Pct)")
rownames(test_table) <- c("Green Light - 100% Complete", "Did not Finish - 75% Complete", 
                          "Did not Finish - 50% Complete", "Did not Finish - 25% Complete", 
                          "Red Light - Rec. Against", "Total")
pander::pandoc.table(test_table, style = "grid", split.tables = Inf, split.cells = 50,
                     keep.line.breaks = TRUE, emphasize.rownames = FALSE,
                     big.mark = ",")

Here's some fun output: image image

And here's a bonus example of output from when I removed newline characters and parentheses from the column names in a troubleshooting attempt:

test_table <- as.data.frame(matrix(c(11.23, 2356, 23.4, 2.1, 573, 24.5, 24.7, 35, 2.34, 7.6, 234, 6.9, 4.5, 7, 12.3, 5.6, 4123, 2.3), ncol = 3, byrow = TRUE))
colnames(test_table) <- c("PN Fail Pct", "PN Pass Count", "PN Pass Pct")
rownames(test_table) <- c("Green Light - 100% Complete", "Did not Finish - 75% Complete", 
                          "Did not Finish - 50% Complete", "Did not Finish - 25% Complete", 
                          "Red Light - Rec. Against", "Total")
pander::pandoc.table(test_table, style = "grid", split.tables = Inf, split.cells = 50,
                     keep.line.breaks = TRUE, emphasize.rownames = FALSE,
                     big.mark = ",")

image

I no longer know what to trust. Any help is appreciated.

jenitivecase commented 7 years ago

P.S. It is not specific to the number 7... image

... but it does seem to be specific to single-digit numbers. image

daroczig commented 7 years ago

Thanks for the report, but I'm not sure if it's a pander issue. I mean pander does the R object to markdown transformation, but the rest (e.g. markdown -> PDF) is done via pandoc. So we should figure out if it's a problem in the markdown (as per the markdown specs at http://pandoc.org/MANUAL.html) or it's an issue with converting the markdown to PDF.

The markdown looks right to me, but I'm open to other opinions -- can you do some research on this?

jenitivecase commented 7 years ago

Additional things I have tried, with results:

- removing parentheses from colnames = success
- escaping parentheses in colnames with a single \ = failure
- escaping parentheses in colnames with a double escape, \\ = failure, and the part of the colnames enclosed in parens became italicized (? this is a total mystery to me)

From the pandoc manual, I gather that parentheses may be used for section numbers and list markers. This is why I tried using character escapes. The only relevant bit of info I could find was:

The man page writer extracts a title, man page section number, and other header and footer information from the title line. The title is assumed to be the first word on the title line, which may optionally end with a (single-digit) section number in parentheses.

I didn't find anything in the section of the manual on tables. I'm starting to think you're correct that this is a pandoc bug rather than a pander issue. Maybe this issue could be addressed with a caveat in the pander documentation, especially since the pandoc documentation doesn't seem to be especially helpful here.

daroczig commented 7 years ago

Could you try converting the markdown document into tex instead of PDF first? Then checking the tex file would show if it's a pandoc, or maybe PDF rendering issue.
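A minimal sketch of that check, assuming the pander and rmarkdown packages and a pandoc binary are available. The small table and the temporary file names are made up for illustration; only the single-digit 7 mirrors the report. It captures the grid-table markdown, converts it to LaTeX only, and looks for a verbatim environment in the result.

```r
library(pander)

# Small stand-in table (not the reporter's data); the 7 is the single-digit value.
test_table <- data.frame(a = c(11.23, 4.5), b = c(2356, 7))

# Capture the grid-table markdown as a string instead of printing it.
md <- pandoc.table.return(test_table, style = "grid", split.tables = Inf)

# Temporary input and output files; the names are arbitrary.
md_file  <- tempfile(fileext = ".md")
tex_file <- tempfile(fileext = ".tex")
writeLines(md, md_file)

# Ask pandoc for LaTeX only, skipping the PDF step.
rmarkdown::pandoc_convert(md_file, to = "latex", output = tex_file)

# TRUE here would mean pandoc parsed some cell as a code block.
tex <- readLines(tex_file)
any(grepl("\\begin{verbatim}", tex, fixed = TRUE))
```

When knitting an Rmd file instead, setting keep_tex to true for the pdf_document output format should leave the intermediate tex file next to the PDF.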

daroczig commented 7 years ago

Meanwhile, I'm closing this ticket, as I suspect the issue to be outside of pander -- but please report back with the above information and let's try to debug if you find something

jenitivecase commented 7 years ago

Okay, I did some tests and figured out what is happening. The bits that are formatted incorrectly are all enclosed in {verbatim} in the tex code. For example, here's the line with the weird 7 from Table 1 of my test output:

\begin{minipage}[t]{0.42\columnwidth}\raggedright\strut
Red Light - Rec. Against\strut
\end{minipage} & \begin{minipage}[t]{0.12\columnwidth}\raggedright\strut
4.5\strut
\end{minipage} & \begin{minipage}[t]{0.15\columnwidth}\raggedright\strut
\begin{verbatim}
7
\end{verbatim}
\strut
\end{minipage} & \begin{minipage}[t]{0.15\columnwidth}\raggedright\strut
12.3\strut
\end{minipage}\tabularnewline
\begin{minipage}[t]{0.42\columnwidth}\raggedright\strut

I've attached the tex and PDF outputs from this test run as well as the Rmd file used to create them.

test_output_pander7.zip

daroczig commented 7 years ago

Wow, good catch! Let me see if there's a fix for this in pandoc; otherwise we might want to, e.g., add an extra space in the column in such cases to avoid triggering the verbatim environment caused by the four leading spaces.
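To spot cells that might be affected, one could count the leading spaces of each cell in a row of the generated markdown. The following is a minimal base-R sketch: `cell_indents` is a hypothetical helper, not part of pander, and the sample row is made up for illustration rather than copied from the actual output. The exact indentation at which pandoc treats a grid-table cell as a code block is an assumption to verify against pandoc itself.

```r
# Hypothetical helper: number of leading spaces in each cell of one grid-table row.
cell_indents <- function(row) {
  cells <- strsplit(row, "|", fixed = TRUE)[[1]]
  cells <- cells[nzchar(cells)]  # drop the empty piece before the first pipe
  nchar(cells) - nchar(sub("^ +", "", cells))
}

# Made-up row: a short value centered in a wide column picks up the most padding.
row <- "| Total |   5.6   |     7     |"
indents <- cell_indents(row)

# Cells indented by four or more spaces are the candidates to inspect.
which(indents >= 4)
```

In this sample row only the last cell crosses the threshold, which matches the observation that single-digit values were the ones rendered differently.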

daroczig commented 7 years ago

BTW do you insist on using the grid table format? As per the pandoc docs:

The cells of grid tables may contain arbitrary block elements (multiple paragraphs, code blocks, lists, etc.).

So if you switch to eg the default multiline, this problem should in theory go away.

jenitivecase commented 7 years ago

I wouldn't say I insist on grid, but it was a quick fix to other formatting issues.

To illustrate my point, I reran my test file after changing the type to multiline for all examples. As you can see in the attached file, this output lacks the consistency in width and alignment across tables, even though the arguments in the pandoc.table() function are the same in all cases. This is why I switched to grid. Use of multiline does, however, fix the main issue we've been discussing.

test.pdf

daroczig commented 7 years ago

As per the most recent pandoc docs at http://pandoc.org/MANUAL.html#extension-grid_tables, alignment is now specified in grid tables just like in the pipe tables (since pandoc version 1.19) -- so this is a bug in pander.

jenitivecase commented 7 years ago

Sorry it's a bug, but thanks for looking into it!