biostars / biostar-handbook

Issue tracker for the Biostar Handbook

Lengthy code lines are overflowing in PDF #87

Open ilaydabozan opened 5 years ago

ilaydabozan commented 5 years ago

Hello,

I would like to point out that on page 689 some lines are longer than the page is wide. Instead of wrapping to the next line, they overflow off the edge of the page.

I just downloaded the version published on September 18, 2019.

ialbert commented 5 years ago

One possible workaround is to consult the web page that corresponds to the same chapter. In the web version, each code box is scrollable independently:

https://www.biostarhandbook.com/introduction-to-variant-calling.html

I need to find latex experts that can advise on how to make the large code boxes wrap or scroll (if that is possible at all) in the PDF document. It is a bit out of my expertise but I will spend more time this week on this since it is a problem that readers periodically run into.

julianstanley commented 5 years ago

I need to find latex experts that can advise on how to make the large code boxes wrap or scroll (if that is possible at all) in the PDF document.

I'd be really impressed if scrolling is possible in a normal PDF, but I bet it can wrap. I'm definitely not an expert, but there's a similar problem solved here. But I assume this isn't typeset by hand--are you using pandoc? If so, this post may be helpful? Or this more recent one.

ialbert commented 5 years ago

Thanks for the pointer - we will investigate.

ChristianKKelley commented 3 years ago

For many of the cases, couldn't you use '\' to write one liners out on multiple "lines"? For example, a one-liner on pdf page 412 that runs off the edge could be written as

cat pairs.txt | cut -f 1 | sort | uniq -c | sort -rn | \
> tr -s ' ' | cut -f 3 -d ' ' | head -5 > study

I notice this issue is especially bad in the epub file

edit: escaped the '\' characters
edit: I have the December 14, 2020 edition
edit: moved '\' so that, in both the Dec 14 2020 pdf AND epub versions, it would not run off the edge
edit: I believe I saw the use of '\' earlier in the book, so there seems to be some precedent
edit: also, I believe this would be MUCH better than simply wrapping text, as wrapping could make a one-liner look like a multi-liner when it is not; also added the '>' character
edit(s): formatting
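The continuation idea can be sketched in C. This is only an illustration under stated assumptions: `wrap_at_pipes` is a hypothetical helper, not something from the handbook or its build, and it assumes that every `|` in the command is a pipeline separator (no quoted pipes, no `||`).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write cmd to out, starting a new line whenever
 * the next pipeline stage would push the current line past width.
 * A break is written as " \" + newline + four spaces of indent, so the
 * result is still one shell command.
 * Assumption: every '|' separates stages (no quoted '|', no "||").
 */
void    wrap_at_pipes(const char *cmd, size_t width, FILE *out)

{
    const size_t    indent = 4;
    size_t          col = 0;            /* characters on this line */
    int             line_has_text = 0;

    while ( *cmd != '\0' )
    {
        /* A stage runs up to and including the next '|' */
        const char  *bar = strchr(cmd, '|');
        size_t      len = (bar != NULL) ? (size_t)(bar - cmd) + 1
                                        : strlen(cmd);

        /* Reserve two columns for the trailing " \" */
        if ( line_has_text && (col + len + 2 > width) )
        {
            fprintf(out, " \\\n%*s", (int)indent, "");
            col = indent;
            line_has_text = 0;
        }

        /* At the start of a line, drop blanks left over from "| " */
        if ( !line_has_text )
        {
            while ( (len > 0) && (*cmd == ' ') )
            {
                ++cmd;
                --len;
            }
        }

        fwrite(cmd, 1, len, out);
        col += len;
        line_has_text = 1;
        cmd += len;
    }
    fputc('\n', out);
}
```

Calling `wrap_at_pipes(cmd, 60, stdout)` on the one-liner above breaks it after `sort -rn |`, the same place as the hand-wrapped version. A stage that is itself longer than the width is left on a single line.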

ialbert commented 3 years ago

Yes, choosing between soft wrapping, hard wrapping with markers, and no wrapping at all has been somewhat of a challenge.

There were versions of the book with and without wrapping (since that is a CSS property). There are several competing needs, since the different font sizes and layouts require different wrapping lengths. I would prefer no wrapping in HTML (since it is easy to scroll horizontally) and full wrapping on the other two platforms. But since the PDF is generated via LaTeX and the epub via pandoc, even those styles would be different.

I am going to look into a solution that wraps the epub with soft wraps. That way it will be fully readable on all ebook readers and I would not need to rework every example, since that is tedious work that is hard to get going.

I will leave this issue open and investigate adding the appropriate style sheets to the ebook generation.

outpaddling commented 3 years ago

How are the code blocks represented in the markdown source? Enclosed in triple-backticks? In any case, it should be easy to write a simple script that flags lines longer than N characters within those blocks and wraps them as needed. I think the script would have to ask the user how to wrap each instance, since some blocks are Unix commands that should have a trailing \ to indicate continuation + indent of continued lines for clarity. Others are just text that should be wrapped without continuation markers or extra indents. Automatically determining which is which would be hard and unnecessary, since it wouldn't take long to run interactively and it only has to be done once per release.

outpaddling commented 3 years ago

I whipped up a prototype of the filter I suggested, assuming triple-backticks are used for commands and other verbatim text. Attaching a sample input that includes one of the truncated lines from the handbook.

#include <stdio.h>
#include <sysexits.h>
#include <stdbool.h>
#include <ctype.h>
#include <string.h>
#include <errno.h>

#define LINE_BUF_LEN    1024    // Not LINE_MAX, which POSIX <limits.h> defines
#define WRAP_LEN        64

int     main(int argc, char *argv[])

{
    char    line[LINE_BUF_LEN];
    bool    verbatim_mode = false;
    size_t  line_len, c;
    int     cont, ch;
    FILE    *infile, *outfile;

    if ( argc != 3 )
    {
        fprintf(stderr, "Usage: %s input-file output-file\n", argv[0]);
        return EX_USAGE;
    }
    if ( (infile = fopen(argv[1], "r")) == NULL )
    {
        fprintf(stderr, "Cannot open %s for reading: %s\n",
                argv[1], strerror(errno));
        return EX_NOINPUT;
    }
    if ( (outfile = fopen(argv[2], "w")) == NULL )
    {
        fprintf(stderr, "Cannot open %s for writing: %s\n",
                argv[2], strerror(errno));
        fclose(infile);
        return EX_CANTCREAT;
    }

    while ( fgets(line, LINE_BUF_LEN, infile) != NULL )
    {
        // Length without the trailing newline, if there is one
        line_len = strcspn(line, "\n");

        // Toggle verbatim mode on a fence line such as ``` or ```bash
        if ( strncmp(line, "```", 3) == 0 )
            verbatim_mode = !verbatim_mode;

        // Wrap long lines when in verbatim mode
        if ( verbatim_mode && (line_len > WRAP_LEN) )
        {
            // User should decide if a \ should be used to indicate
            // continuation.  Long lines may be Unix commands or other text.
            puts("\nLong verbatim line detected:");
            printf("===\n%s===\n\n", line);
            do
            {
                fputs("Add continuation char to end when wrapping? (y/n) ",
                      stdout);
                cont = getchar();
                // Discard the rest of the answer, including the newline
                for (ch = cont; (ch != '\n') && (ch != EOF); ch = getchar())
                    ;
            }   while ( (cont != 'y') && (cont != 'n') && (cont != EOF) );

            // Find last whitespace at or before col WRAP_LEN
            for (c = line_len; (c > 0) &&
                 (!isspace((unsigned char)line[c]) || (c > WRAP_LEN)); --c)
                ;

            // Note: the line is split only once, so the remainder may
            // still be longer than WRAP_LEN
            if ( c > 0 )
            {
                if ( cont == 'y' )
                {
                    line[c] = '\0';
                    fprintf(outfile, "%s \\\n    %s", line, line + c + 1);
                }
                else
                {
                    // 'n' or end of input: plain wrap, no marker
                    line[c] = '\n';
                    fputs(line, outfile);
                }
            }
            else
            {
                // Keep the line unchanged rather than dropping it
                puts("Could not find a good wrap point.");
                fputs(line, outfile);
            }
        }
        else
            fputs(line, outfile);
    }
    fclose(infile);
    fclose(outfile);
    return EX_OK;
}

sample.md

ialbert commented 3 years ago

thanks for the code @outpaddling , we are fairly competent programmers ourselves and primarily the problem was not the lack of a pre-processor that we could run.

The book is written via the bookdown software:

https://bookdown.org/

It is also generated automatically via bookdown. The solution needs to integrate with bookdown: for example, we view the book live as we edit it, and the bookdown server reloads the page.

When generating output we have the ability to control styles on the page. For example, we can apply HTML CSS tags to elements. That way we have full control over how the lines are rendered. We can wrap long lines or show them scrollable etc.

We can also load up latex commands/definitions/libraries before the latex output generation begins (my understanding is that bookdown runs pandoc to generate the PDF and eBook). What I am looking to find is a way to instruct the automatic bookdown (via pandoc) conversion to wrap long lines as necessary. That is the information that would allow us to solve the problem.

It is true that identifying the long lines (long line in file X at line Y) and then wrapping them manually could also be a solution. I never thought of it that way. Perhaps that would work as well.
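A report of the "long line in file X at line Y" kind can be sketched in a few lines of C. This is a minimal sketch, not part of the bookdown toolchain: `report_long_lines` is a hypothetical helper, it assumes fences are lines that begin with three backticks in column 0, and it counts bytes rather than displayed characters.

```c
#include <stdio.h>

/*
 * Hypothetical helper: scan Markdown from in and print
 * "name:line: N bytes" for every line longer than max_len that sits
 * inside a fenced block. Returns the number of lines reported.
 * Assumptions: a fence is any line starting with three backticks in
 * column 0; length is measured in bytes, not displayed characters.
 */
int     report_long_lines(FILE *in, const char *name,
                          size_t max_len, FILE *out)

{
    int             ch, in_fence = 0, reported = 0;
    size_t          len = 0;        /* bytes seen on the current line */
    size_t          ticks = 0;      /* leading backticks on this line */
    unsigned long   line_no = 1;

    for (;;)
    {
        ch = fgetc(in);
        if ( (ch != '\n') && (ch != EOF) )
        {
            /* Count backticks only while they are the line's prefix */
            if ( (ch == '`') && (ticks == len) )
                ++ticks;
            ++len;
            continue;
        }

        /* End of line (or end of input): classify what was read */
        if ( ticks >= 3 )
            in_fence = !in_fence;
        else if ( in_fence && (len > max_len) )
        {
            fprintf(out, "%s:%lu: %zu bytes\n", name, line_no, len);
            ++reported;
        }

        if ( ch == EOF )
            break;
        ++line_no;
        len = 0;
        ticks = 0;
    }
    return reported;
}
```

Because it only reports and never rewrites, a check like this could run before each release without touching the live-editing workflow. The flagged lines would still be wrapped by hand.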

outpaddling commented 3 years ago

thanks for the code @outpaddling , we are fairly competent programmers ourselves and primarily the problem was not the lack of a pre-processor that we could run.

I figured you could, but it took 15 minutes to demonstrate exactly what I was suggesting, so might as well..

The book is written via the bookdown software:

https://bookdown.org/

It is also generated automatically via bookdown. The solution needs to integrate with bookdown: for example, we view the book live as we edit it, and the bookdown server reloads the page.

When generating output we have the ability to control styles on the page. For example, we can apply HTML CSS tags to elements. That way we have full control over how the lines are rendered. We can wrap long lines or show them scrollable etc.

We can also load up latex commands/definitions/libraries before the latex output generation begins (my understanding is that bookdown runs pandoc to generate the PDF and eBook). What I am looking to find is a way to instruct the automatic bookdown (via pandoc) conversion to wrap long lines as necessary. That is the information that would allow us to solve the problem.

My impression is that LaTeX cannot solve this problem on its own and that it will require massaging what goes into the LaTeX verbatim block. I was unable to find a solution other than manually editing the verbatim text when I authored a book in LaTeX many years ago. I did just look into this to see if any new capabilities have evolved, but came up empty (though maybe more effort would change my fortune).

It is true that identifying the long lines (long line in file X at line Y) and then wrapping them manually could also be a solution. I never thought of it that way. Perhaps that would work as well.

Being a command-line hack, I wonder if one could insert a filter into the bookdown pipeline where it converts the markdown to LaTeX.

For now, at least the HTML rendering looks good, so we can get the full command syntax if we really need it.

Jason