cmhughes / latexindent.pl

Perl script to add indentation (leading horizontal space) to LaTeX files. It can modify line breaks before, during and after code blocks; it can perform text wrapping and paragraph line break removal. It can also perform string-based and regex-based substitutions/replacements. The script is customisable through its YAML interface.
GNU General Public License v3.0

long lines automatically broken #33

Closed by cmhughes 7 years ago

cmhughes commented 8 years ago

From JDO:

Hi Chris,

Nice package! I write lots of LaTeX as a professor, and I collaborate with my students, and we use GitHub for source control. I'm an Emacs man, so I have historically just formatted everything within 80 characters. It looks like you're doing exactly what I've always done (but I don't see an "80" parameter anywhere?), so that's terrific.

However, my students all appear to prefer paragraphs as a single line. I never liked that in Emacs until I discovered visual-line-mode, which is very nice. Would you consider a latexindent option for long lines, meaning text paragraphs would format as a single line (basically, you just never insert newlines within a paragraph, everything else is identical)?

Attachments: long.txt, long_out.txt, short.txt, short_out.txt

jowens commented 8 years ago

Please cc: @jowens when you eventually get around to this! Thanks!

cmhughes commented 8 years ago

From Jasjuang (originally posted at https://github.com/cmhughes/latexindent.pl/issues/36): Oops, yes it's the same. However, I would like to clarify that I hope it's not a simple line break when the character count hits 80; breaking while aligning + - * / & in equations/tables (kind of like what clang-format does for C++) would be super awesome.

cmhughes commented 8 years ago

Ideas for the YAML side of the implementation:

breakLinesAtMaxChar:
    maxCharPerLine: -1
    dontBreakLinesContaining:
        verb: 1
        lstinline: 1
    breakPreferences:
        midword: 0
        useSpaceImmediatelyBefore: 1
        useSpaceImmediatelyAfter: 0

cmhughes commented 7 years ago

Note to self: the m switch will need to be active, and the guts of this routine should go in ModifyLineBreaks.pm.

Any such routine will need to happen after verbatim environments have been stored, but before the other objects have been found.

Alex-Jordan commented 7 years ago

One reason for wanting to break up paragraphs into lots of lines arises when the project is managed through version control software like git: suppose there is a minor correction, such as a typo in one sentence within the paragraph. If you correct that and then view the diff, it's not always easy to see exactly what changed if git displays the entire line where there was one small change. There is a --color-words option that helps, but not always. So with short (say 80-character) lines, these diffs are easier to look at and understand.

I gather 80 is some kind of standard used in Emacs. It's worth considering the "66 character ideal" that is mentioned here: http://mikeyanderson.com/optimal_characters_per_line. This is to say that an adjustable parameter would be great.

cmhughes commented 7 years ago

Yep, totally agree with your points here :)

Do you have any ideas on what you'd like the YAML interface to look like? For example, what do you think to the stuff I put earlier in the thread? What options would you like to see?

cmhughes commented 7 years ago

It looks like the Text::Wrap module may be useful here:

http://perldoc.perl.org/Text/Wrap.html

cmhughes commented 7 years ago

As of https://github.com/cmhughes/latexindent.pl/commit/c328022563941473484a545e84898a10c4de3694, I have implemented this feature using the Text::Wrap module.

There are a few examples located in https://github.com/cmhughes/latexindent.pl/tree/develop/test-cases/maxLineChars.

The YAML interface is within

modifyLineBreaks:
    preserveBlankLines: 1
    condenseMultipleBlankLinesInto: 1
    textWrapOptions:
        columns: 0
        separator: ""

and is described in the documentation.

@jowens @Alex-Jordan does this look/work/feel as you intended?

jowens commented 7 years ago

@cmhughes would the expectation be that the following would trigger this code?

$ head -2 ../../src/latexindent.pl/latexindent.pl
#!/usr/bin/env perl
#   latexindent.pl, version 3.1, 2017-05-01
$ cat textwrap1.yaml
modifyLineBreaks:
    preserveBlankLines: 1
    condenseMultipleBlankLinesInto: 1
    textWrapOptions:
        columns: 0
        separator: ""
$ ../../src/latexindent.pl/latexindent.pl -m long.tex -o long-mod1.tex -l textwrap1.yaml
...
cmhughes commented 7 years ago

You need to set columns >= 2
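
For example, a sketch of the same settings file with the wrapping actually switched on (80 here is just an illustrative value):

modifyLineBreaks:
    preserveBlankLines: 1
    condenseMultipleBlankLinesInto: 1
    textWrapOptions:
        columns: 80        # any value >= 2 switches text wrapping on
        separator: ""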

jowens commented 7 years ago

I think I'm missing the point here. I basically want columns to be infinity, yeah?

cmhughes commented 7 years ago

I thought you'd want columns to be set to 75 or 80 or something similar?

I hope I haven't misunderstood this! :)

jowens commented 7 years ago

"text paragraphs would format as a single line (basically, you just never insert newlines within a paragraph, everything else is identical)"

cmhughes commented 7 years ago

Thanks for repeating this; apologies, I got it wrong -- I've implemented almost the opposite of what you wanted. Leave it with me; this is top of the list for latexindent.pl.

jowens commented 7 years ago

Limiting columns is definitely useful. But infinite paragraph lengths is also useful and what I was hoping you'd do here. Thanks!

cmhughes commented 7 years ago

@jowens As of https://github.com/cmhughes/latexindent.pl/commit/fde735bff6cd97b861f7584ab8e997409184ec22 I think I have a demonstration of an early version of this feature.

You'll need the files from https://github.com/cmhughes/latexindent.pl/commit/fde735bff6cd97b861f7584ab8e997409184ec22

If you grab https://github.com/cmhughes/latexindent.pl/blob/fde735bff6cd97b861f7584ab8e997409184ec22/test-cases/maxLineChars/jowens-short-multi-line.tex and https://github.com/cmhughes/latexindent.pl/blob/fde735bff6cd97b861f7584ab8e997409184ec22/test-cases/maxLineChars/removeBodyLineBreaks.yaml

and then run

latexindent.pl -m jowens-short-multi-line.tex -l=removeBodyLineBreaks.yaml

then you should receive https://github.com/cmhughes/latexindent.pl/blob/fde735bff6cd97b861f7584ab8e997409184ec22/test-cases/maxLineChars/jowens-short-multi-line-one.tex

Is this in line with what you want?

The next step is to customise the YAML interface to allow the user to tweak this for the various different code blocks.

cmhughes commented 7 years ago

@jowens checkout

For reference, this is on a new branch: https://github.com/cmhughes/latexindent.pl/tree/feature/remove-para-line-breaks

cmhughes commented 7 years ago

Documentation for this feature is available as of https://github.com/cmhughes/latexindent.pl/commit/fbf70ce60f72a2c7c00239ef9ad3c8ff3cba8d77; see page 44 onwards.

jowens commented 7 years ago

So this looks really good. Let me throw back a couple of things I'm noting though:

This output leaves a newline after "use".

\section{Conclusions and Future Work} In this work, we have shown that it is possible to use
a GPU for recommendation algorithms on social graphs, but there are still many ways in which the performance could be improved. Software platforms for large-scale online social network analysis on hybrid CPU-GPU architectures could potentially offer better throughput and performance on systems that are more cost-effective than today's CPU-based cluster architectures. However, moving workloads to GPUs is challenging for the following reasons:

This formatting was interesting (I expected a caption would be treated as a paragraph but instead it maintained the original linefeeds):

\begin{figure*}
  \centering
  \caption{Overview of Twitter's WTF algorithm.
  \emph{Frame 1:} The initial graph (red [dark] node is the user for whom recommendations are being computed).
  \emph{Frame 2:} The Circle of Trust (nodes in pink [dark]) is found using Personalized PageRank.
  \emph{Frame 3:} The graph is pruned to include only the CoT and the users they follow.
  \emph{Frame 4:} The relevance scores of all users on the right side are computed with Twitter's Money algorithm. Node 4 will be suggested for Node 0 to follow because it has the highest value of all nodes the user is not already following.}
\end{figure*}
jowens commented 7 years ago

I should note that the tabulars aren't quite what I expected either (do you format tabular?). These are outputs.

Should you force a newline after a \\?

  \begin{tabular}{*{6}{l}}
    \toprule Dataset   & Vertices & Edges  \\
    \midrule wiki-Vote & 7.1k     & 103.7k \\
    twitter-SNAP       & 81.3k    & 2.4M   \\ gplus-SNAP         & 107.6k   & 30.5M  \\ twitter09          & 30.7M    & 680M   \\ \bottomrule
  \end{tabular}

This one seems like there's some similarity between neighboring rows but they don't match the header row:

  \begin{tabular}{*{10}{r}}
    \toprule & \multicolumn{2}{c}{wiki-Vote} & \multicolumn{2}{c}{twitter} & \multicolumn{2}{c}{gplus} & \multicolumn{2}{c}{twitter09} \\
    \midrule Step (runtime) & Cassovary & GPU  & Cassovary & GPU  & Cassovary & GPU   & Cassovary & GPU     \\
    \midrule PPR (ms)       & 418       & 0.45 & 480       & 0.84 & 463       & 4.74  & 884       & 832.69  \\
    CoT (ms)                & 262       & 0.54 & 2173      & 1.28 & 25616     & 2.11  & 2192      & 51.61   \\ Money/SALSA (ms)        & 357       & 2.70 & 543       & 5.16 & 2023      & 18.56 & 11216     & 158.37  \\ Total (ms)              & 1037      & 4.37 & 3196      & 8.36 & 28102     & 26.57 & 14292     & 1044.99 \\ Speedup & \multicolumn{2}{r}{235.7} & \multicolumn{2}{r}{380.5} & \multicolumn{2}{r}{1056.5} & \multicolumn{2}{r}{13.7}\\ \bottomrule
  \end{tabular}
jowens commented 7 years ago
$ cat textwrap1.yaml
# Default value of indentation
defaultIndent: "  "   # because I'm not a monster

modifyLineBreaks:
    removeParagraphLineBreaks:
        all: 1
cmhughes commented 7 years ago

Thanks for the follow-ups.

I'll respond to each case in turn:

\section{Conclusions and Future Work} In this work, we have shown that it is possible to use
a GPU for recommendation algorithms on social graphs, but there are still many ways in which 

latexindent.pl does not classify this as a text paragraph, because it begins with a \. With reference to your original statement, "text paragraphs would format as a single line (basically, you just never insert newlines within a paragraph, everything else is identical)", I took this to mean pure text, not including commands.

With regards to your caption example, it's sort of the same thing: it's not a paragraph made up of pure text, as it contains \emph commands.

The routine stops the current paragraph when it reaches one of the following:

Without this last bullet point, the following could happen, for example:

short lines
more short lines
\begin{myenv}
...

could be turned into

short lines more short lines\begin{myenv}...

which isn't necessarily what people would want from this feature (although it is possible to achieve using poly-switches).

I think that the tabular example is useful; perhaps a YAML switch such as the following would help:

    removeParagraphLineBreaks:
        all: 1
        alignAtAmpersandTakesPriority: 1

What do you think about this?

jowens commented 7 years ago

latexindent.pl does not classify this as a text paragraph, because it begins with a \

That's a fair design decision to make.

With regards to your caption example, it's sort of the same thing: it's not a paragraph made up of pure text, as it contains \emph commands.

I'm gonna have to quibble with this one. \emph is common in text paragraphs (also common would be paragraphs with, say, \cite or \footnote or the like).

alignAtAmpersandTakesPriority

I don't have a strong feeling on this one. As long as you have a principled way to format tabular, I'm happy with whatever you pick; but the user-controlled setting you propose seems perfectly cromulent to me.

cmhughes commented 7 years ago

Thanks very much for the follow-up, this is an interesting feature to explore and I'm very glad that you're shaping it! :)

If you'd like, I could remove the third condition:

The routine stops the current paragraph when it reaches one of the following:

This should allow your \emph, \footnote and any other \command to work as you'd like. The price would be behaviour like the example I detailed above. Alternatively, another YAML switch could be created to let the user decide whether the third condition above should be included. What do you think?

Alex-Jordan commented 7 years ago

Ghost following the thread.

What if the third condition only broke on an environment, triggered by \begin{, but not by other commands? Or perhaps by a very short list of commands like \section{?

jowens commented 7 years ago

I think there's an intuitive difference between "commands that mark up normal text" (\emph, \cite, \footnote, etc.) and "commands that aren't" (\section, environments, etc.). But I have not the slightest idea if that's a formal designation. It might simply be that when you see commands that look like they're inline text (word word \command{word} word), you treat them as normal text, and you assume that commands-that-aren't are, say, on their own line? I really don't have a good answer for you here.

In terms of the example that I provided, where I had

\section{Conclusions and Future Work} In this work, we ...

it is a totally fine decision if you say "you used section in such a way that it looks like it's part of the text, if you want it treated as a more substantial command, you need to put a newline after the command". That's probably the simplest thing to do.

cmhughes commented 7 years ago

Thanks for the input, both.

As of https://github.com/cmhughes/latexindent.pl/tree/feature/remove-para-line-breaks I have implemented

paragraphsStopAt:
    environments: 1
    commands: 0
    ifElseFi: 0
    items: 0
    specialBeginEnd: 0
    heading: 0
    filecontents: 0

so that the user can control which objects can be absorbed into the paragraph line break removal routine. By default, you'll see that environments will stop the routine, but that commands will not.

This means that, for example,

Now is the time for all good men to come to the aid of their country. Now
is the time for all good men to come to the aid of their country.
\emph{emphacized} is the time for all good men to come to the aid of their country.
\footnote{foot note} is the time for all good men to come to the aid of their country.
\cite{citation} is the time for all good men to come to the aid of their country.

on using the YAML

modifyLineBreaks:
    removeParagraphLineBreaks:
        all: 1

then you obtain

Now is the time for all good men to come to the aid of their country. Now is the time for all good men to come to the aid of their country. \emph{emphacized} is the time for all good men to come to the aid of their country. \footnote{foot note} is the time for all good men to come to the aid of their country. \cite{citation} is the time for all good men to come to the aid of their country.

You'll also see that I've implemented

        alignAtAmpersandTakesPriority: 1
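
For reference, a rough sketch of what a combined settings file might look like; treat it as illustrative rather than definitive, and note that commands: 1 is included here only to demonstrate making commands stop a paragraph:

modifyLineBreaks:
    removeParagraphLineBreaks:
        all: 1
        alignAtAmpersandTakesPriority: 1
        paragraphsStopAt:
            environments: 1
            commands: 1    # illustration only: make commands stop a paragraph too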

Is this sounding reasonable?

jowens commented 7 years ago

paragraphsStopAt makes a lot of sense to me. That is a thoughtful way to do it.

Is the branch ready for me to try it on some other input then?

cmhughes commented 7 years ago

Yes, indeed, please do! :)

jowens commented 7 years ago

I think the "minted" environment should probably be treated like verbatim since it's source code.

Input:

  \begin{minted}[mathescape,
    linenos,
    numbersep=5pt,
    gobble=4,
    frame=lines,
    framesep=2mm]{c}
    if (condition) {
      code_segment_1;
    } else {
      code_segment_2;
    }
  \end{minted}

Output:

  \begin{minted}[mathescape, linenos, numbersep=5pt, gobble=4, frame=lines, framesep=2mm]{c}
    if (condition) {     code_segment_1;   } else {     code_segment_2;   }
  \end{minted}

YAML:

$ cat textwrap1.yaml
# Default value of indentation
defaultIndent: "  "   # because I'm not a monster

modifyLineBreaks:
    removeParagraphLineBreaks:
        all: 1

paragraphsStopAt:
    environments: 1
    commands: 0
    ifElseFi: 0
    items: 0
    specialBeginEnd: 0
    heading: 0
    filecontents: 0
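
A possible workaround might be to mark minted (and similar source-code environments) as verbatim in the settings file; a sketch, assuming the verbatimEnvironments field from defaultSettings.yaml accepts additional entries:

verbatimEnvironments:
    minted: 1    # assumption: treat minted bodies as verbatim, leaving them untouched
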
jowens commented 7 years ago

This seems odd. I liked separate lines after \\.

Input:

      \begin{eqnarray}
        10 - 6p &<& 8\\
        p &>& (10-8)/6\\
        p &>& 1/3
      \end{eqnarray}

Output:

      \begin{eqnarray}
        10 - 6p &<& 8\\ p &>& (10-8)/6\\ p &>& 1/3
      \end{eqnarray}
jowens commented 7 years ago

dot2tex might be something else you treat as verbatim.

Input:

    \begin{dot2tex}[scale=1.0]
      digraph G {
        rankdir=LR;
        node [shape="circle", fontsize=10];
        ST;
        WT;
        WN;
        SN;

        ST -> ST [label="T"];
        ST -> WT [label="N"];
        WT -> ST [label="T"];
        WT -> WN [label="N"];
        WN -> WT [label="T"];
        SN -> WN [label="T"];
        WN -> SN [label="N"];
        SN -> SN [label="N"];
      }
    \end{dot2tex}

Output:

    \begin{dot2tex}[scale=1.0]
      digraph G {
          rankdir=LR; node [shape="circle", fontsize=10]; ST; WT; WN; SN;

          ST -> ST [label="T"]; ST -> WT [label="N"]; WT -> ST [label="T"]; WT -> WN [label="N"]; WN -> WT [label="T"]; SN -> WN [label="T"]; WN -> SN [label="N"]; SN -> SN [label="N"];
        }
    \end{dot2tex}
jowens commented 7 years ago

The output here has a single newline in it that I didn't expect.

Input:

\paragraph{Datasets}
We summarize the datasets we use for evaluation in
Table~\ref{tab:dataset}. Soc-orkut (soc-ork), soc-livejournal1
(soc-lj), and hollywood-09 (h09) are three social graphs; indochina-04
(i04) is a crawled hyperlink graph from indochina web domains;
rmat\_s22\_e64 (rmat-22), rmat\_s23\_e32 (rmat-23), and rmat\_s24\_e16
(rmat-24) are three generated R-MAT graphs with similar vertex counts.
All seven datasets are scale-free graphs with diameters of less than
30 and unevenly distributed node degrees (80\% of nodes have degree
less than 64). Both rgg\_n\_24 (rgg) and roadnet\_USA (roadnet)
datasets have large diameters with small and evenly distributed node
degrees (most nodes have degree less than 12). soc-ork is from the
Stanford Network Repository; soc-lj, i04, h09, and roadnet are from
the UF Sparse Matrix Collection; rmat-22, rmat-23, rmat-24, and rgg
are R-MAT and random geometric graphs we generated. For R-MAT, we use
16 as the edge factor, and the initiator parameters for the Kronecker
graph generator are: $a=0.57,b=0.19,c=0.19,d=0.05$. This setting is
the same as in the Graph 500 Benchmark. For random geometric graphs,
we set the threshold parameter to 0.000548. The edge weight values
(used in SSSP) for each dataset are uniform random values between 1
and 64.

Output:

\paragraph{Datasets} We summarize the datasets we use for evaluation in Table~\ref{tab:dataset}. Soc-orkut (soc-ork), soc-livejournal1 (soc-lj), and hollywood-09 (h09) are three social graphs; indochina-04 (i04) is a crawled hyperlink graph from indochina web domains; rmat\_s22\_e64 (rmat-22), rmat\_s23\_e32 (rmat-23), and rmat\_s24\_e16
(rmat-24) are three generated R-MAT graphs with similar vertex counts. All seven datasets are scale-free graphs with diameters of less than 30 and unevenly distributed node degrees (80\% of nodes have degree less than 64). Both rgg\_n\_24 (rgg) and roadnet\_USA (roadnet) datasets have large diameters with small and evenly distributed node degrees (most nodes have degree less than 12). soc-ork is from the Stanford Network Repository; soc-lj, i04, h09, and roadnet are from the UF Sparse Matrix Collection; rmat-22, rmat-23, rmat-24, and rgg are R-MAT and random geometric graphs we generated. For R-MAT, we use 16 as the edge factor, and the initiator parameters for the Kronecker graph generator are: $a=0.57,b=0.19,c=0.19,d=0.05$. This setting is the same as in the Graph 500 Benchmark. For random geometric graphs, we set the threshold parameter to 0.000548. The edge weight values (used in SSSP) for each dataset are uniform random values between 1 and 64.
jowens commented 7 years ago

Then I set specialBeginEnd to 1 because I saw this behavior. I would usually expect \end{tabular} to be followed by a newline, but it isn't. I think it should be.

Input:

  \end{tabular}
  \caption[Dataset description table.]{Dataset Description Table.
    Graph types are: r: real-world, g: generated, s: scale-free, and
    m: mesh-like. All datasets have been converted to undirected
    graphs. Self-loops and duplicated edges are
    removed.\label{tab:dataset}}

Output:

  \end{tabular} \caption[Dataset description table.]{Dataset Description Table. Graph types are: r: real-world, g: generated, s: scale-free, and m: mesh-like. All datasets have been converted to undirected graphs. Self-loops and duplicated edges are removed.\label{tab:dataset}}
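
One possible way to force the line break after \end{tabular} might be the poly-switches mentioned earlier in the thread; a sketch, assuming the EndFinishesWithLineBreak poly-switch from the documentation applies to environments like this:

modifyLineBreaks:
    environments:
        tabular:
            EndFinishesWithLineBreak: 1    # assumption: add a line break after \end{tabular}
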
jowens commented 7 years ago

Odd \item[item] behavior for description lists. First, there's a spurious newline in the output. Second, \item[item] should really be preceded by a newline, as \item is.

Input:

\begin{description}
\item[vs.\ MapGraph] MapGraph is faster than Medusa on all but one
  test~\cite{Fu:2014:MAH} and Gunrock is faster than MapGraph on all
  tests: the geometric mean of Gunrock's speedups over MapGraph on
  BFS, SSSP, PageRank, and CC are 4.679, 12.85, 3.076, and 5.69,
  respectively.
\item[vs.\ CuSha] Gunrock outperforms CuSha on BFS and SSSP\@. For
  PageRank, Gunrock achieves comparable performance with no
  preprocessing when compared to CuSha's G-Shard data preprocessing,
  which serves as the main load-balancing module in CuSha.
...

Output:

\begin{description}
  \item[vs.\ MapGraph] MapGraph is faster than Medusa on all but one
  test~\cite{Fu:2014:MAH} and Gunrock is faster than MapGraph on all tests: the geometric mean of Gunrock's speedups over MapGraph on BFS, SSSP, PageRank, and CC are 4.679, 12.85, 3.076, and 5.69, respectively. \item[vs.\ CuSha] Gunrock outperforms CuSha on BFS and SSSP\@. For PageRank, Gunrock achieves comparable performance with no preprocessing when compared to CuSha's G-Shard data preprocessing, which serves as the main load-balancing module in CuSha. \item[vs.\ Totem] The 1-GPU Gunrock implementation has 1.83x more MTEPS (4731 vs.\ 2590) on direction-optimized BFS on the soc-LiveJournal dataset (a smaller scale-free graph in their test set) than the 2-CPU, 2-GPU configuration of Totem~\cite{Sallinen:2015:ADB}. \item[vs.\ nvGRAPH] For SSSP, nvGRAPH is faster than Gunrock on the roadnet dataset, but slower on the other datasets. Gunrock in general performs better on scale-free graphs than it does on regular graphs. For PageRank, nvGRAPH is faster than Gunrock on six datasets and slower on three (h04, i09, and roadnet). nvGRAPH is closed-source and thus a detailed comparison is infeasible.
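
A possible mitigation, given the paragraphsStopAt interface above, might be to have items stop the paragraph routine; a sketch (whether this also covers the \item[...] form isn't confirmed here):

modifyLineBreaks:
    removeParagraphLineBreaks:
        all: 1
        paragraphsStopAt:
            items: 1    # assumption: \item (and hopefully \item[...]) stops a paragraph
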
jowens commented 7 years ago

The comment line gets moved. I guess this is OK (it's put at the end of the line), but it was unexpected. I do think \end{table} should be followed by a newline, though.

Input:

\end{table}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\paragraph{Bucket identification}
The choice of bucket identification directly impacts performance results of any multisplit method, including ours.
We support user-defined bucket identifiers.
These can be as simple as unary functions, or complicated functors with arbitrary local arguments. For example, one could utilize a functor which determines whether a key is prime or not.
Our implementation is simple enough to let users easily change the bucket identifiers as they please.

Output:

\end{table}  \paragraph{Bucket identification} The choice of bucket identification directly impacts performance results of any multisplit method, including ours. We support user-defined bucket identifiers. These can be as simple as unary functions, or complicated functors with arbitrary local arguments. For example, one could utilize a functor which determines whether a key is prime or not. Our implementation is simple enough to let users easily change the bucket identifiers as they please.%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
jowens commented 7 years ago

More comment issues. I think that comments, for good or bad, have to stay in the order in which they are written; comments should not be moved past text (because then the context of the comment might change). This will probably result in not being able to make a paragraph into a single line without a newline. But I think this is correct behavior.

Input:

The main obstacles in achieving the speed of light performance are 1)~non-coalesced memory writes and 2)~the non-negligible cost that we have to pay to sweep through all elements and compute permutations.
The more registers and shared memory that we have (fast local storage as opposed to the global memory), the easier it is to break the whole problem into larger subproblems and localize required computations as much as possible. This is particularly clear from our results on the GeForce GTX 1080 compared to the Tesla K40c, where our performance improvement is proportionally more than just the GTX 1080's global memory bandwidth improvement (presumably because of more available shared memory per SM).
% \john{Important comment about previous sentence.}
% Our achieved rates significantly outperform regular 32-bit radix sort (Table~\ref{table:reference}).
\input{tex/tables/multisplit_rates}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Performance on different GPU microarchitectures}\label{subsec:perf_architecture}
In our design we have not used any (micro)architecture-dependent optimizations and hence we do not expect radically different behavior on different GPUs, other than possible speedup differences based on the device's capability.
Here, we briefly discuss some of the issues related to hardware differences that we observed in our experiments.

Output:

The main obstacles in achieving the speed of light performance are 1)~non-coalesced memory writes and 2)~the non-negligible cost that we have to pay to sweep through all elements and compute permutations. The more registers and shared memory that we have (fast local storage as opposed to the global memory), the easier it is to break the whole problem into larger subproblems and localize required computations as much as possible. This is particularly clear from our results on the GeForce GTX 1080 compared to the Tesla K40c, where our performance improvement is proportionally more than just the GTX 1080's global memory bandwidth improvement (presumably because of more available shared memory per SM).   \input{tex/tables/multisplit_rates}  \subsubsection{Performance on different GPU microarchitectures}\label{subsec:perf_architecture} In our design we have not used any (micro)architecture-dependent optimizations and hence we do not expect radically different behavior on different GPUs, other than possible speedup differences based on the device's capability. Here, we briefly discuss some of the issues related to hardware differences that we observed in our experiments.%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Our achieved rates significantly outperform regular 32-bit radix sort (Table~\ref{table:reference}).% \john{Important comment about previous sentence.}
jowens commented 7 years ago

Hope that keeps you busy! Thanks for allowing me to make comments.

cmhughes commented 7 years ago

Thanks for your feedback.

I'll do my best to respond to each of your comments:

Thanks again for your time in reviewing this feature, it's very helpful. Do let me know what you think!

cmhughes commented 7 years ago

Thanks again for all of the help in shaping this feature! I've merged this into develop and then master, and have released it to CTAN as of https://github.com/cmhughes/latexindent.pl/commit/12ef78285ed9bc9fe8155242bef3fd5fc3ad123d