brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

index references should be more descriptive #2312

Open teepeemm opened 9 months ago

teepeemm commented 9 months ago

This is inspired by a recent email to the list, but also something I've come across in the past.

The following TeX:

\documentclass{article}
\usepackage{makeidx}
\makeindex
\begin{document}
\setcounter{section}{4}
\section{Introduction}\label{sec:intro}
bare word\index{bare word}

\begin{enumerate}
\item in enumerate\label{item}\index{in enumerate}
\end{enumerate}\index{outside enumerate}
We had Item \ref{item} in Section \ref{sec:intro}
\printindex
\end{document}

produces an index:

bare word: §5
in enumerate: item 1
outside enumerate: §5 (but linked to appropriate paragraph)

The reference to "item 1" is not very descriptive about where the item is located. This is somewhat related to the fact that \ref{item} only shows the item number and not anything more descriptive, but in that case it's what \ref is asking for. In the index, it's not all that helpful.

The best solution I've found is to move the \index command outside of the enumerate, so that the index now refers to the section instead. But this doesn't seem satisfactory: (1) for a long enumerate, it would be difficult to find the item and (2) for a deeply nested enumerate in a theorem in a book using subsubsections, it would be difficult to know how far up the hierarchy to move the \index.

Would it be possible to have index entries provide a full reference to their location? In this case, something like "item 1 in §5". But objection (2) should then end up with something like "item a in item i in item 1 in Theorem 8 in §5.4.3 in chapter 2" (and might even depend on the splitting?). And I don't see how that would be all that easy to code.

So I guess this issue is a request to have more descriptive index entries, and until then this will serve as a reference for providing a slight workaround of "move the \index command out of the nesting".

This is possibly related to #2065.

davpoole commented 9 months ago

I think I was the one who sent the email.

The way our book is written we have a \keyword{term} which bolds the term and adds the term to the index, so moving it outside of the enumerate is not practical (as it will mess up the pdf version).

I would like the option to just include the section/subsection in the index. To see how bad it is now, see https://artint.info/3e/html/ArtInt3e.idx.html. Having "second item" (in an itemize) is not very useful! Given there is a hyperlink, the main use of the label is to help the reader know which instance of the term the index is referring to. For that, we don't want a complex description of where to find it (such as "item 1 in §5.3") -- or a long-winded description (as in the issue description) -- because we know how to find it (we click on the hyperlink); we want some way to determine which instance of the term we mean. The (sub)section should be adequate to do that. Think of the use case of multiple entries for a single term.

My issue is not "to have more descriptive index entries" but to have less descriptive index entries! Just the (sub)sections, please.

teepeemm commented 9 months ago

But if a document has been split with --splitat=subsubsection, then a reference to a section is not specific enough. So in the case of fine-grained splitting, we would need to refer to the html page that contains the reference. But then if the document wasn't split at all (or used --splitat=chapter), then we're back to not being specific enough. It might work to have the reference be the --splitat or the secnumdepth, whichever is more specific. That would still fail if I've done something dumb like \renewcommand{\thesubsection}{\arabic{subsection}}, but it would usually work out.
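
To illustrate that failure case, here is a small sketch (the section titles and labels are made up for illustration). Once \thesubsection drops its section prefix, subsections in different sections print the same number, so a subsection number alone would not tell two index entries apart:

```latex
\documentclass{article}
% Drop the section prefix: subsections now print as "1", "2", ...
% instead of the article default "1.1", "1.2", ...
\renewcommand{\thesubsection}{\arabic{subsection}}
\begin{document}
\section{First}
\subsection{Alpha}\label{sub:alpha} % printed number: 1
\section{Second}
\subsection{Beta}\label{sub:beta}   % printed number: 1 again (counter resets per section)

% Both references print "1", so the number no longer identifies the location.
See \ref{sub:alpha} and \ref{sub:beta}.
\end{document}
```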

In your current case, you could switch to \keywordpdf{term} which bolds the term and only adds the term to the index in the pdf, and then \iflatexml\index{term}\fi outside of the environment. Which isn't great, but would get your job done.
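
A minimal sketch of that workaround, assuming the latexml package (latexml.sty, which provides \iflatexml) is available to your TeX installation. The definitions of \keyword and \keywordpdf below are hypothetical stand-ins for the book's own macros, and the section and item text is made up:

```latex
\documentclass{article}
\usepackage{makeidx}
\usepackage{latexml} % provides \iflatexml (true under LaTeXML, false under pdflatex)
\makeindex

% Hypothetical original macro: bold the term and index it everywhere.
\newcommand{\keyword}[1]{\textbf{#1}\index{#1}}
% Hypothetical variant: bold the term, but index it only outside LaTeXML.
\newcommand{\keywordpdf}[1]{\textbf{#1}\iflatexml\else\index{#1}\fi}

\begin{document}
\section{Graphs}
\begin{enumerate}
\item A \keywordpdf{bipartite graph} has two disjoint vertex sets.
\end{enumerate}%
% Under LaTeXML only, index the term from outside the enumerate,
% so the index entry refers to the section rather than to "item 1".
\iflatexml\index{bipartite graph}\fi
\printindex
\end{document}
```

The pdf keeps its index entry at the exact location of the term, while the LaTeXML output gets a single entry attached to the enclosing section.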

davpoole commented 9 months ago

The hyperlink points to where the index term refers; if I click on the name it takes me there. That is what I want.

The text is purely for letting the reader decide which instance of the term they want. We don’t need to know the detailed description of where the reference is, as the hyperlink takes us there. I would suggest the text for the hyperlink is the most specific of the numbered section or subsection that the \index is in.

As a reader I want to know whether I should click on the 3.7.4 link or the 8.3 link or the 14.7.3 link. I can probably guess which one is the appropriate one. When I click on the link, it should take me to the place in the page where the definition/reference is (the \index{} is). I don’t think it really matters what the --splitat is.

David

dginev commented 9 months ago

Let me preface this by saying that I agree with the discussion so far. We should consider a more descriptive scheme for index entries pointing deep inside the narrative tree.

As to the current setup, I think some additional detail is provided as a tooltip, and readers can discover it on hover. For example, in the AI book linked above, we see: "bipartite graph 3rd item", which on hover also presents:

In 4.3 Consistency Algorithms ‣ Chapter 4 Reasoning with Constraints ‣ Artificial Intelligence: Foundations of Computational Agents, 3rd Edition


I also wanted to comment on:

> As a reader I want to know whether I should click on the 3.7.4 link or the 8.3 link or the 14.7.3 link. I can probably guess which one is the appropriate one.

The need to quickly decide which one of multiple possible index links is most relevant is also a good motivation for a "link preview" feature. Assuming the reader can tell in advance is a bit optimistic, since sometimes one arrives at a book while skimming for a specific purpose. For example, when collecting information on related work, I may consult the index of a work first, before reading any of the main content.

A nice benefit to "link preview" for index entries is that it can be realized entirely in javascript (externally to latexml). LessWrong has this feature.

But I bring that up as an "add on" beyond the static HTML representation. Indeed latexml's output HTML dialect tries to minimize reliance on JS.