Closed cangiuli closed 11 years ago
Yes, we decided that one or two people should take full responsibility for this, and only after the rest of the text is finalized.
I've started working on the index. The first thing I made is a little helper script other/index-helper.py
. If you run by
other/index-helper.py | less
it will show you all occurrences of all words, file files and character locations. For example, one entry looks like this:
======== adjoint [0]
...ume $f$ to be a half adjoint equivalence and sho... [equivalences.tex @ 18299]
... \item $F$ is a left adjoint.\label{item:ct:ar1}... [categories.tex @ 49607]
...type ``$F$ is a left adjoint'' is a mere proposi... [categories.tex @ 53581]
It is showing you the word "adjoint", its three occurrences with excerpts, files and character locations within the files. The results are listed in order of increasing frequency of the word in the English language (displayed in brackets next to the word). It knows about the 10000 most frequent words. Any word not known is considered to have frequency 0 and will be displayed at the top (and those will be alphabetized).
You can specify regular expressions on the command line to filter the output. Suppose you want to know where we use "invertible", "bi-invertible", "invert", etc. You run:
other/index-helper.py 'invert'
and you get:
======== bi-invertible [0]
...f:A\to B$ is \define{bi-invertible} if it has both a l... [equivalences.tex @ 19054]
======== non-inverted [0]
...ds that start with a non-inverted element, alternate ... [homotopy.tex @ 105666]
...between inverted and non-inverted elements, and end w... [homotopy.tex @ 105719]
Interesting, why did we define 'bi-invertible' and then never use it again? Anyhow, have fun.
Wow, thanks! What is the next step? Should we divide up the things to index between us somehow?
(The reason for defining bi-invertible and then never using it again is explained, I think, in the introduction to chapter 4.)
I am imagining the process as follows:
\define{...}
occurrences to take care of index entries that are definitions.But we need to sleep on this. For example, the current output format is not the best for performing Emacs magic.
Mind you, even the first step cannot be automatic. Something like "Dedekind real" will get indexed as "Dedekind real" and "real, Dedekind", etc.
The first thing I made is a little helper script other/index-helper.py.
I can’t see it, did you forget to push?
So what would be the most useful thing for the unadorned macro \define
to
do with regard to indexing? Just index whatever phrase is given to it?
Index each word in the phrase?
On May 15, 2013 2:04 PM, "Andrej Bauer" notifications@github.com wrote:
Mind you, even the first step cannot be automatic. Something like "Dedekind real" will get indexed as "Dedekind real" and "real, Dedekind", etc.
— Reply to this email directly or view it on GitHubhttps://github.com/HoTT/book/issues/94#issuecomment-17966116 .
I created enough of an index to expose any problems we might face. Now is time to look at it and see what sort of strategy we're going to use. I have gone through a bunch of \define
's and produces index entries for them.
In general we wish to add a composite phrase in several places. For example, "surjective function" would go under both "surjective -> function" and "function -> surjective".
Yikes! I thought we were going to redefine \define
to include an indexing entry, so that we wouldn't have to write all those indexing commands by hand.
If you look at the typical indexing next to a \define
I think you will quickly realize that it just has to be done by hand. The benfit of having some \define
's do their indexing automagically is close to zero. For example, very often you have to inculde more than one entry.
yeah, unfortunately indexing has to be done by hand. i tried various forms of automation, and quickly realized that there is no alternative to manual labor. on the bright side i found it useful to do it, because it exposed some structural problems that i wouldn't have noticed, particularly when you start to think hard about index subitems and cross-referencing.
Looking through the index entries you've added so far, it appears to me that a significant fraction of the time, one of the entries to be indexed is exactly the term inside the \define{...}
, so that some work would be saved by having one form of \define
index its argument automatically. But I will bow to those with more (= some) experience creating indices.
It looks like \indexdef
does not produce hyperlinked index entries, whereas ordinary \index
does. Can that be remedied?
Hmm, also, shouldn't index entries come at the beginning of defn
environments rather than the end? If the definition ends up split across pages, we want the index entry to point to its beginning.
A couple more questions for people with more experience creating indices.
\index
or an \indexdef
? Some of the existing \indexdef
s didn't look to me like they should be that way; my inclination would be to only do that for important mathematical definitions, not for say the induction principle of cartesian products or the introduction of the word "context" in chapter 1, even though the latter are in \define
s.I would say, first of all, that the index is not a mathematical construct and it does not have to be consistent. Of course it shouldn't be a mess either.
When indexing a term such as "weak equivalence" I generally ask myself "Where do people exepect to find this in the index?" And then I place it everywhere that they might look for, i.e., it is silly to "save space" in the index, or to not include entries on the grounds of "consistency". People don't care about that, they just want to find whatever they're looking for, and they won't "study the system" by which we constructed the index.
For \indexdef
entries, I put those in whenever there was a \define
, but maybe that wasn't the best strategy. I think in general you put in an \indexdef
for the main entry, whatever that is (definition, where the discussion can be found). Not every entry has an \indexdef
.
I think it would be bad if, say, all types had an index entry under 'induction principle' but only 75% of them had an identical entry under 'eliminator'. That's what I meant by 'consistency'. On May 21, 2013 11:14 PM, "Andrej Bauer" notifications@github.com wrote:
I would say, first of all, that the index is not a mathematical construct and it does not have to be consistent. Of course it shouldn't be a mess either.
When indexing a term such as "weak equivalence" I generally ask myself "Where do people exepect to find this in the index?" And then I place it everywhere that they might look for, i.e., it is silly to "save space" in the index, or to not include entries on the grounds of "consistency". People don't care about that, they just want to find whatever they're looking for, and they won't "study the system" by which we constructed the index.
For \indexdef entries, I put those in whenever there was a \define, but maybe that wasn't the best strategy. I think in general you put in an \indexdef for the main entry, whatever that is (definition, where the discussion can be found). Not every entry has an \indexdef.
— Reply to this email directly or view it on GitHubhttps://github.com/HoTT/book/issues/94#issuecomment-18259319 .
I want to work on the index, because it needs to get done in the next week or so and no one seems to be working on it. So far I have been fixing things that seem wrong, and thereby getting drawn into doing other things out of order, but I should start being more systematic. I guess I will start with your step 1 by looking for all \define
s, and then have a look at your helper script.
Why is a phrase like "2-category" alphabetized as ".category" rather than "2-category"?
Could be like:
Category, 2-, bi-, pre-,
etc.
Sent from my iPhone
On May 22, 2013, at 1:18 PM, Mike Shulman notifications@github.com wrote:
Why is a phrase like "2-category" alphabetized as ".category" rather than "2-category"?
— Reply to this email directly or view it on GitHub.
I indexed a bunch of definitions: 69001947
It also looks odd for the alphabetization to go \dagger-category, \modal-connected, \dagger-precategory, \modal-truncated.
I think the idea is that symbols all get bunched together at the beginning of the index, so they're all aphabetized as if they were a character that comes before all letters. If we put $\dagger$-category under one letter and $\moda$-category, under another, nobody will ever find them.
I agree that all symbols should come before all letters; I think that's fairly standard. I was saying I think it's weird to treat all symbols as equivalent -- I would expect \dagger-category and \dagger-precategory to be next to each other. I think I would also expect numbers to come in between symbols and letters (in numerical order). On May 22, 2013 9:03 PM, "Andrej Bauer" notifications@github.com wrote:
I think the idea is that symbols all get bunched together at the beginning of the index, so they're all aphabetized as if they were a character that comes before all letters. If we put $\dagger$-category under one letter and $\moda$-category, under another, nobody will ever find them.
— Reply to this email directly or view it on GitHubhttps://github.com/HoTT/book/issues/94#issuecomment-18322319 .
I enhanced the index-helper script a bit: e92a432. In particular, it was missing a lot of words because the 20-character-on-each-side context was part of the regexp match, and finditer
doesn't allow overlapping matches.
Also, I'm curious: exactly which word frequency list on wiktionary did this come from? I'm puzzled by the "words" like mr|mr
, u
, blockquote
, dr
, and sn
.
I did the 'A's: cdec2aa302.
looks great! can you give us some instructions for how you did it ? if we each take few letters, we could be done soon. maybe we should "check out" the letters while we're working on them, so we don't overlap too much?
On May 24, 2013, at 2:25 AM, Mike Shulman notifications@github.com wrote:
I did the 'A's: cdec2aa.
— Reply to this email directly or view it on GitHub.
I pushed a little Emacs script that I've been using, with instructions: e0c0604. I'm not sure whether or not it's actually helpful over just bouncing through the files by hand with M-x tags-search
, but you can try it out if you like.
Checking out letters before starting to work on them seems like a good idea.
I'm going to start on the common words (the un-alphabetized ones that come after "zy" in the output of Andrej's script).
Whoa, your emacs script is really cool. I will try letter B now. How are we going to avoid merge problems?
Cross our fingers and hope, and commit and pull often, especially before and after making more global sorts of changes? On May 24, 2013 9:56 AM, "Andrej Bauer" notifications@github.com wrote:
Whoa, your emacs script is really cool. I will try letter B now. How are we going to avoid merge problems?
— Reply to this email directly or view it on GitHubhttps://github.com/HoTT/book/issues/94#issuecomment-18417053 .
... and edit individual lines of text without rewrapping entire paragraphs ...
On Fri, May 24, 2013 at 3:04 PM, Mike Shulman notifications@github.comwrote:
Cross our fingers and hope, and commit and pull often, especially before and after making more global sorts of changes? On May 24, 2013 9:56 AM, "Andrej Bauer" notifications@github.com wrote:
Whoa, your emacs script is really cool. I will try letter B now. How are we going to avoid merge problems?
— Reply to this email directly or view it on GitHub< https://github.com/HoTT/book/issues/94#issuecomment-18417053> .
— Reply to this email directly or view it on GitHubhttps://github.com/HoTT/book/issues/94#issuecomment-18423103 .
Should Greek letters be indexed as symbols (i.e. at the beginning, before roman letters) or according to the spelling of the letter (i.e. '\lambda-abstraction' in between 'kernel' and 'law')?
And why does \indexdef
produce a different index entry than \index
when given the same argument? See \lambda-abstraction, for instance.
An arbitrary sampling of a dozen or so math books on my bookshelf suggests that many don't try to index terms that start with Greek letters at all, but of those that do, a majority index them according to the spelling of the letter. I only found two which indexed the Greek letters with the symbols at the beginning -- and I found one which ignored the Greek letter entirely, indexing '\lambda-abstraction' as if it were simply 'abstraction'!
I recently discovered that you can index ranges as well as individual pages. In case anyone else was unaware of this, type \index{thing|(}
at the beginning of the range and \inde{thing|)}
at the end, and the index entry will span the appropriate pages. If you want to get the effect of an \indexdef
also with this construct then you have to type explicitly \index{thing|(defstyle}
.
Also, it seems that for now, the best way to avoid duplicate entries for entries containing symbols is to index them with defstyle
explicitly, e.g. write \index{lambda-abstraction@$\lambda$-abstraction|defstyle}
rather than \indexdef{lambda-abstraction@$\lambda$-abstraction}
. There aren't that many such entries, so this shouldn't be so much of a problem.
I am done with the unalphabetized common words. Tomorrow I'll check out some more letters and get to work on them. Tomorrow is also basically the only day I have left to work on this before our self-imposed deadline, so it would be nice to have some help working on the index...
I'm not sure I have any time to spare, but, if I do, how would I check out a letter?
Just write in a comment here that you're going to do it.
Andrej (and everyone else), can you please check out the letters you're currently working on (just post a comment here before you start working on them) so that no one else tries to do them at the same time?
I see commits from Andrej for T U V W X Y Z. One can surmise he's working on S now.
I'll take L.
I am not working on S. I am playing with \hbox
\vbox
magic to make the lulu.com cover. I am failing (see 57fb7909cdb93990f846b36148c280beac8b8bd7).
Everyone, please before you index something, have a look at how it is already indexed. If there is an \indexsee{A}{B}
command somewhere, then in general, other occurrences of A should be indexed as B, not as A. I've just fixed this for cartesian products (1c4296f).
I have a few minutes so I'm going to try indexing G.
Who is going to index the F words :-) ?
Now that you've said that, I think you have to do it.
I'm going to try H.
:-)
That reminds me: what jokes are we going to insert into our index? Does anyone have any suggestions?
On Thu, May 30, 2013 at 5:16 PM, Andrej Bauer notifications@github.comwrote:
Who is going to index the F words :-) ?
— Reply to this email directly or view it on GitHubhttps://github.com/HoTT/book/issues/94#issuecomment-18709045 .
The best one I've been able to think of so far is to have entries for denial, anger, bargaining, depression, and acceptance pointing to appropriate sections.
We could also try putting in unintentional type theory, although I'm not sure what to make it refer to.
H is done, 3d6c80a5
i can do this or help to do it, but it has to be close to the last thing, otherwise it's a mess to get right. regardless of who does it, we will have to discuss the level of detail and how to manage cross-indexing. i found both of these hard to decide on for my own recently published book, and i could imagine that we may not agree on this.