AmerMathSoc / texml

A repository for texml development

automatic punctuation #199

Open pkra opened 1 month ago

pkra commented 1 month ago

Way back when, texml-to-html implemented some simplistic automatic punctuation magic. While thinking about improvements, I recently noticed (https://github.com/AmerMathSoc/texml-to-html/issues/385#issuecomment-2096172603) that some classes (gsm and text) do not add a period to chapter and section labels, so we end up removing those periods again later on.

This made me wonder if it wouldn't be better to handle punctuation magic on the texml end: if we end up sometimes removing it downstream, it seems safer to have texml add it (since it understands TeX and can process any custom behavior from TeX sources and packages).

Ideally such punctuation could be marked up (e.g., <x>, maybe with specific-use), though I realize we can't identify the extraneous manual punctuation that we sometimes see.
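
To make the idea concrete, here is a rough sketch of what "marked-up generated punctuation" could look like. It is written in Python purely for illustration and is not how texml would do it; the helper name `add_generated_period` and the `specific-use` value `generated-punctuation` are made up for this example.

```python
import xml.etree.ElementTree as ET

def add_generated_period(label: ET.Element) -> None:
    """Append <x>.</x> to a label unless its text already ends in punctuation.

    The specific-use value below is a hypothetical placeholder.
    """
    text = "".join(label.itertext()).rstrip()
    if text and text[-1] not in ".:;?!":
        x = ET.SubElement(label, "x")
        x.set("specific-use", "generated-punctuation")
        x.text = "."

label = ET.fromstring("<label>Chapter 1</label>")
add_generated_period(label)
print(ET.tostring(label, encoding="unicode"))
# <label>Chapter 1<x specific-use="generated-punctuation">.</x></label>
```

Because the period sits inside its own element, downstream can drop or keep it without guessing whether it was typed by the author.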

davidmjones commented 1 month ago

I think this is an excellent idea and shouldn't be too hard to implement. Incidentally, we've recently started removing explicit periods at the end of, for example, section titles so that we can be more consistent in the XML files.

davidmjones commented 2 weeks ago

@pkra Should <alt-title>s include the generated punctuation?

pkra commented 2 weeks ago

@davidmjones I had noticed that, too. I didn't think too deeply about it but it seemed correct to me. Either way, I don't have strong feelings about it.

I think right now we only use alt-title to populate HTML title tags and in the simplified epub TOC (metadata for "in-app TOC display"). I think it makes sense that those match the "proper" content.

davidmjones commented 2 weeks ago

@pkra Good. I noticed that I was generating <alt-text> tags whenever there was generated punctuation at the end of the title, which isn't what I meant to do. I fixed that, so I think you will need to be prepared to strip out <x/> tags for the TOC, unless you want me to always generate the <alt-text/> tags.

pkra commented 2 weeks ago

@davidmjones Thanks for the heads up; it shouldn't be a problem right now. Downstream is set up to not require alt-title elements but to favor them when they are present. (Just to double check: I think you meant alt-title (not alt-text), right?)
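
For reference, the "favor alt-title, otherwise flatten the title" behavior could be sketched roughly as below. This is an illustrative Python sketch, not the actual texml-to-html code: the function names, the `<sec>` container layout, and the `keep_generated` switch are all assumptions. Whether the TOC should keep the text inside `<x>` or drop it isn't settled above, so the sketch supports both.

```python
import xml.etree.ElementTree as ET

def flatten(elem: ET.Element, keep_generated: bool = True) -> str:
    """Return the text of elem with all tags removed.

    If keep_generated is False, the content of <x> elements is dropped too.
    Text following a child element (its tail) is always kept.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != "x" or keep_generated:
            parts.append(flatten(child, keep_generated))
        parts.append(child.tail or "")
    return "".join(parts)

def toc_title(container: ET.Element, keep_generated: bool = True) -> str:
    """Prefer alt-title when present; otherwise fall back to the flattened title."""
    alt = container.find("alt-title")
    if alt is not None:
        return "".join(alt.itertext())
    title = container.find("title")
    return "" if title is None else flatten(title, keep_generated)

with_alt = ET.fromstring(
    "<sec><title>The <italic>p</italic>-adic case<x>.</x></title>"
    "<alt-title>The p-adic case.</alt-title></sec>"
)
without_alt = ET.fromstring("<sec><title>Introduction<x>.</x></title></sec>")

print(toc_title(with_alt))                           # The p-adic case.
print(toc_title(without_alt))                        # Introduction.
print(toc_title(without_alt, keep_generated=False))  # Introduction
```

Keeping the generated punctuation by default matches the earlier point that alt-title includes it, so both paths give the same kind of string.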

pkra commented 1 day ago

Note to self: check that this percolates to toc-entry.