Closed: nrnrnr closed this issue 6 years ago.
Hi,
Thanks for the proposal. I am the author of polymode, which pushes Dave Love's idea many levels further. It is highly versatile and fully extensible. Creating new modes usually takes a couple of lines of declarations (see poly-R.el for examples, where a dozen modes are defined in less than 200 lines of code).
It is also specifically designed for literate programming. There are specialized classes for weavers and exporters, and they respect the polymode inheritance.
So, the answer to your question is definitely "yes"! I would be so happy to join forces. Long overdue IMO. I am myself overcommitted on so many levels that it makes me cry. Polymode started a year ago and it is only now that I am barely making the first release :(
I understand that I am turning the argument the other way around by asking you to join the polymode project. But, given that polymode is way more general in its intent and is easily extensible through a class system, my bias aside, I think it is the way to go.
I am in the process of documenting the development api. The very early draft is already there.
Thanks for reaching out to us.
Taking this to email.
Thanks for the proposal. I am the author of polymode, which pushes Dave Love's idea many levels further. It is highly versatile and fully extensible. Creating new modes usually takes a couple of lines of declarations (see poly-R.el for examples, where a dozen modes are defined in less than 200 lines of code).
It is also specifically designed for literate programming. There are specialized classes for weavers and exporters, and they respect the polymode inheritance.
I had a look at the code base. There's a lot going on there. I sympathize with the idea of building on powerful abstractions (which doesn't happen often enough in the emacs world), but for somebody brand new to come in and contribute, the documentation is a little thin. I am certainly willing to try to help push things forward, but in order to make a contribution I think I will need some guidance.
In order even to make a start, I will need to know how to set the major mode to be used in code chunks.
I am in the process of documenting the development api. The very early draft is already there.
I had a look. I've done a little prototype-based OO programming in Lua,
and I'm relatively comfortable with the concepts, but I've never used
CLOS or eieio, so I'm going to be fairly slow and useless, at least at
the start. It looks like modes/poly-noweb.el is still pretty sparse,
and in particular I don't see how to set the major mode for either
code chunks or documentation chunks. Looking at the documentation and
the code I see that I need to have pm-basemode
and pm-submode
objects, but I'm not sure how to create them or where to splice them in.
I guess that's where I need to begin.
Norman
Hi Norman,
I was a bit slow with API documentation because the naming conventions were not settled. Some more refactoring and tidying is on the way. Then I will proceed with detailed docs and examples. Hopefully already this week.
It looks overwhelming without documentation, but the idea is pretty simple. There is a pm-config object that represents each polymode. Each time a polymode is initialized (just like any other mode in Emacs), the root object (pm-config/noweb) is cloned and the new object is stored locally in the buffer under the name pm/config. This is how prototype inheritance works: through cloning. pm/config is shared across all indirect buffers (one indirect buffer per submode). pm/config stores all the necessary data in internal slots whose names start with "-" (like -basemode, -chunkmodes). The entire communication between indirect buffers happens through this object. As pm/config is the same in all buffers, there is no need to move/copy stuff around.
Submodes are also represented by objects (for example pm-submode/noweb). Each base and indirect buffer stores a local submode object in pm/submode. There are two types of submodes: basemodes and chunkmodes. The chunkmodes are discovered dynamically (currently by jit-lock) and placed into the -basemode and -chunkmodes list slots of pm/config.
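The prototype pattern described here can be sketched as follows. This is a hedged illustration, not polymode's actual code: the class, slot, and function names (my/pm-config, basemode, my/pm-initialize) are stand-ins I invented; only pm/config comes from the thread.

```elisp
(require 'eieio)

;; Toy root prototype standing in for pm-config/noweb.
(defclass my/pm-config ()
  ((basemode :initarg :basemode :initform 'latex-mode)))

(defvar my/pm-config-root (make-instance 'my/pm-config)
  "Root prototype object, customizable by the user before init.")

(defvar-local pm/config nil
  "Buffer-local clone of the root config; polymode shares it, in
effect, across the base buffer and its indirect buffers.")

;; Initialization clones the root prototype, so each buffer owns a
;; copy that inherits the root's (possibly customized) slot values.
(defun my/pm-initialize (root-config)
  (setq pm/config (clone root-config)))
```

Because `clone` copies the prototype, customizations made on the root object are inherited by every buffer initialized afterwards, which is the point of the design.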
Now to your question on selecting the submode for .nw files. I guess you mean an interactive command, right? There is no such command, but it would be easy to add. I didn't think of this user pattern.
The user pattern that I had in mind is that for every possible chunk mode, you would need to create a new polymode. Example from poly-R.el:
(require 'poly-noweb)
;; inherit new config object representing noweb+R mode from root
;; pm-config/noweb
(defcustom pm-config/noweb+R
  (clone pm-config/noweb :chunkmode 'pm-chunk/noweb+R)
  "Noweb for R configuration"
  :group 'polymode-configs
  :type 'object)

;; Make new chunk submode from root pm-chunk/noweb. Note that
;; :chunkmode of pm-config/noweb+R is pointing to this object.
(defcustom pm-chunk/noweb+R
  (clone pm-chunk/noweb :mode 'R-mode)
  "Noweb for R"
  :group 'polymode-chunkmodes
  :type 'object)

;; define polymode
(define-polymode poly-noweb+r-mode pm-config/noweb+R)
Now, poly-noweb+r-mode can be used as a standard Emacs mode. So you can activate it with M-x poly-noweb+r-mode, with a "mode:" declaration at the top of your file, or with an explicit file association:
(add-to-list 'auto-mode-alist '("\\.Rnw\\'" . poly-noweb+r-mode))
The good thing about this design is that users can customize the root object pm-config/noweb as well as the child objects. All noweb children will inherit the customization from the root object. Another good thing is that even low-level things like the chunk header regexp can be modified in children.
In order to set a new chunkmode in .nw files, one would need to set the :chunkmode slot of the pm/config object to point to a pm-chunk/XXX object. I think that might be enough and the mode will be re-initialized automatically, but I am not sure.
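In eieio terms, that slot assignment might look like the following. This is a sketch under stated assumptions: the toy class my/pm-config, the slot name chunkmode, and the chunkmode symbol pm-chunk/noweb+C are all invented here; whether re-initialization happens automatically is, as noted, uncertain.

```elisp
(require 'eieio)

;; Toy config class to make the assignment concrete; polymode's real
;; class and slot names may differ.
(defclass my/pm-config ()
  ((chunkmode :initarg :chunkmode :initform nil)))

(defvar-local pm/config (make-instance 'my/pm-config))

;; Point the buffer-local config at a hypothetical C chunkmode:
(oset pm/config chunkmode 'pm-chunk/noweb+C)

;; If nothing updates automatically, re-running the polymode function
;; (e.g. M-x poly-noweb+r-mode) may be needed.
```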
The question is, do we really want such a command? How useful is it? Do you often change the mode of the chunk?
Vitalie
Norman Ramsey on Thu, 08 May 2014 11:57:06 -0700 wrote:
Thanks for the proposal. I am the author of polymode, which pushes Dave Love's idea many levels further. It is highly versatile and fully extensible. Creating new modes usually takes a couple of lines of declarations (see poly-R.el for examples, where a dozen modes are defined in less than 200 lines of code).
It is also specifically designed for literate programming. There are specialized classes for weavers and exporters, and they respect the polymode inheritance.
I had a look at the code base. There's a lot going on there. I sympathize with the idea of building on powerful abstractions (which doesn't happen often enough in the emacs world), but for somebody brand new to come in and contribute, the documentation is a little thin. I am certainly willing to try to help push things forward, but in order to make a contribution I think I will need some guidance.
In order even to make a start, I will need to know how to set the major mode to be used in code chunks.
I am in the process of documenting the development api. The very early draft is already there.
I had a look. I've done a little prototype-based OO programming in Lua, and I'm relatively comfortable with the concepts, but I've never used CLOS or eieio, so I'm going to be fairly slow and useless, at least at the start. It looks like modes/poly-noweb.el is still pretty sparse, and in particular I don't see how to set the major mode for either code chunks or documentation chunks. Looking at the documentation and the code I see that I need to have pm-basemode and pm-submode objects, but I'm not sure how to create them or where to splice them in. I guess that's where I need to begin.
Norman
I was a bit slow with API documentation because the naming conventions were not settled. Some more refactoring and tidying is on the way. Then I will proceed with detailed docs and examples. Hopefully already this week.
Great!
It looks overwhelming without documentation but the idea is pretty simple. There is a pm-config object that represents each polymode.
OK, basic questions: in the world of ideas, what is a polymode? What is a submode?
Now to your question on selecting the submode for nw files. I guess you mean interactive command, right?
It will probably be necessary at times (when a file contains multiple modes in code chunks), but my first priority is to be able to set the modes using buffer-local variables. I need not only to set the modes but also to set variables relevant to those modes.
Here's an example that works with the old noweb-mode (and with Dave Love's version):
% -*- mode: Noweb; noweb-code-mode: fundamental-mode; tab-width: 4; c-indent-level: 4; c-basic-offset: 4 ; tex-main-file: book.nw -*-
Here's a similar example that sort of works with ess-noweb-mode:
% -*- mode: ess-noweb; ess-noweb-default-code-mode: c-mode; noweb-code-mode: c-mode; tab-width: 4; c-indent-level: 4; c-basic-offset: 4 ; tex-main-file: book.nw -*-
It only "sort of" works because actually the value of c-indent-level
is not set the way it should be.
The user pattern that I had in mind is that for every possible chunk mode, you would need to create a new polymode.
Once I understand what a polymode is, that seems like a reasonable requirement. But I can reuse that polymode with different values of buffer-local variables, right?
Example from poly-R.el:
(require 'poly-noweb)
;; inherit new config object representing noweb+R mode from root
;; pm-config/noweb
(defcustom pm-config/noweb+R
  (clone pm-config/noweb :chunkmode 'pm-chunk/noweb+R)
  "Noweb for R configuration"
  :group 'polymode-configs
  :type 'object)

;; Make new chunk submode from root pm-chunk/noweb. Note that
;; :chunkmode of pm-config/noweb+R is pointing to this object.
(defcustom pm-chunk/noweb+R
  (clone pm-chunk/noweb :mode 'R-mode)
  "Noweb for R"
  :group 'polymode-chunkmodes
  :type 'object)

;; define polymode
(define-polymode poly-noweb+r-mode pm-config/noweb+R)
I'm afraid that I see some details but I do not grasp the big picture. All I'm getting is that a polymode is built from a thing called pm-config/noweb+R, and that there is another thing (pm-chunk/noweb+R) that the first thing points to? What is the purpose of having two things? What is the name of the kind of thing? That is, what kind of thing is pm-chunk/noweb? What about pm-config/noweb?
I went and looked at the code, and they aren't defined by cloning... And I'm having trouble connecting them with the doco at
https://github.com/vitoshka/polymode/tree/master/modes
(FYI, installing Emacs 24 is going to disrupt my system pretty significantly, so I'm not ready to do it until I know I have a few hours to get problems sorted out, and I can actually try out polymode. But what it means for now is that I can't use any of the emacs documentation tools like C-h f.)
Now, poly-noweb+r-mode can be used as a standard Emacs mode. So you can activate it with M-x poly-noweb+r-mode, with a "mode:" declaration at the top of your file, or with an explicit file association...
OK, this part I get.
(add-to-list 'auto-mode-alist '("\\.Rnw\\'" . poly-noweb+r-mode))
The good thing about this design is that users can customize the root object pm-config/noweb as well as the child objects.
I don't really understand the object model, so it's not yet clear to me how to benefit from customization. But I'll take it on faith that it's good.
In order to set a new chunkmode in .nw files, one would need to set the :chunkmode slot of pm/config object to point to a pm-chunk/XXX object. I think that might be enough and the mode will be re-initialized automatically, but I am not sure.
What about having a buffer-local variable so that I have multiple files using the same poly-noweb-mode, but each individual file has its own mode for code chunks?
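A hypothetical shape for what I'm asking could be this; the variable name `poly-noweb-code-mode` is invented for illustration and does not exist in polymode at this point.

```elisp
;; Hypothetical buffer-local variable selecting the code-chunk mode,
;; in the spirit of the old `noweb-code-mode'.  A polymode would
;; consult it when instantiating the chunk submode.
(defvar-local poly-noweb-code-mode 'fundamental-mode
  "Major mode to use for code chunks in this buffer.")
```

Each file could then set it on its first line, e.g. `% -*- mode: poly-noweb; poly-noweb-code-mode: c-mode -*-`, so many files share one polymode but differ in their chunk mode.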
The question is, do we really want such a command? How useful is it? Do you often change the mode of the chunk?
I have some files that contain a mix of SML code chunks and Scheme code chunks, or a mix of C code chunks and Scheme code chunks. When I edit such a file it is essential that I be able to set the mode correctly for the chunk I am editing.
(In the glorious future I would love to be able to specify the correct mode for each root chunk and to have that mode propagate to other chunks using noweb's def/use chains, but that's a problem for another time.)
Norman
Norman Ramsey on Fri, 09 May 2014 11:35:52 -0700 wrote:
[...]
OK, basic questions: in the world of ideas, what is a polymode? What is a submode?
These questions are actually addressed in the dev doc (https://github.com/vitoshka/polymode/tree/master/modes#polymodes-and-configs)
I agree that the docs are not crystal clear at this stage.
Here's an example that works with the old noweb-mode (and with Dave Love's version):
% -*- mode: Noweb; noweb-code-mode: fundamental-mode; tab-width: 4; c-indent-level: 4; c-basic-offset: 4 ; tex-main-file: book.nw -*-
This should eventually work as expected. It wasn't the priority so far.
I'm afraid that I see some details but I do not grasp the big picture. All I'm getting is that a polymode is built from a thing called pm-config/noweb+R, and that there is another thing (pm-chunk/noweb+R) that the first thing points to? What is the purpose of having two things? What is the name of the kind of thing? That is, what kind of thing is pm-chunk/noweb? What about pm-config/noweb?
No, a polymode is not "built from pm-config/noweb+R"; it is represented by an object cloned from pm-config/noweb+R and stored in the pm/config local variable. Most of the methods in polymode-methods.el are then dispatched on this config object. The rest of the methods are dispatched on submode objects that represent the inner modes of the buffer.
The other "things" represent submodes: the base mode (latex) and the chunkmode (R in this case). Some methods dispatch on these submode objects. The config object must be aware of which basemode and which submodes it should instantiate when it meets a chunk. This is why they are linked.
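The dispatch described here is ordinary eieio generic-function dispatch. A toy illustration follows; the class and method names (my/submode, my/chunkmode, my/describe) are invented for this sketch and are not polymode's.

```elisp
(require 'eieio)

;; Toy stand-ins for the submode hierarchy.
(defclass my/submode () ())
(defclass my/chunkmode (my/submode) ())

;; Methods specialize on the class of their argument, much as the
;; methods in polymode-methods.el do on config and submode objects.
(defmethod my/describe ((obj my/submode)) "a submode")
(defmethod my/describe ((obj my/chunkmode)) "a chunkmode")
```

Calling `(my/describe (make-instance 'my/chunkmode))` dispatches to the more specific chunkmode method.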
BTW, I am thinking of changing "chunkmode" into "innermode", but I am not sure. The idea is that there is always an outer mode, which I call the basemode, and within the basemode are chunks of code in other languages. This is why I call them chunkmodes. Any ideas on this?
I plan to write a glossary of all the terms used, but don't want to do that till all the names are settled.
I went and looked at the code, and they aren't defined by cloning...
All objects (pm-config, pm-basemode, pm-chunkmode) that are defined at run time are instantiated through cloning.
(FYI, installing Emacs 24 is going to disrupt my system pretty significantly, so I'm not ready to do it until I know I have a few hours to get problems sorted out, and I can actually try out polymode. But what it means for now is that I can't use any of the emacs documentation tools like C-h f.)
[...]
Emacs 24 brings a lot of new stuff, like eieio and the package manager. The earlier you switch the better, IMO.
What about having a buffer-local variable so that I have multiple files using the same poly-noweb-mode, but each individual file has its own mode for code chunks?
The question is, do we really want such a command? How useful is it? Do you often change the mode of the chunk?
I have some files that contain a mix of SML code chunks and Scheme code chunks, or a mix of C code chunks and Scheme code chunks. When I edit such a file it is essential that I be able to set the mode correctly for the chunk I am editing.
What is the use of it? How does the weaver recognize the modes? As far as I know, noweb is not designed for this user pattern. Thus polymode by design discourages such use by enforcing the 'pm-config-one' class. If you need different modes in noweb chunks, then you should extend the noweb syntax to specify the mode per chunk, like <<name, mode = "sml">>=. Then use the pm-config-multi-auto class instead of pm-config-one to define 'poly-noweb-auto-mode'. This will make noweb similar to how markdown and org-mode work by detecting the mode of the chunk automatically.
If you think this pattern is common I can easily add such a polymode.
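The proposed `<<name, mode = "sml">>=` header could be recognized with a regexp along these lines. This is only a sketch: the chunk-header syntax is a proposal made in this thread, not standard noweb, and the constant name is invented.

```elisp
;; Match <<name>>= or <<name, mode = "sml">>=.  Group 1 captures the
;; chunk name, group 2 the optional mode string.
(defconst my/noweb-auto-head-regexp
  "^<<\\([^,>]*\\)\\(?:,[ \t]*mode[ \t]*=[ \t]*\"\\([^\"]+\\)\"\\)?>>=")
```

A pm-config-multi-auto style config could then map group 2 (e.g. "sml") to a major mode, the way markdown chunk labels are mapped today.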
Vitalie
OK, basic questions: in the world of ideas, what is a polymode? What is a submode?
These questions are actually addressed in the dev doc... I agree that the docs are not crystal clear at this stage.
Once I think I understand, I will see if I can help with that.
Here's an example that works with the old noweb-mode (and with Dave Love's version):
% -*- mode: Noweb; noweb-code-mode: fundamental-mode; tab-width: 4; c-indent-level: 4; c-basic-offset: 4 ; tex-main-file: book.nw -*-
This should eventually work as expected. It wasn't the priority so far.
All right. If you want me to do anything, I have to have this support. I'm willing to try to build it, but I will need a sketch to start with.
I'm afraid that I see some details but I do not grasp the big picture. All I'm getting is that a polymode is built from a thing called pm-config/noweb+R, and that there is another thing (pm-chunk/noweb+R) that the first thing points to? What is the purpose of having two things? What is the name of the kind of thing? That is, what kind of thing is pm-chunk/noweb? What about pm-config/noweb?
No, a polymode is not "built from pm-config/noweb+R"; it is represented by an object cloned from pm-config/noweb+R and stored in the pm/config local variable. Most of the methods in polymode-methods.el are then dispatched on this config object.
As a user, I have no idea about these methods. And this isn't the documentation I'm looking for.
I've placed a first cut at draft documentation in the readme.md at
https://github.com/nrnrnr/polymode#high-level-view
which is a fork of yours. Please have a look and tell me what you think.
Also, what should be the name for "the major mode that a polymode mimics?" The documentation needs to talk about this a lot, so it needs a name.
The rest of the methods are dispatched on submode objects that represent the inner modes of the buffer.
I don't understand why there are 'submodes' and why they are distinct from 'polymodes'. Who needs to know about this distinction? Users? All developers? Some developers?
Please note that in the current doco, the introduction of submodes is tautological:
Submodes (basemodes and chunkmodes) are objects that encapsulate functionality of the polymode's submodes.
Maybe 'submode' is a term of art in the emacs world? I'm not finding it in the manual except for a few special cases.
The other "things" represent submodes: the base mode (latex) and the chunkmode (R in this case).
Why should base mode and chunkmode have different status?
BTW, I am thinking of changing "chunkmode" into "innermode", but I am not sure. The idea is that there is always an outer mode, which I call the basemode, and within the basemode are chunks of code in other languages. This is why I call them chunkmodes. Any ideas on this?
Yes, but I can speak only based on my experience with noweb:
I would prefer a design that has a single concept, rather than these two concepts.
(FYI, installing Emacs 24 is going to disrupt my system pretty significantly, so I'm not ready to do it until I know I have a few hours to get problems sorted out, and I can actually try out polymode....)
Emacs 24 brings a lot of new stuff, like eieio and the package manager. The earlier you switch the better, IMO.
Many years of painful experience have taught me that the benefits of maintaining a consistent Debian installation outweigh the benefits of upgrading any one package. I am willing to make an exception if I get real Emacs support for noweb out of the deal. I'm not willing to make an exception on speculation or just because Emacs 24 is better.
I have some files that contain a mix of SML code chunks and Scheme code chunks, or a mix of C code chunks and Scheme code chunks. When I edit such a file it is essential that I be able to set the mode correctly for the chunk I am editing.
What is the use of it?
Programs written in multiple languages.
How does the weaver recognize the modes?
For my applications, it is rarely useful for the weaver to recognize the modes. In those rare cases, noweb captures some metadata that gives the source locations of various parts, for example so as not to index C identifiers in Scheme code.
As far as I know noweb is not designed for this user pattern.
As the author and designer of noweb, I can say definitively that noweb is designed for exactly this user pattern.
Thus polymode by design discourages such use by enforcing the 'pm-config-one' class.
I have no idea what this means.
At any given moment I am certainly willing to pretend that all noweb code chunks should be associated with the same emacs mode (polymode). But I do need to be able to change that mode dynamically.
If you need different modes in noweb chunks, then you should extend the noweb syntax to specify the mode per chunk, like <<name, mode = "sml">>=.
Absolutely not. Whatever mechanism may be used for this purpose, placing the burden on the user (and potentially polluting the chunk names in the output) is not it.
Alternative mechanism: give a regular expression that characterizes the names of all noweb root chunks that are in a given mode. For example, "^[^ \t]*\.[ch]$" might characterize root chunks that should be in c-mode. Mode information can propagate to other chunks by use-def chains. And of course there would need to be a default.
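That mechanism might be sketched as an alist from root-chunk-name regexps to modes. All names here (my/noweb-root-mode-alist, my/noweb-mode-for-root) are hypothetical, and the propagation along def/use chains is not shown.

```elisp
;; Per-file, buffer-local map from root-chunk-name regexps to the
;; major mode their chunks should get; the empty regexp at the end
;; matches anything, so it acts as the default.
(defvar-local my/noweb-root-mode-alist
  '(("^[^ \t]*\\.[ch]$" . c-mode)
    ("" . fundamental-mode)))

(defun my/noweb-mode-for-root (name)
  "Return the major mode for a root chunk named NAME."
  ;; `assoc-default' calls (string-match REGEXP name) on each car
  ;; and returns the cdr of the first entry that matches.
  (assoc-default name my/noweb-root-mode-alist #'string-match))
```

A file could set `my/noweb-root-mode-alist` in its local variables, so one poly-noweb-mode serves many files with different chunk languages.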
Then use the pm-config-multi-auto class instead of pm-config-one to define 'poly-noweb-auto-mode'. This will make noweb similar to how markdown and org-mode work by detecting the mode of the chunk automatically.
If you think this pattern is common I can easily add such a polymode.
It is not my most urgent need. My most urgent needs remain being able to change the code mode currently in effect and to initialize it, along with variables like c-basic-offset, from buffer-local variables.
Norman
Norman Ramsey on Mon, 12 May 2014 10:59:12 -0700 wrote:
[...]
I've placed a first cut at draft documentation in the readme.md at
Ok, I see now. You are missing the picture. My bad. There are at least three meanings of "polymode" and "submode": the Emacs function that initializes a mode, Emacs' abstract notion of a mode, and an eieio object that represents that mode. In the current docs these meanings are used interchangeably, and this is the reason for the tautology that you have noticed.
From the user's perspective, Emacs modes and polymodes are functionally the same. This is why no mention of submodes, objects, etc. should appear on the user page.
I will add precise definitions of all the terms to the dev doc once I am settled on their names. I will be back with you when that's done.
Noweb-mode confounds doc chunks and language chunks, both conceptually and at the code level. I think this is wrong. There is always a host (base) language that "contains" other language spans. I intend to use "code span" for what noweb calls chunks and reserve "chunks" for rigorously delimited spans of code in a language that is not the host language.
What is the use of it?
Programs written in multiple languages.
How does the weaver recognize the modes?
For my applications, it is rarely useful for the weaver to recognize the modes. In those rare cases, noweb captures some metadata that gives the source locations of various parts, for example so as not to index C identifiers in Scheme code.
Can you please provide an example of a complex noweb file with multiple languages? What do you mean by root chunk exactly?
I am lost. What applications do you mean concretely?
If you need different modes in noweb chunks, then you should extend the noweb syntax to specify the mode per chunk, like <<name, mode = "sml">>=.
Absolutely not. Whatever mechanism may be used for this purpose, placing the burden on the user (and potentially polluting the chunk names in the output) is not it.
Alternative mechanism: give a regular expression that characterizes the names of all noweb root chunks that are in a given mode. For example, "^[^ \t]*\.[ch]$" might characterize root chunks that should be in c-mode. Mode information can propagate to other chunks by use-def chains. And of course there would need to be a default.
Confounding chunk names with language indicators doesn't look like a good design to me. But I agree, it is indeed parsimonious. Also, how is naming chunks with mode-containing names and then specifying a regexp not a "burden"?
Chunks in markdown, org-mode and web related files always have specific indicators to uniquely identify the language of the chunk. People are used to this clean idea.
That being said, if the noweb specification is extended and chunk names can identify the language in a standardized way, I will add that to the polymode specs immediately.
Vitalie
Hi Norman.
The dev doc is ready. I have gone through several stages of refactoring and settled on parsimonious naming conventions. It also helped clear my own mind. And I acknowledge that the previous mode/polymode/chunkmode/submode/basemode etc. wording was quite a mess.
Thanks for all the input. It was very helpful in straightening things out.
I've placed a first cut at draft documentation in the readme.md at
Ok, I see now. You are missing the picture. My bad. There are at least three meanings of "polymode" and "submode": the Emacs function that initializes a mode, Emacs' abstract notion of a mode, and an eieio object that represents that mode. In the current docs these meanings are used interchangeably, and this is the reason for the tautology that you have noticed...
From the user's perspective, Emacs modes and polymodes are functionally the same. This is why no mention of submodes, objects, etc. should appear on the user page.
I will add precise definitions of all the terms to the dev doc once I am settled on their names. I will be back with you when that's done.
For whatever they may be worth, here are a few suggestions:
I'm glad you're thinking hard about names---given that emacs lisp is such a dynamic language, naming becomes extra important.
Noweb-mode confounds doc chunks and language chunks, both conceptually and at the code level.
I'm not sure what you mean by 'confound' here. I think the sense you mean is 'fail to discern the differences between', but I don't think that characterization applies to noweb, which knows the difference between a documentation chunk and a code chunk. (About noweb-mode more specifically I am ignorant.)
I think this is wrong. There is always a host (base) language that "contains" other language spans. I intend to use "code span" for what noweb calls chunks and reserve "chunks" for rigorously delimited spans of code in a language that is not the host language.
On this subject I think I can speak with some authority. In the design of noweb I was extremely careful to craft an abstraction that says "a file is a sequence of chunks that appear in any order." Moreover, there are two kinds of chunks: documentation chunks and code chunks. There is no 'containment' structure and no alternation. For example, multiple documentation chunks can follow one another without any intervening code chunks, and (mutatis mutandis) the same for code chunks. This design decision was one of my best decisions, as it made all the tools (and the documentation) simpler.
I recognize that most users have a mental model of 'containment' in the way they write and sequence their documentation chunks. But none of the noweb tools are aware of or benefit from this mental model.
I hope that for noweb-mode at least, you will reconsider your intention to change the model and the terminology.
(The best paper to read about noweb and its design is at http://www.cs.tufts.edu/~nr/pubs/lpsimp.pdf. The writing makes me cringe today, but it's the best record of what I had in mind.)
For my applications, it is rarely useful for the weaver to recognize the modes. In these rare cases, noweb captures some metadata that gives source locations of various parts. For example, not to index C identifiers in Scheme code.
Can you please provide an example of a complex noweb file with multiple languages?
Yes, I have attached one. It is the main source file for the Debian package 'nbibtex'. It is a simple one, with only two languages. The more complex ones I am working on are for a book on programming languages, and I cannot distribute the source files.
What do you mean by root chunk exactly?
I mean the name of a chunk that is intended to be passed to notangle using the -R option. I see that this term does not appear in the man page, but you will see it in the article mentioned above.
I am lost. What applications do you mean concretely?
By 'application' I mean anything that produces code or documentation from a set of noweb files. In addition to the basic tangle and weave applications, I have a bunch of stuff for indexing and cross-reference. None of this stuff works on any kind of mode recognition---instead, I use noweb to relate locations in the noweb file to locations in the derived (output) files.
It may be worth saying that in 25 years, I have never used any of noweb-mode's tools for weaving, tangling, navigating to chunks, and so on. When I need to weave or tangle I do it using C-c C-c make (aka M-x compile make). The one special-purpose command I have used from time to time is narrow-to-chunk.
If you need different modes in noweb chunks, then you should extend the noweb syntax to specify the mode per chunk, like <<name, mode = "sml">>=.
Absolutely not. Whatever mechanism may be used for this purpose, placing the burden on the user (and potentially polluting the chunk names in the output) is not it.
Alternative mechanism: give a regular expression that characterizes the names of all noweb root chunks that are in a given mode. For example, "^[^ \t]*\.[ch]$" might characterize root chunks that should be in c-mode. Mode information can propagate to other chunks by use-def chains. And of course there would need to be a default.
Confounding chunk names with language indicators doesn't look like a good design to me. But I agree, it is indeed parsimonious. Also, how is naming chunks with mode-containing names and then specifying a regexp not a "burden"?
Two ways:
The markup burden on the author of the code is greatly reduced. I just looked at one sample that contains only two languages. There are 88 code chunks, of which 21 are root chunks, but there are only three root-chunk names. Putting these three names into a regexp solves the problem of identifying all 88 chunks. That's a factor-of-30 reduction in effort.
Chunks in markdown, org-mode and web related files always have specific indicators to uniquely identify the language of the chunk. People are used to this clean idea.
I have no objection to this idea. Chunks in noweb are related through chains of definition and use, which link together to form a web. Information such as the mode of chunks or the language in use propagates effortlessly through the web. Literate programmers are used to this clean idea.
That being said, if the noweb specification is extended and chunk names can identify the language in a standardized way, I will add that to the polymode specs immediately.
I have no intention of specifying a standard for noweb---each author should retain the power to choose conventional or unconventional names in whatever way makes sense for his or her document.
May I propose that for the time being, we table the issue of multiple code modes active simultaneously? As long as I can change the code mode currently in effect, and can initialize it using a buffer-local variable, I can start working with polymode, and then it will be possible to develop a specification and tools incrementally.
Norman

% -*- mode: noweb; noweb-code-mode: lua-mode -*-
\documentclass{article}
\usepackage{fullpage}
\usepackage{noweb,url}
\usepackage[hypertex]{hyperref}
\noweboptions{smallcode}
\def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}
\def\NbibTeX{{\rm N\kern-.05em{\sc bi\kern-.025em b}\kern-.08em T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}
\let\bibtex\BibTeX \let\nbibtex\NbibTeX
\title{A Replacement for \bibtex\
(Version
\setcounter{tocdepth}{2} %% keep TOC on one page
\def\lbrace{\char123}
\def\rbrace{\char125}
\begin{document}
@
\maketitle
\tableofcontents
\clearpage
\section{Overview}
The code herein comprises the ``nbib'' package, which is a collection
of tools to help authors take better advantage of \BibTeX\ data,
especially when working in collaboration.  The driving technology is
that instead of using \BibTeX\ ``keys,'' which are chosen arbitrarily
and idiosyncratically, nbib builds a bibliography by searching the
contents of citations.
\begin{itemize}
\item
\texttt{nbibtex} is a drop-in replacement for \texttt{bibtex}.
Authors' \verb+\cite{+\ldots\kern-2pt \verb+}+
commands are interpreted either as classic \bibtex\ keys (for
backward compatibility) or as search commands.
Thus, if your
bibliography contains the classic paper on type inference, \texttt{nbibtex}
should find it using a citation like
\verb+\cite{damas-milner:1978}+, or
\verb+\cite{damas-milner:polymorphism}+, or perhaps even simply
\verb+\cite{damas-milner}+---\emph{regardless} of the \bibtex\ key you
may have chosen.
The same citations should also work with
your coauthors' bibliographies, even if they are keyed
differently.
\item
\texttt{nbibfind} uses the nbib search engine on the command line. If you
know you are looking for a paper by Harper and Moggi, you can just
type
\begin{verbatim}
nbibfind harper-moggi
\end{verbatim}
and see what comes out.
\item
To help you work with coauthors who don't have the nbib package,
\texttt{nbibmake}\footnote
{Not yet implemented.}
examines a {\LaTeX} document and builds a custom
\texttt{.bib} file
just for that document.
\end{itemize}
\noindent
The package is written in a combination of~C and Lua:
\begin{itemize}
\item
Because I want nbib to be able to handle bibliographies with thousands
or tens of thousands of entries,
the code to parse a \texttt{.bib} ``database'' is written in~C.
A~computer bought in 2003 can parse over 15,000~entries per second.
\item
Because the search for \bibtex\ entries requires string searching on
every entry, the string search is also written in~C (and uses
Boyer-Moore).
\item
Because string manipulation is much more easily done in Lua, all the
code that converts a \bibtex\ entry into printed matter is written in
Lua, as is all the ``driver'' code that implements various programs.
\end{itemize}
The net result is that \texttt{nbibtex} is about five times slower
than classic \texttt{bibtex}.
This slowdown is easy to observe when printing a bibliography
of several thousand entries,
but on a typical paper with fewer than fifty citations and a personal
bibliography with a thousand entries,
the pause is imperceptible.
\subsection{Compatibility}
I've made every effort to make \nbibtex\ compatible with \bibtex, so
that \nbibtex\ can be used on existing papers and should produce
the same output as \bibtex.
Regrettably, compatibility means avoiding modern treatment
of non-ASCII characters, such as are found in the ISO Latin-1
character set:
classic \bibtex\ simply treats every non-ASCII character as a letter.
\begin{itemize}
\item
It would be pleasant to try instead to set \nbibtex\ to use an
ISO~8859-1 locale, but this leads to incompatible output:
\nbibtex\ forces characters to lower case that \bibtex\ leaves alone.
<
@
\section{Parsing \texttt{.bib} files}
This section reads the \texttt{.bib} file(s).
<
<
\subsubsection {Data structures}
For convenience in keeping function prototypes uncluttered,
all state associated with reading a particular \bibtex\ file is stored
in a single [[Bibreader]] abstraction.
That state is divided into three groups:
\begin{itemize}
\item
Fields that say what file we are reading and what is our position
within that file
\item
A~buffer that holds one line of the \texttt{.bib} file currently being
scanned
\item
State accessible from Lua: an interpreter;
a list of strings from the \texttt{.bib} preamble, which is exposed to
the client;
a warning function provided by the client;
and a macro table provided by the client and updated by
[[@string]] commands
\end{itemize}
In the buffer,
the meaningful characters are in the half-open interval $[{}[[buf]],
[[lim]])$,
and we reserve space for a sentinel at~[[lim]].
The invariant is that $[[buf]] \le [[cur]] < [[lim]]$
and $[[buf]]+[[bufsize]] \ge [[lim]]+1$.
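As a minimal sketch of the invariant just stated (using a simplified stand-in for the [[Bibreader]] fields, not the actual type): the meaningful bytes live in $[buf, lim)$, [[cur]] points into that range, and one byte past [[lim]] is reserved for the sentinel.

```c
#include <stddef.h>

/* Simplified stand-in for the Bibreader buffer fields described above.
   The invariant is buf <= cur < lim and buf + bufsize >= lim + 1,
   which leaves room for the sentinel byte at lim. */
struct nbuf {
    unsigned char *buf, *cur, *lim;
    unsigned bufsize;
};

int nbuf_invariant(const struct nbuf *b) {
    return b->buf <= b->cur
        && b->cur < b->lim
        && b->buf + b->bufsize >= b->lim + 1;
}
```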
<
unsigned char *buf, *cur, *lim; /* input buffer */
unsigned bufsize;               /* size of buffer */
char entry_close;               /* character expected to close current entry */
lua_State *L;
int preamble;                   /* reference to preamble list of strings */
int warning;                    /* reference to universal warning function */
int macros;                     /* reference to macro table */
} *Bibreader;
@
The [[is_id_char]] array is used to define a predicate that says
whether a character is considered part of an identifier.
<
@
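As a sketch of the predicate [[is_id_char]] defines: every printing character counts as an identifier character except a handful of delimiters. The excluded set below is a guess for illustration; the table in this file is actually initialized in the library-initialization chunk near the end.

```c
/* Sketch: classify characters as identifier characters.  Accept
   everything, then knock out control characters and a set of
   delimiter characters (the set here is illustrative). */
static unsigned char is_id_char[256];

void init_id_chars(void) {
    static const unsigned char *nonids =
        (const unsigned char *)"\"#%'(),={} \t\n\f";
    const unsigned char *p;
    unsigned c;
    for (c = 0; c < 256; c++) is_id_char[c] = 1;
    for (c = 0; c <= 037; c++) is_id_char[c] = 0;  /* control characters */
    for (p = nonids; *p; p++) is_id_char[*p] = 0;  /* delimiters */
}
```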
\subsubsection {Scanning}
Most internal functions are devoted to some form of scanning.
The model is a bit like Icon: scanning may succeed or fail, and it has
a side effect on the state of the reader---in particular the value of
the [[cur]] pointer, and possibly also the contents of the buffer.
(Unlike Icon, there is no backtracking.)
Success or failure is nonzero or zero but is represented using type [[bool]].
<
In addition to database entries, a \texttt{.bib} file may contain
the [[comment]], [[preamble]], and [[string]] commands.
Each is implemented by a function of type [[Command]], which is
associated with the name by [[find_command]].
<
\subsubsection{Error handling}
The [[warnv]] function is used to call the warning function supplied
by the Lua client.
In addition to the reader, it takes as arguments the number of results
expected and the signature of the arguments.
(The warning function may receive any combination of string~([[s]]),
floating-point~([[f]]), and integer~([[d]]) arguments;
the [[fmt]] string gives the sequence of the arguments that follow.)
<
  if (!lua_checkstack(rdr->L, 10)) assert(0); \
  lua_pushboolean(rdr->L, 0); \
  lua_pushfstring(rdr->L, "%s, line %d: ", rdr->filename, rdr->line_num); \
  lua_pushstring(rdr->L, S); \
  lua_concat(rdr->L, 2); \
  } while(0)
  if (!lua_checkstack(rdr->L, 10)) assert(0); \
  lua_pushboolean(rdr->L, 0); \
  lua_pushfstring(rdr->L, "%s, line %d: ", rdr->filename, rdr->line_num); \
  lua_pushfstring(rdr->L, S, A); \
  lua_concat(rdr->L, 2); \
  } while(0)
/* next: cases for Boolean functions */
@ \subsection{Reading a database entry}

Syntactically, a \texttt{.bib} file is a sequence of entries, perhaps
with a few \texttt{.bib} commands thrown in.  Each entry consists of
an at~sign, an entry type, and, between braces or parentheses and
separated by commas, a database key and a list of fields.  Each field
consists of a field name, an equals sign, and a nonempty list of
field tokens separated by [[concat_char]]s.  Each field token is
either a nonnegative number, a macro name (like `jan'), or a
brace-balanced string delimited by either double quotes or braces.
Finally, case differences are ignored for all but delimited strings
and database keys, and whitespace characters and ends-of-line may
appear in all reasonable places (i.e., anywhere except within entry
types, database keys, field names, and macro names); furthermore,
comments may appear anywhere between entries (or before the first or
after the last) as long as they contain no at~signs.
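To make the entry grammar above concrete, here is a small made-up entry (all values invented) showing the three kinds of field tokens: a delimited string, a nonnegative number, and a macro name.

```bibtex
@article{damas-milner:1978,
  author = "Luis Damas and Robin Milner",
  title  = {Principal Type-Schemes for Functional Programs},
  year   = 1978,
  month  = jan
}
```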
This function reads a database entry and pushes it on the Lua stack.
Any commands encountered before the database entry are executed.
If no entry remains, the function returns~0.
<
  if (!upto_nonwhite_getline(RDR)) \
    LERR("Unexpected end of file"); \
  } while(0)
static int get_bib_command_or_entry_and_process(Bibreader rdr) {
  unsigned char *id, *key;
  int keyindex;
  bool (*command)(Bibreader);
getnext:
  <<scan [[rdr]] up to and past the next [[@]] sign and skip white space (or return 0)>>
  id = rdr->cur;
  if (!scan_identifier (rdr, '{', '(', '('))
    LERR("Expected an entry type");
  lower_case (id, rdr->cur);  /* ignore case differences */
  <<if $[{}[[id]], \mbox{[[rdr->cur]]})$ points to a command, execute it and go to [[getnext]]>>
  lua_pushlstring(rdr->L, (char *) id, rdr->cur - id);  /* push entry type */
  rdr->entry_line = rdr->line_num;
  ready_tok(rdr);
  <<scan past opening delimiter and set [[rdr->entry_close]]>>
  ready_tok(rdr);
  key = rdr->cur;
  <<set [[rdr->cur]] to next whitespace, comma, or possibly [[}]]>>
  lua_pushlstring(rdr->L, (char *) key, rdr->cur - key);  /* push database key */
  keyindex = lua_gettop(rdr->L);
  lua_newtable(rdr->L);  /* push table of fields */
  ready_tok(rdr);
  for (; *rdr->cur != rdr->entry_close; ) {
    <<absorb comma (breaking if followed by [[rdr->entry_close]])>>
    <<read a field-value pair and set it in the field table, which is on top of the Lua stack>>
    ready_tok(rdr);
  }
  rdr->cur++;  /* skip past close of entry */
  return 3;    /* entry type, key, table of fields */
}
@ 
<<scan [[rdr]] up to and past the next [[@]] sign and skip white space (or return 0)>>=
  if (!upto1_getline(rdr, '@'))
    return 0;  /* no more entries; return nil */
  assert(*rdr->cur == '@');
  rdr->cur++;  /* skip the @ sign */
  ready_tok(rdr);
@ 
<<if $[{}[[id]], \mbox{[[rdr->cur]]})$ points to a command, execute it and go to [[getnext]]>>=
  command = find_command(id, rdr->cur);
  if (command) {
    if (!command(rdr))
      return 2;  /* command put (false, message) on Lua stack; we're done */
    goto getnext;
  }
@ An entry is delimited either by braces or by brackets; in order to
recognize the correct closing delimiter, we put it in
[[rdr->entry_close]].
<<scan past opening delimiter and set [[rdr->entry_close]]>>=
  if (*rdr->cur == '{') rdr->entry_close = '}';
  else if (*rdr->cur == '(') rdr->entry_close = ')';
  else LERR("Expected entry to open with { or (");
  rdr->cur++;
@ I'm not quite sure why stopping at~[[}]] is conditional on the
closing delimiter in this way.
<<set [[rdr->cur]] to next whitespace, comma, or possibly [[}]]>>=
  if (rdr->entry_close == '}') {
    upto_white_or_1(rdr, ',');
  } else {
    upto_white_or_2(rdr, ',', '}');
  }
@ At this point we're at a nonwhite token that is not the closing
delimiter.
If it's not a comma, there's big trouble---but even if it is, the
database may be using comma as a terminator, in which case a closing
delimiter signals the end of the entry.
<<absorb comma (breaking if followed by [[rdr->entry_close]])>>=
  if (*rdr->cur == ',') {
    rdr->cur++;
    ready_tok(rdr);
    if (*rdr->cur == rdr->entry_close) {
      break;
    }
  } else {
    LERR("Expected comma or end of entry");
  }
@ The syntax for a field is \emph{identifier}\texttt{=}\emph{value}.
The field name is forced to lower case.
<<read a field-value pair and set it in the field table, which is on top of the Lua stack>>=
  if (id = rdr->cur, !scan_identifier (rdr, '=', '=', '='))
    LERR("Expected a field name");
  lower_case(id, rdr->cur);
  lua_pushlstring(rdr->L, (char *) id, rdr->cur - id);  /* push field name */
  ready_tok(rdr);
  if (*rdr->cur != '=')
    LERR("Expected '=' to follow field name");
  rdr->cur++;  /* skip over the [['=']] */
  ready_tok(rdr);
  if (!scan_and_push_the_field_value(rdr, keyindex))
    return 2;
  strip_leading_and_trailing_space(rdr->L);
  <<if field is not already set, set it; otherwise warn>>
@ Official \bibtex\ does not permit duplicate entries for a single
field.  But in entries on the net, you see lots of such duplicates in
such unofficial fields as \texttt{reffrom}.  Because classic \bibtex\
doesn't report errors on fields that aren't advertised by the
\texttt{.bst} file, we don't want to just blat out a whole bunch of
warning messages.  So instead we dump the problem on the warning
function provided by the Lua client.
We therefore can't simply set the field in the field table: we first
look it up, and if it is nil, we set it; otherwise we warn.
<<if field is not already set, set it; otherwise warn>>=
  lua_pushvalue(rdr->L, -2);  /* push key */
  lua_gettable(rdr->L, -4);
  if (lua_isnil(rdr->L, -1)) {
    lua_pop(rdr->L, 1);
    lua_settable(rdr->L, -3);
  } else {
    lua_pop(rdr->L, 1);  /* off comes old value */
    warnv(rdr, 0, "ssdsss",  /* tag, file, line, cite-key, field, newvalue */
          "extra field", rdr->filename, rdr->line_num,
          lua_tostring(rdr->L, keyindex), lua_tostring(rdr->L, -2),
          lua_tostring(rdr->L, -1));
    lua_pop(rdr->L, 2);  /* off come key and new value */
  }
@ \subsection{Scanning functions}
\subsubsection{Scanning functions for fields}
@
While scanning fields, we are not operating in a toplevel function, so
the error handling for [[ready_tok]] needs to be a bit different.
<
if (!upto_nonwhite_getline(RDR)) \
LERRB("Unexpected end of file"); \
} while(0)
@
Each field value is accumulated into a [[luaL_Buffer]] from the Lua
auxiliary library.
The buffer is always called~[[b]];
for conciseness, we use the macro [[copy_char]] to add a character to
it.
<
@
A field value is a sequence of one or more tokens separated by a
[[concat_char]].
A~precondition for calling [[scan_and_push_the_field_value]] is that
[[rdr]] is pointing at a nonwhite character.
<
luaL_checkstack(rdr->L, 10, "Not enough Lua stack to parse bibtex database");
luaL_buffinit(rdr->L, &field);
for (;;) {
if (!scan_and_buffer_a_field_token(rdr, key, &field))
return 0;
ready_tok(rdr); /* cur now points to [[concat_char]] or end of field */
if (*rdr->cur != concat_char) break;
else { rdr->cur++; ready_tok(rdr); }
}
luaL_pushresult(&field);
return 1;
}
@ Because [[ready_tok]] can [[return]] in case of error, we can't write
\begin{quote}
[[for(; *rdr->cur == concat_char; rdr->cur++, ready_tok(rdr))]].
\end{quote}
@
A field token is either a nonnegative number, a macro name (like
`jan'), or a brace-balanced string delimited by either double quotes
or braces.
Thus there are four possibilities for the first character
of the field token: If it's a left brace or a double quote, the
token (with balanced braces, up to the matching closing delimiter) is
a string; if it's a digit, the token is a number; if it's anything
else, the token is a macro name (and should thus have been defined by
either the \texttt{.bst}-file's \texttt{macro} command or the \texttt{.bib}-file's
\texttt{string} command). This function returns [[false]] if there was a
serious syntax error.
<
The original \bibtex\ tries to optimize the common case of a field with
no internal braces; I~don't.
A~precondition for calling this function is that [[rdr->cur]] point at
the opening delimiter.
Whitespace is compressed to a single space character.
<
rdr->cur++;  /* scan past left delimiter */
*rdr->lim = ' ';
if (isspace(*rdr->cur)) {
  copy_char(' ');
  ready_tok(rdr);
}
for (;;) {
  p = rdr->cur;
  upto_white_or_3(rdr, '}', '{', close);
  cur = rdr->cur;
  for ( ; p < cur; p++)  /* copy nonwhite, nonbrace characters */
    copy_char(*p);
  *rdr->lim = ' ';
  c = *cur;  /* will be whitespace if at end of line */
  <<depending on [[c]], return or adjust [[braces]] and continue>>
}
}
@
Beastly complicated:
\begin{itemize}
\item
Space is compressed and scanned past.
\item
A closing delimiter ends the scan at brace level~0 and otherwise is
buffered.
\item
Braces adjust the [[braces]] count.
\end{itemize}
<<depending on [[c]], return or adjust [[braces]] and continue>>=
if (isspace(c)) {
copy_char(' ');
ready_tok(rdr);
} else {
rdr->cur++;
if (c == close) {
if (braces == 0) {
luaL_pushresult(b);
return 1;
} else {
copy_char(c);
if (c == '}')
braces--;
}
} else if (c == '{') {
braces++;
copy_char(c);
} else {
assert(c == '}');
if (braces > 0) {
braces--;
copy_char(c);
} else {
luaL_pushresult(b);  /* restore invariant */
LERRB("Unexpected '}'");
}
}
}
@
\subsubsection {Low-level scanning functions}
Scan the reader up to the character requested or end of line;
fails if not found.
<
This procedure scans for an identifier, stopping at the first
[[illegal_id_char]], or stopping at the first character if it's
[[numeric]]. It sets the global variable [[scan_result]] to [[id_null]] if
the identifier is null, else to [[white_adjacent]] if it ended at a
whitespace character or an end-of-line, else to
[[specified_char_adjacent]] if it ended at one of [[char1]] or [[char2]] or
[[char3]], else to [[other_char_adjacent]] if it ended at a nonspecified,
nonwhitespace [[illegal_id_char]]. By convention, when some calling
code really wants just one or two ``specified'' characters, it merely
repeats one of the characters.
<
orig = p = rdr->cur;
if (!isdigit(*p)) {
  /* scan until end-of-line or an [[illegal_id_char]] */
  *rdr->lim = ' ';  /* an illegal id character and also white space */
  while (is_id_char[*p])
    p++;
}
c = *p;
if (p > rdr->cur && (isspace(c) || c == c1 || c == c2 || c == c3)) {
rdr->cur = p;
return 1;
} else {
return 0;
}
}
@
This function scans for a nonnegative integer, stopping at the first
nondigit; it writes the resulting integer through [[np]].
It returns
[[true]] if the token was a legal nonnegative integer (i.e., consisted
of one or more digits).
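A minimal sketch of that contract (a simplified stand-in using a plain string cursor rather than the reader, with invented names): advance over digits, write the value through the out-parameter, and succeed only if at least one digit was seen.

```c
#include <ctype.h>

/* Sketch: scan a nonnegative integer at *cur, writing its value
   through np and advancing the cursor past the digits.  Returns 1 on
   success, 0 if the token begins with a nondigit. */
int scan_nonneg_integer(const char **cur, int *np) {
    const char *p = *cur;
    int n = 0;
    while (isdigit((unsigned char)*p)) {
        n = 10 * n + (*p - '0');
        p++;
    }
    if (p == *cur) return 0;  /* no digits: not a legal integer token */
    *np = n;
    *cur = p;
    return 1;
}
```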
<
On encountering an [[@]]\emph{identifier}, we ask if the
\emph{identifier} stands for a command and if so, return that command.
<
switch(*p) {
case 'c' : if (match("comment")) return do_comment; else break;
case 'p' : if (match("preamble")) return do_preamble; else break;
case 's' : if (match("string")) return do_string; else break;
}
return (Command)0;
}
@
%% \webindexsort{database-file commands}{\quad \texttt{comment}}
The \texttt{comment} command is implemented for SCRIBE compatibility. It's
not really needed because \BibTeX\ treats (flushes) everything not
within an entry as a comment anyway.
<
A \texttt{preamble} command has either braces or parentheses as outer delimiters. Inside is the preamble string, which has the same syntax as a field value: a nonempty list of field tokens separated by [[concat_char]]s. There are three types of field tokens---nonnegative numbers, macro names, and delimited strings.
This module does all the scanning (that's not subcontracted), but the
\texttt{.bib}-specific scanning function
[[scan_and_push_the_field_value_and_eat_white]] actually stores the
value.
<
The \texttt{string} command does mostly the same thing as the
\texttt{.bst}-file's \texttt{macro} command (but the syntax is different and the
\texttt{string} command compresses white space). In fact, later in this
program, the term ``macro'' refers to either a \texttt{.bst}
``macro'' or a
\texttt{.bib} ``string'' (when it's clear from the context that it's not
a \texttt{WEB} macro).
A \texttt{string} command has either braces or parentheses as outer
delimiters. Inside is the string's name (it must be a legal
identifier, and case differences are ignored---all upper-case letters
are converted to lower case), then an equals sign, and the string's
definition, which has the same syntax as a field value: a nonempty
list of field tokens separated by [[concat_char]]s. There are three
types of field tokens---nonnegative numbers, macro names, and
delimited strings.
<
First, we define Lua access to a reader.
<
static Bibreader checkreader(lua_State *L, int index) {
return luaL_checkudata(L, index, "bibtex.reader");
}
@
The reader's [[__index]] metamethod provides access to the
[[entry_line]] and [[preamble]] values as if they were fields of the
Lua table.
It also provides access to the [[next]] and [[close]] methods of the
reader object.
<
To create a reader, we call
\begin{quote}
\texttt{openreader(\nt(unknown), \optional{\nt{macro-table}, \optional{\nt{warn-function}}})}
\end{quote}
The warning function will be called in one of the following ways:
\begin{itemize}
\item
warn([["extra field"]], \emph{file}, \emph{line}, \emph{citation-key}, \emph{field-name}, \emph{field-value})

Duplicate definition of a field in a single entry.
\item
warn([["undefined macro"]], \emph{file}, \emph{line}, \emph{citation-key}, \emph{macro-name})

Use of an undefined macro.
\end{itemize}
<
/* filename * macro table * warning function -> reader */
static int openreader(lua_State *L) {
  const char *filename = luaL_checkstring(L, 1);
  FILE *f = fopen(filename, "r");
  Bibreader rdr;
  if (!f) {
    lua_pushnil(L);
    lua_pushfstring(L, "Could not open file '%s'", filename);
    return 2;
  }
  <<set items 2 and 3 on stack to hold macro table and optional warning function>>
  rdr = lua_newuserdata(L, sizeof(*rdr));
  luaL_getmetatable(L, "bibtex.reader");
  lua_setmetatable(L, -2);
  rdr->line_num = 0;
  rdr->buf = rdr->cur = rdr->lim = malloc(INBUF);
  rdr->bufsize = INBUF;
  rdr->file = f;
  rdr->filename = malloc(lua_strlen(L, 1)+1);
  assert(rdr->filename);
  strncpy((char *)rdr->filename, filename, lua_strlen(L, 1)+1);
  rdr->L = L;
  lua_newtable(L);
  rdr->preamble = luaL_ref(L, LUA_REGISTRYINDEX);
  lua_pushvalue(L, 2);
  rdr->macros = luaL_ref(L, LUA_REGISTRYINDEX);
  lua_pushvalue(L, 3);
  rdr->warning = luaL_ref(L, LUA_REGISTRYINDEX);
  return 1;
}
@ 
<<set items 2 and 3 on stack to hold macro table and optional warning function>>=
if (lua_type(L, 2) == LUA_TNONE)
  lua_newtable(L);
if (lua_type(L, 3) == LUA_TNONE)
lua_pushnil(L);
else if (!lua_isfunction(L, 3))
luaL_error(L, "Warning value to bibtex.open is not a function");
@
Reader method [[next_entry]] takes no parameters.
On success it returns a triple (\emph{type}, \emph{key},
\emph{field-table}).
On error it returns (\texttt{false}, \emph{message}).
On end of file it returns nothing.
<
@
Closing a reader recovers its resources;
the [[file]] field of a closed reader is [[NULL]].
<
@
To help implement the call to the warning function, we have [[warnv]].
If there is no warning function, we return the number of nils specified by [[nres]].
<
lua_rawgeti(rdr->L, LUA_REGISTRYINDEX, rdr->warning);
if (lua_isnil(rdr->L, -1)) {
lua_pop(rdr->L, 1);
while (nres-- > 0)
lua_pushnil(rdr->L);
} else {
va_start(vl, fmt);
for (p = fmt; *p; p++)
  switch (*p) {
case 'f': lua_pushnumber(rdr->L, va_arg(vl, double)); break;
case 'd': lua_pushnumber(rdr->L, va_arg(vl, int)); break;
case 's': {
const char *s = va_arg(vl, char *);
if (s == NULL) lua_pushnil(rdr->L);
else lua_pushstring(rdr->L, s);
break;
}
default: luaL_error(rdr->L, "invalid parameter type %c", *p);
}
lua_call(rdr->L, p - fmt, nres);
va_end(vl);
}
}
@
Here's where the library is initialized.
This is the only exported function in the whole file.
<
  luaL_register(L, "bibtex", bibtexlib);
  <<initialize the [[is_id_char]] table>>
  return 1;
}
@ In an identifier, we can accept any printing character except the
ones listed in the [[nonids]] string.
<<initialize the [[is_id_char]] table>>=
{ unsigned c;
  static unsigned char *nonids = (unsigned char *)"\"#%'(),={} \t\n\f";
  unsigned char *p;
  for (c = 0; c <= 0377; c++) is_id_char[c] = 1;
  for (c = 0; c <= 037; c++) is_id_char[c] = 0;
  for (p = nonids; *p; p++) is_id_char[*p] = 0;
}
@ \subsection{Main function for the nbib commands}
This code is the standalone main function for all the nbib commands.
\nextchunklabel{c-main}
<
extern int luaopen_bibtex(lua_State *L);
extern int luaopen_boyer_moore(lua_State *L);
int main (int argc, char *argv[]) {
  int i, rc;
  lua_State *L = luaL_newstate();
  static const char *files[] = { SHARE "/bibtex.lua", SHARE "/natbib.nbs" };
  OPEN(base); OPEN(table); OPEN(io); OPEN(package); OPEN(string);
  OPEN(bibtex); OPEN(boyer_moore);
  for (i = 0; i < sizeof(files)/sizeof(files[0]); i++) {
    if (luaL_dofile(L, files[i])) {
      fprintf(stderr, "%s: error loading configuration file %s\n",
              argv[0], files[i]);
      exit(2);
    }
  }
  lua_pushstring(L, "bibtex");
  lua_gettable(L, LUA_GLOBALSINDEX);
  lua_pushstring(L, "main");
  lua_gettable(L, -2);
  lua_newtable(L);
  for (i = 0; i < argc; i++) {
    lua_pushnumber(L, i);
    lua_pushstring(L, argv[i]);
    lua_settable(L, -3);
  }
  rc = lua_pcall(L, 1, 0, 0);
  if (rc) {
    fprintf(stderr, "Call failed: %s\n", lua_tostring(L, -1));
    lua_pop(L, 1);
  }
  lua_close(L);
  return rc;
}
@ \section{Implementation of \texttt{nbibtex}}
From here out, everything is written in Lua (\url{http://www.lua.org}).
The main module is [[bibtex]], and style-file support is in the
submodule [[bibtex.bst]].
Each has a [[doc]] submodule, which is intended as machine-readable
documentation.
<
local config = config or { } --- may be defined by config process
local workaround = {
  badbibs = true, --- don't look at bad .bib files that come with teTeX
}
local bst = { }
bibtex.bst = bst
bibtex.doc = { }
bibtex.bst.doc = { }
bibtex.doc.bst = '# table of functions used to write style files'
@
Not much code is executed during startup, so the main issue is to
manage declaration before use.
I~have a few forward declarations in
[[<``utility'' functions being declared before
``exported'' ones.
<
return bibtex
@
The Lua code relies on the C~code.
How we get the C~code depends on how
\texttt{bibtex.lua} is used; there are two alternatives:
\begin{itemize}
\item
In the distribution, \texttt{bibtex.lua} is loaded by the C~code in
chunk~\subpageref{c-main}, which defines the [[bibtex]] module.
\item
For standalone testing purposes, \texttt{bibtex.lua} can be loaded
directly into an
interactive Lua interpreter, in which case it loads the [[bibtex]]
module as a shared library.
\end{itemize}
<<if not already present, load the C code for the [[bibtex]] module>>=
if not bibtex then
local nbib = require 'nbib-bibtex'
bibtex = nbib
end
@
\subsection{Error handling, warning messages, and logging}
<
Like classic \bibtex, \nbibtex\ typically warns only about entries
that are actually used.
This functionality is implemented by function [[hold_warning]], which
keeps warnings on ice until they are either returned by
[[held_warnings]] or thrown away by [[drop_warning]].
The function [[emit_warning]] emits a warning message eagerly when
called;
it is used to issue warnings about entries we actually use, or if the
[[-strict]] option is given, to issue every warning.
<
local extra_ok = { reffrom = true } -- set of fields about which we should not warn of duplicates
do
  local warnfuns = { }
  warnfuns["extra field"] = function(file, line, cite, field, newvalue)
    if not extra_ok[field] then
      bibwarnf("Warning--I'm ignoring %s's extra \"%s\" field\n--line %d of file %s\n",
               cite, field, line, file)
    end
  end
  warnfuns["undefined macro"] = function(file, line, cite, macro)
    bibwarnf("Warning--string name \"%s\" is undefined\n--line %d of file %s\n",
             macro, line, file)
  end
function emit_warning(tag, ...) return assert(warnfuns[tag])(...) end
local held
function hold_warning(...)
held = held or { }
table.insert(held, { ... })
end
function held_warnings()
local h = held
held = nil
return h
end
function drop_warnings()
held = nil
end
end
@
\subsection{Miscellany}
All this stuff is dubious.
<
bibtex.entries = entries
bibtex.doc.entries = 'reader -> iterator # generate entries'
@ \subsection{Internal documentation}
We attempt to document everything!
<
Here is the documentation for what's defined in C~code:
<
Actually, the same main function does for both \texttt{nbibtex} and
\texttt{nbibfind}; depending on how the program is called, it
delegates to [[bibtex.bibtex]] or [[bibtex.run_find]].
<
bibtex.doc.bibtex = 'string list -> unit # main program for nbibtex'
function bibtex.bibtex(argv)
  <<set bibtex options from [[argv]]>>
  if table.getn(argv) < 1 then
    bibfatalf('Usage: %s [-permissive|-strict|...] filename[.aux] [bibfile...]',
              argv[0])
  end
  local auxname = table.remove(argv, 1)
  local basename = string.gsub(string.gsub(auxname, '%.aux$', ''), '%.$', '')
  auxname = basename .. '.aux'
  local bblname = output_name or (basename .. '.bbl')
  local blgname = basename .. (output_name and '.nlg' or '.blg')
  local blg = open(blgname, 'w')
  -- Here's what we accumulate by reading .aux files:
  local bibstyle           -- the bibliography style
  local bibfiles = { }     -- list of files named in order of file
  local citekeys = { }     -- list of citation keys from .aux
                           -- (in order seen, mixed case, no duplicates)
  local cited_star = false -- .tex contains \cite{*} or \nocite{*}
<<using file [[auxname]], set [[bibstyle]], [[citekeys]], and [[bibfiles]]>>
if table.getn(argv) > 0 then -- override the bibfiles listed in the .aux file
bibfiles = argv
end
<<validate contents of [[bibstyle]], [[citekeys]], and [[bibfiles]]>>
<<from [[bibstyle]], [[citekeys]], and [[bibfiles]], compute and emit the list of entries>>
blg:close()
end
@
Options are straightforward.
<<set bibtex options from [[argv]]>>=
while table.getn(argv) > 0 and find(argv[1], '^%-') do
if argv[1] == '-terse' then
-- do nothing
elseif argv[1] == '-permissive' then
permissive = true
elseif argv[1] == '-strict' then
strict = true
  elseif argv[1] == '-min-crossrefs' and find(argv[2], '^%d+$') then
    min_crossrefs = assert(tonumber(argv[2]))
    table.remove(argv, 1)
  elseif string.find(argv[1], '^%-min%-crossrefs=(%d+)$') then
    local _, _, n = string.find(argv[1], '^%-min%-crossrefs=(%d+)$')
    min_crossrefs = assert(tonumber(n))
elseif string.find(argv[1], '^%-min%-crossrefs') then
biberrorf("Ill-formed option %s", argv[1])
elseif argv[1] == '-o' then
output_name = assert(argv[2])
table.remove(argv, 1)
elseif argv[1] == '-bib' then
bib_out = true
elseif argv[1] == '-help' then
help()
elseif argv[1] == '-version' then
printf("nbibtex version
Options: -bib                   write output as BibTeX source
         -help                  display this help and exit
         -o FILE                write output to FILE (- for stdout)
         -min-crossrefs=NUMBER  include item after NUMBER cross-refs; default 2
         -permissive            allow missing bibfiles and (some) duplicate entries
         -strict                complain about any ill-formed entry we see
         -version               output version information and exit

Home page at http://www.eecs.harvard.edu/~nr/nbibtex.
Email bug reports to nr@eecs.harvard.edu.
]])
os.exit(code or 0)
end
@ \subsection{Reading all the aux files and validating the inputs}
We pay attention to four commands: [[\@input]], [[\bibdata]],
[[\bibstyle]], and [[\citation]].
<<using file [[auxname]], set [[bibstyle]], [[citekeys]], and [[bibfiles]]>>=
do
  local commands = { } -- table of commands we recognize in .aux files
  local function do_nothing() end -- default for unrecognized commands
  setmetatable(commands, { __index = function() return do_nothing end })
  <<functions for commands found in .aux files>>
  commands['@input'](auxname) -- reads all the variables
end
@ 
<<functions for commands found in .aux files>>=
do
  local auxopened = { } --- map filename to true/false
  commands['@input'] = function (auxname)
    if not find(auxname, '%.aux$') then
      bibwarnf('Name of auxfile "%s" does not end in .aux\n', auxname)
    end
    <<mark [[auxname]] as opened (but fail if opened already)>>
    local aux = open(auxname, 'r')
    logf('Top-level aux file: %s\n', auxname)
    for line in aux:lines() do
      local _, _, cmd, arg = find(line, '^([%a%@]+)%s*{([^%}]+)}%s*$')
      if cmd then commands[cmd](arg) end
    end
    aux:close()
  end
end
<<mark [[auxname]] as opened (but fail if opened already)>>=
if auxopened[auxname] then
  error("File " .. auxname .. " cyclically \\@input's itself")
else
  auxopened[auxname] = true
end
@ \bibtex\ expects \texttt{.bib} files to be separated by commas.
They are forced to lower case, should have no spaces in them, and the
[[\bibdata]] command should appear exactly once.
<<functions for commands found in .aux files>>=
do
  local bibdata_seen = false
function commands.bibdata(arg)
assert(not bibdata_seen, [[LaTeX provides multiple \bibdata commands]])
bibdata_seen = true
for bib in string.gmatch(arg, '[^,]+') do
assert(not find(bib, '%s'), 'bibname from LaTeX contains whitespace')
table.insert(bibfiles, string.lower(bib))
end
end
end
@
The style should be unique, and it should be known to us.
<<functions for commands found in .aux files>>=
function commands.bibstyle(stylename)
if bibstyle then
biberrorf('Illegal, another \\bibstyle command')
else
bibstyle = bibtex.style(string.lower(stylename))
if not bibstyle then
bibfatalf('There is no nbibtex style called "%s"', stylename)
end
end
end
@
We accumulated cited keys in [[citekeys]].
Keys may be duplicated, but the input should not contain two keys that
differ only in case.
<<functions for commands found in .aux files>>=
do
local keys_seen, lower_seen = { }, { } -- which keys have been seen already
  function commands.citation(arg)
    for key in string.gmatch(arg, '[^,]+') do
      assert(not find(key, '%s'),
             'Citation key {' .. key .. '} from LaTeX contains whitespace')
      if key == '*' then
        cited_star = true
      elseif not keys_seen[key] then --- duplicates are OK
        keys_seen[key] = true
        local low = string.lower(key)
        <<if another key with same lowercase, complain bitterly>>
        if not cited_star then -- no more insertions after the star
          table.insert(citekeys, key) -- must be key, not low,
                                      -- so that keys in .bbl match .aux
        end
      end
    end
  end
end
@ 
<<if another key with same lowercase, complain bitterly>>=
if lower_seen[low] then
  biberrorf("Citation key '%s' inconsistent with earlier key '%s'",
            key, lower_seen[low])
else
  lower_seen[low] = key
end
@ After reading the variables, we do a little validation.
I~can't seem to make up my mind what should be done incrementally
while things are being read.
<<validate contents of [[bibstyle]], [[citekeys]], and [[bibfiles]]>>=
if not bibstyle then
  bibfatalf('No \\bibliographystyle in original LaTeX')
end
if table.getn(bibfiles) == 0 then
  bibfatalf('No .bib files specified --- no \\bibliography in original LaTeX?')
end
if table.getn(citekeys) == 0 and not cited_star then
  biberrorf('No citations in document --- empty bibliography')
end
do --- check for duplicate bib entries
  local i = 1
  local seen = { }
  while i <= table.getn(bibfiles) do
    local bib = bibfiles[i]
    if seen[bib] then
      bibwarnf('Multiple references to bibfile "%s"', bib)
      table.remove(bibfiles, i)
    else
      seen[bib] = true
      i = i + 1
    end
  end
end
@
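The subtle part of the citation handler above is the case handling: exact duplicates are silently ignored, but two keys that differ only in case are an inconsistency. Here is a simplified standalone sketch of the same bookkeeping (invented names, not the nbibtex code itself; it omits the `*` handling and collects conflicts in a table instead of calling biberrorf):

```lua
-- Sketch only: simplified version of the citation-key bookkeeping above.
local function make_citation_recorder()
  local keys_seen, lower_seen = { }, { }
  local citekeys, conflicts = { }, { }
  local function cite(key)
    if keys_seen[key] then return end       -- exact duplicates are OK
    keys_seen[key] = true
    local low = string.lower(key)
    if lower_seen[low] then
      -- same key seen earlier in a different case: inconsistent input
      table.insert(conflicts, { key, lower_seen[low] })
    else
      lower_seen[low] = key
    end
    table.insert(citekeys, key)             -- key, not low, as above
  end
  return cite, citekeys, conflicts
end

local cite, keys, conflicts = make_citation_recorder()
cite('Knuth:84'); cite('Knuth:84'); cite('knuth:84')
-- keys == { 'Knuth:84', 'knuth:84' }; conflicts holds one pair
```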
\subsection{Reading the entries from all the \bibtex\ files}
These are diagnostics that might be written to a log. <<from [[bibstyle]], [[citekeys]], an
Yuck. I see that GitHub didn't quite know what to do with the attachment. I've pushed it to https://github.com/nrnrnr/polymode/blob/master/multiple-mode-samples/nbib.nw.
Stats on that file: noroots(1) reports five roots: two C files, one Lua file, and two orphaned roots that are not used.

Also, when I talked about regexps, I was a bit confused. What I really want is something like the auto-mode-alist
variable, only on a buffer-local basis. So for example in the sample file I can have something like this:
(("\\.c$" . c-mode)
 ("\\.lua$" . lua-mode))
Other documents might use quite different conventions; for example,
(("^transcript$" . uscheme-transcript-mode)
("additions to the initial basis of .uscheme" . scheme-mode))
and so on.
Hi Norman.
The dev doc is ready. I have gone through several stages of refactoring and settled on parsimonious naming conventions. It also helped clear my own mind. And I acknowledge that the previous mode/polymode/chunkmode/submode/basemode etc. wording was quite a mess.
I had a quick look. How would you feel about my making an editing pass over these docs? For example, even for the devs, I think it would be helpful to begin with a short statement of the problem that polymode is intended to solve. Then, for example, each of the terms in the glossary could be related to that problem.
Norman
I am not exceptionally good with words, so I would appreciate any improvements.
What "problem" do you have in mind except the obvious one of having multiple Emacs modes in the same buffer?
I would prefer to keep the docs short and to the point. The doc is already longer than I would like it to be. Interested people should go to the code and examples to figure out the rest.
Vitalie
Norman Ramsey on Fri, 30 May 2014 11:19:09 -0700 wrote:
> I am not exceptionally good with words, so I would appreciate any improvements.
> What "problem" do you have in mind except the obvious one of having multiple Emacs modes in the same buffer?
That's the one, with support for syntax highlighting &c.
> I would prefer to keep the docs short and to the point. The doc is already longer than I would like it to be. Interested people should go to the code and examples to figure out the rest.
I'm happy with that plan.
Norman
Hi Norman,
Your example works well in my tests. The automatic chunk mode detection is there. You can do it now in a number of different ways (see poly-noweb). The buffer-local variable for the default mode is also there. The general docs have been improved, and the technical docs will be there once the dust of the rewrite has settled.
I am closing this one as we will be removing ess-noweb from ESS immediately after the next release later this month.
I am the author of noweb. For some time I have been aware that noweb support for Emacs users has been in a parlous state. I was very pleased to learn of your efforts to create an improved noweb-mode as part of ESS.
At present I am aware of at least three other competitors:
noweb-mode.el
originally written by Thorsten Ohl and now maintained by me. It is not something I am proud to be associated with.
mmm-mode
there was an effort to make mmm-mode support noweb, but that effort was never terribly effective, and the mmm-mode project appears to be just barely surviving.
noweb-mode.el (Dave Love's)
based on top of his multi-mode package. It does some very impressive things with syntax highlighting and has other goodies, but unfortunately it is based on Emacs "indirect buffers," which are poorly documented and have proven unreliable in practice. Worse, Dave seems to have dropped off the Net.

I have been using ESS noweb mode on an experimental basis for a few weeks, probably amounting to no more than 10 hours in total. I am experiencing intermittent failures, not just in noweb-related functionality, but in basic Emacs commands such as kill-line and query-replace. It's possible that these issues occur because I have both ess-noweb-mode and noweb-mode installed on the same system. It's possible that the issues occur because I'm using Emacs 23.4. But setting these issues aside, I am excited by your work, and I find my preliminary experience very promising. And if the issues were resolved, I'd be able to endorse your work to those of my users who use Emacs.

I am wondering if you would like to join forces? The time I have available for noweb is very limited---I am rewarded very heavily for new work and not at all for improvements to noweb---but the chance to partner with somebody who has real Emacs skills is too promising to ignore. Would you be interested in working together to refactor the code so it can stand independent of ESS, and perhaps to incorporate some of Dave Love's good ideas, and squeeze out some bugs? I would love to be able to replace the old, crappy code I am distributing with something based on your work.