Redesigning ly.music - GitHub issues

wbsoft commented 9 years ago

Dear friends, I want to take the lead for redesigning ly.music. This issue is to continue the thread at wbsoft/frescobaldi#492, which I'll close.

ly.music currently has two shortcomings:

it is not easy to build a tree from scratch
it is not possible to create LilyPond code from scratch (from a tree).

It was initially designed to create a tree structure from an existing document, which is done in the ly.music.read module.

@crantila emphasizes that basing ly.music on xml.etree or lxml would make it much faster, because no Python object is created for every single tree item, and we would get powerful search possibilities for free.

I am willing to design the ly.music tree such that it adheres to simple principles, maybe even without creating a class for every type of element.

We need the following:

build a tree from scratch and have it render a valid LilyPond document from scratch
build a tree from tokenized source (from a ly.document.Document) and have all tree items know the originating tokens (and thus source position, as every token has the position in the source file in the pos attribute)
make changes to a tree by changing attributes or adding/removing child nodes
deep copy (parts of) a tree
when making changes to a tree that was built from a tokenized source, know precisely which changes need to be made to the originating document.
fast iteration over the tree and understanding the music, time etc like it is currently. This is currently handled inside the Item class and subclasses, but it should move outside the tree node classes.
very quickly find the node that is at a certain source position.
(if possible) for a tree that is created from tokenized source: update the tree from the position where source text changes without recreating the whole tree.
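
A minimal sketch of the "find the node at a certain source position" requirement, assuming each leaf element built from source carries hypothetical `pos` and `end` attributes (start and end offsets in the source text). It is only an illustration with the standard library, not existing ly.music code:

```python
import bisect
import xml.etree.ElementTree as ET


def build_index(root):
    """Return (start_positions, elements) for leaf elements, sorted by position."""
    leaves = [e for e in root.iter() if len(e) == 0 and e.get("pos") is not None]
    leaves.sort(key=lambda e: int(e.get("pos")))
    return [int(e.get("pos")) for e in leaves], leaves


def node_at(index, offset):
    """Return the leaf element covering the source offset, or None."""
    positions, leaves = index
    # Last leaf that starts at or before the offset.
    i = bisect.bisect_right(positions, offset) - 1
    if i < 0:
        return None
    elem = leaves[i]
    return elem if offset < int(elem.get("end")) else None


# Tree for the source text "{ c' e g }"; "pos" and "end" are hypothetical
# attribute names holding the start and end offsets of each note.
root = ET.Element("music", {"type": "sequential", "pos": "0", "end": "10"})
ET.SubElement(root, "note", {"name": "c", "pos": "2", "end": "4"})
ET.SubElement(root, "note", {"name": "e", "pos": "5", "end": "6"})
ET.SubElement(root, "note", {"name": "g", "pos": "7", "end": "8"})

index = build_index(root)
```

The index has to be rebuilt (or patched) whenever positions change, so this only addresses lookup speed, not incremental updates.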

Functions like transpose or translate currently iterate over tokens, but they should operate on the tree in the future, and simply change the tree.

When the tree was built from a tokenized source (i.e. the originating tokens are attached to the tree nodes, while their current value has changed) a dedicated module should be able to find out which changes need to be done to the originating document to have it reflect whatever is changed in the tree. (We won't dump a new LilyPond document because that would bluntly overwrite things that a user might have done different, like whitespace or comments that may or may not be present in the tree, and also that would destroy markings or cursor positions for point-and-click when a document is overwritten inside an editor such as Frescobaldi.)

uliska commented 9 years ago

This is a great and important effort, and I'll try to join it as soon as possible. Although I have to admit that I will have to build experience with XML processing more or less from scratch.

One thing I'd like to add (which is partly OT but not completely): One of my interests in this topic is the idea to create an interface between LilyPond and the MEI format/initiative because I see that as a huge potential for Lilypond in the academic editing domain, particularly for digital edition concepts. @PeterBjuhr , Mike (Solomon) and I will present a paper at the Music Encoding Conference in May 2015, and the progress on this effort is something we'll want and have to report there. Work on any ly2mei implementations will be based on ly.music, and so this issue is sort-of "blocking" our efforts, and therefore I'm extremely interested in progress done here.

wbsoft commented 9 years ago

Note: ly.music is not strictly about XML, it's about a meaningful tree structure for LilyPond document contents.

ly.musicxml will use the ly.music to create a MusicXML tree. (Although the ultimate LilyPond→MusicXML conversion only can be done from inside LilyPond, as ly.music cannot understand music that's built by, or manipulated by, Scheme functions.)

PeterBjuhr commented 9 years ago

I'm not sure that creating the tree structure in ly.music as an XML tree would make the conversion to other XML structures (e.g. MEI and MusicXML) easier. Initially that may seem like an advantage. But at least in the case of MusicXML I think the structure has to be rebuilt from scratch anyway.

But you emphasize other advantages, and in that case redesigning is perhaps worth the effort!

uliska commented 9 years ago

It is Christopher Antila's strong conviction that making this change will make processing significantly faster, and that's crucial because we don't only want to enable import/export but actually the idea of having multiple document formats as live "views" on a DOM.

wbsoft commented 9 years ago

Adding to that, I would like to get rid of the many subclasses of Item that have their own methods. So I want the ly.music tree to be a little more "dumb", but much more predictable as far as api and storage method is concerned. Smart things like computing lengths, traversing events etc should be done by separate iterator/accessor modules that have the logic.

crantila commented 9 years ago

What I believe is that an ElementTree-backed data structure will be faster and more flexible than the current all-Python implementation with node.py, especially considering the requirements Wilbert mentioned above, and that Urs and I have in mind for the LilyPond-to-MEI conversion. Though I do believe better solutions may exist, I have experience with Python's ElementTree API, so for me at least it's the fastest way to get started (no doubt that's a somewhat selfish justification!)

ElementTree will not by any means solve all our problems---we still need intelligent algorithms. However, the key to its usefulness is realizing that, as described in the Python documentation, ElementTree is simply a hierarchic data structure. Yes, it provides facilities to import and export XML documents, but that's secondary functionality. (It may be a useful way for Frescobaldi to avoid re-parsing a document every time it's opened too). Both the MEI and MusicXML converters, and any other ones we write, will probably need to significantly restructure the hierarchy of elements, as Peter rightly notes. There is simply no way to avoid that, but I hope the ElementTree API's XPath capabilities will make that task significantly easier.

Other advantages, some mentioned above:

no need to create a Python object corresponding to every type of LilyPond token; we can use an element's "tag" attribute for that
we can store arbitrary data in the attributes, including whether the element has recently been updated, what the source document's token is and the position in the source document, a document-unique identifier, and links to other elements for various reasons
very fast iteration, including iteration of elements selected by XPath query (e.g., "element's tag is 'note'," "element has @duration attribute set to half-note," or "element is child of the second 'section' element")
fast adding/removing elements with (I believe) lower memory-allocation penalty than with Python lists

We should be able to fulfill Wilbert's eighth requirement too, "for a tree that is created from tokenized source: update the tree from the position where source text changes without recreating the whole tree." If every element has its source-document position available in its attributes, we will be able to find it with a relatively simple XPath query.

By the way, every time I mention "XPath" query, I'm referring to the functionality provided by the find() and findall() methods.
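
As a rough illustration of the three example queries listed above, here is a small sketch using the standard library's `xml.etree.ElementTree`. The element and attribute names (`section`, `note`, `duration`, `pname`) are made up for the example and are not a proposed format:

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names, for illustration only.
root = ET.Element("document")
first = ET.SubElement(root, "section")
second = ET.SubElement(root, "section")
ET.SubElement(first, "note", {"pname": "c", "duration": "4"})
ET.SubElement(second, "note", {"pname": "d", "duration": "2"})
ET.SubElement(second, "rest", {"duration": "2"})
ET.SubElement(second, "note", {"pname": "e", "duration": "4"})

# "element's tag is 'note'", anywhere in the tree
all_notes = root.findall(".//note")

# "element has @duration attribute set to half-note"
half_notes = root.findall(".//note[@duration='2']")

# "element is child of the second 'section' element" (positions are 1-based)
in_second = root.findall("./section[2]/*")

# iter() walks the tree in document order without building a list first
tags = [elem.tag for elem in root.iter()]
```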

Again, I'm not convinced this is the best solution. It's just one with software I know how to use, and that I believe can meet our initial requirements. I have started experimental work on a bit-by-bit migration from node.py to ElementTree. Since there seems to be interest in at least seeing how the experiment turned out, I'll try to complete it enough to upload this week so you can see.

Thanks for considering the move!

wbsoft commented 9 years ago

@crantila , thank you for your ideas and help!

wbsoft commented 9 years ago

A question: which ElementTree implementation do we choose? I tend to prefer lxml, because of its speed, XPath completeness and the getparent() method (there are many cases where I need to know the parent of an element, and I think building a mapping from children to their parent every time is cumbersome).

This would require the lxml package to be present but I assume that is no problem on the main platforms (Linux/OS X/Windows)?

crantila commented 9 years ago

To reduce the dependency burden, my plan was to try to use the standard library first, and only move to lxml if required. That migration should be as simple as changing the import statement.

To find a node's parent node, I was imagining an XPath query like this:

document_root.find('.//*[@id="1234"]/..')

That should select the parent of an element that has the "id" attribute set to "1234." This means we'll have to keep the document's root element in a module-accessible place, which isn't great. Maybe we could have a module-level wrapper object that offers a "find the parent" method.
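
A sketch of what such a wrapper could look like with the standard library only. The class name `TreeWrapper` and its methods are hypothetical, and the `id` attribute is assumed to be unique per document:

```python
import xml.etree.ElementTree as ET


class TreeWrapper:
    """Hypothetical wrapper that keeps the root and answers parent queries."""

    def __init__(self, root):
        self.root = root

    def parent_by_id(self, element_id):
        # Same query as above; it relies on every element carrying a
        # document-unique "id" attribute (an assumption of this sketch).
        return self.root.find(".//*[@id='{}']/..".format(element_id))

    def parent_map(self):
        # Alternative without ids: one pass building a child -> parent dict.
        return {child: parent for parent in self.root.iter() for child in parent}


root = ET.Element("document", {"id": "1"})
music = ET.SubElement(root, "music", {"id": "2"})
note = ET.SubElement(music, "note", {"id": "1234"})

tree = TreeWrapper(root)
found = tree.parent_by_id("1234")
```

The XPath variant searches the tree on every call, while the dict variant costs one full pass and then answers in constant time, but goes stale when the tree is edited.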

What do you think? Mostly I'm just wary of building with lxml and then realizing later that the extra dependency doesn't add much.

(Also, I hope to have a small trial sample finished by the end of the weekend).

wbsoft commented 9 years ago

I think it's perfectly fine to use xml.etree. We'll probably always have the root available in some object that queries or traverses the tree. With careful coding we can get around the getparent stuff.

wbsoft commented 9 years ago

@crantila , I am brainstorming about converting ly.music to xml.etree.

Just some ideas: We basically will be using the normal methods (without subclassing anything) to create the xml.etree tree. We will use the tag name somewhat like the class name in current ly.music, e.g. a \relative construct is described by an element relative.

We need, for the iteration, a fast way to distinguish the main types of LilyPond content: music, markup, scheme etc. Currently, to iterate over a tree to look for music, we can just skip items that are not an instance of Music. Will we need, besides the tagname, to set e.g. a class or type attribute to 'music' for such items?

Or shall we use the tag name to describe the main type? E.g. a piece of \relative or { c d e } would always be a <music> element, with an attribute (or even a child element) specifying what kind of music.

e.g. \relative c' { a b c } would translate to:

<music type="relative">
  <pitch-arg note="c" octave="1"/>  <!-- a pitch argument that is not music in itself -->
  <music type="sequentialmusic">
    <note name="a" octave="0"/>
    <note name="b" octave="0"/>
    <note name="c" octave="0"/>
  </music>
</music>

And \override Staff.InstrumentName.extra-offset = #'(1 . 2) to something like:

<override>
  <context name="Staff"/> <!-- an optional element -->
  <grob name="InstrumentName"/>
  <property-path>
    <property name="extra-offset"/>
  </property-path>
  <value>  <!-- the thing after the '=' -->
    <scheme>
      <quote>
        <pair>
          <integer value="1"/>
          <integer value="2"/>
        </pair>
      </quote>
    </scheme>
  </value>
</override>

Maybe a good departing point is that every token results in an element. That way we can easily store the token that generated elements in the token attribute. Maybe some elements, like music or markup lists, also have an ending token, which could be stored in the end_token attribute. Of course the tokens are not used when building a tree from scratch.
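
To make the "main type in the tag name" variant concrete, here is a sketch that builds the \relative c' { a b c } tree from scratch with plain xml.etree calls. The helper functions `music()` and `note()` are hypothetical conveniences, not part of ly.music:

```python
import xml.etree.ElementTree as ET


def music(kind, *children, **attrib):
    """Hypothetical helper: a <music> element whose type attribute names the kind."""
    elem = ET.Element("music", dict(attrib, type=kind))
    elem.extend(children)
    return elem


def note(name, octave, pos=None):
    """Hypothetical helper: a <note>; pos is the source offset when read from a file."""
    attrib = {"name": name, "octave": str(octave)}
    if pos is not None:
        attrib["pos"] = str(pos)
    return ET.Element("note", attrib)


# \relative c' { a b c }, built from scratch (so no source positions)
relative = music(
    "relative",
    ET.Element("pitch-arg", {"note": "c", "octave": "1"}),
    music("sequentialmusic", note("a", 0), note("b", 0), note("c", 0)),
)

# With the main type in the tag name, selecting all music is one call.
music_kinds = [m.get("type") for m in relative.iter("music")]
note_names = [n.get("name") for n in relative.iter("note")]
xml_text = ET.tostring(relative, encoding="unicode")
```

With this layout `iter("music")` skips the pitch argument and any markup or scheme elements without an extra class or type check.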

uliska commented 9 years ago

Just a shot in the dark: If we are going to redesign the xml representation anyway, would it be possible to do so with MEI in mind right from the start?

I mean, MEI is the most comprehensive xml based data format to encode music, so it could be a good idea to go that way?

Maybe it would even be possible to design our data format (partially) as a super/subset of MEI?

PeterBjuhr commented 9 years ago

Interesting and tempting thought, but as the tree structure in ly.music should not be a representation of the music itself but of the LilyPond code it's unclear to me how much if any adaptation to the MEI format can be done. I think we should be careful not to do any premature conversions that would hinder some of the other purposes of ly.music (and I'm not referring to conversion to MusicXML).

Nevertheless, @crantila to examine this idea and to teach us more about MEI could you give us some clues about how the examples of @wbsoft above could be represented in a MEI(-like) structure?

wbsoft commented 9 years ago

@uliska, No, ly.music should just express LilyPond code in a semantically consistent and predictable way. Basically we're just abusing the fast storage and (xpath) search facilities of ElementTree, which is implemented in C. When we have a proper design, traversing music etc will be very easy to implement, including converting to MEI, MusicXML, etc and even MIDI.

uliska commented 9 years ago

Ok. As I said, just a shot in the dark

wbsoft commented 9 years ago

thinking about the "every token results in an element" idea, this is another rendering of the \override Staff.InstrumentName.extra-offset = #'(1 . 2) command:

<override>
  <context name="Staff"/> <!-- an optional element -->
  <dot/>
  <grob name="InstrumentName"/>
  <dot/>
  <property name="extra-offset"/>
  <equalsign/>
  <scheme>
    <quote>
      <list>
        <integer value="1"/>
        <dot/>
        <integer value="2"/>
      </list>
    </quote>
  </scheme>
</override>

In this example the idea is that ly.music should just store a source tree correctly and not too much interpret it. That should be done using modules that traverse the tree.

wbsoft commented 9 years ago

\markup \bold { two words } would result in

<markup>
  <bold>
    <markuplist>
      <markupword text="two"/>
      <markupword text="words"/>
    </markuplist>
  </bold>
</markup>

PeterBjuhr commented 9 years ago

I'm starting (finally) to read something in the MEI documentation, and still somewhat influenced by @uliska 's idea I don't see why we can't have:

<note pname="a" oct="0"/>

instead of:

<note name="a" octave="0"/>

If that would be in any way helpful...

But to follow the "every token results in an element" idea, perhaps the note name and the octave token should be distinct elements!?

wbsoft commented 9 years ago

@PeterBjuhr and @uliska , indeed we could borrow common naming from MEI. pname obviously stands for pitch name. Back in 2001 I tried to design an xml format for music and named that Frescobaldi. It never got beyond the brain storming phase. Later I used the domain name for our beloved editor :-)

But to follow the "every token results in an element" idea, perhaps the note name and the octave token should be distinct elements!?

@PeterBjuhr , yes that's thinkable, although I'm not completely sure. Our "LilyXML" tree should just be there to create easily from parsed LilyPond source (very much like the current ly.music.read module). But in addition to that, it should be possible to quickly generate a meaningful tree from scratch, without looking at the api docs all the time :-)

When a LilyXML tree is built from a document, all elements should point to their position in the source. It should be possible to edit the tree and then see what changes are to be made to the source document. It is in fact not needed to store the tokens themselves. Just the position (from:to, or pos:length) needs to be set for every token.

Adding <dot/> elements in an override command does not make much sense, so I would not want that actually. But it should be possible to regenerate the syntax. We need to think carefully about what needs to be in the XML tree and what not.

When we have read a pitch from a document, it is represented there as e.g. c'. A c and a ' token. Say we edit the music tree by making the pitch one octave lower. Now when determining how to edit the originating document, we should find that we need to remove the ', or that we need to replace the (whatever original) octave with the empty string. So, in the tree, we should have a way to store the position of the original octave tokens in the document. (Because LilyPond code is human-written, we can't bluntly replace a whole document with the output of a changed tree; that would also destroy marked blocks and cursor positions of point and click in editors such as Frescobaldi).
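
A sketch of that idea on a plain string, assuming hypothetical `octave_pos` and `octave_len` attributes that record where the original octave token sits in the source. It only demonstrates the principle; a real implementation would presumably go through ly.document rather than string slicing:

```python
import xml.etree.ElementTree as ET

source = "{ c' e'' g }"

# Hypothetical attributes: octave_pos/octave_len locate the original octave
# token(s) in the source; octave_len is 0 where no octave token was written.
notes = ET.Element("music", {"type": "sequential"})
ET.SubElement(notes, "note", {"name": "c", "octave": "'", "octave_pos": "3", "octave_len": "1"})
ET.SubElement(notes, "note", {"name": "e", "octave": "''", "octave_pos": "6", "octave_len": "2"})
ET.SubElement(notes, "note", {"name": "g", "octave": "", "octave_pos": "10", "octave_len": "0"})


def lower_octave(octave):
    """Return the LilyPond octave suffix one octave lower."""
    return octave[1:] if octave.startswith("'") else octave + ","


def collect_edits(tree, original):
    """Compare each note's current octave with the original text at its position."""
    edits = []
    for n in tree.iter("note"):
        pos, length = int(n.get("octave_pos")), int(n.get("octave_len"))
        if original[pos:pos + length] != n.get("octave"):
            edits.append((pos, length, n.get("octave")))
    return edits


def apply_edits(text, edits):
    """Apply edits from the end so that earlier original positions stay valid."""
    for pos, length, new in sorted(edits, reverse=True):
        text = text[:pos] + new + text[pos + length:]
    return text


first, second, third = list(notes)
first.set("octave", lower_octave(first.get("octave")))   # c' becomes c
third.set("octave", lower_octave(third.get("octave")))   # g becomes g,

edits = collect_edits(notes, source)
result = apply_edits(source, edits)
```

Whitespace, comments and the untouched second note are left exactly as the user wrote them, because only the changed octave spans are rewritten.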

PeterBjuhr commented 9 years ago

In the end we should not only be able to convert LilyXML to MEI but also MEI to LilyXML. Following @uliska idea somewhat, maybe it would be more helpful to look in the import direction and see what changes need to be made in the conversion to LilyXML.

That said, I want to emphasize again that we need to be careful not to be too influenced by the MEI syntax and lose track of other important considerations...

uliska commented 9 years ago

Yes. I think one major goal is to be able to perform operations like those from the quick insert panel or applying graphical edits to the source code.

That's the most important objective as that would open up tons of possibilities.

crantila commented 9 years ago

It seems to me like there are three issues. I'll touch on each in turn.

1.) What the ly.music internal document structure should be like, in particular regarding similarity to MEI.
2.) How to retain a strong connection to the source document, in particular how to generate a source document from the internal ElementTree.
3.) Not yet being discussed: whether we're planning a migration or a rewrite-from-scratch.

Internal Document Structure

(Interesting bit above about why Wilbert originally acquired the Frescobaldi domain). I've been quietly thinking about how MEI-like Frescobaldi's ElementTree should be, since after all my personal interest in this project is in using ly.music for bi-directional converters between MEI and LilyPond. On the one hand, it seems obvious that Frescobaldi should model a LilyPond document with a LilyPond-like data structure, or else we'll probably lose something in the conversion. There's no way to represent a LilyPond \override command in MEI, for example, so let's just keep two tasks separate (making sense of a LilyPond file, and converting that to another format).

Now I've thought about it a bit more, especially reading comments in this issue. Now I think we should seriously consider parsing LilyPond documents straight into MEI. (Still not convinced it's what we should do---but let's consider it). For one, we haven't entirely decided how to represent a LilyPond document in an ElementTree, and it looks like we're basically going to have to develop an XML-capable data format to do this. Even if we retain the current node.py implementation, you've been discussing ways to regularize this internal data structure since at least September 2014, so it's taken considerable time already. But why are we using such time to define yet another data format to represent music? Couldn't we just use an existing format?

"Yes but..." is the answer to that. Yes, but even though MEI is capable of representing more notational features than MusicXML, it's still not sufficient to represent the highly-specific complexities of a LilyPond document, especially in the way Frescobaldi requires. True. So far. One of MEI's biggest selling-points over MusicXML is its extensibility, so we can actually add any required features to MEI. If we choose this approach, we only have to invent the LilyPond-specific aspects of the internal ElementTree data.

Thus Wilbert's example of \relative c' { a b c } can become something like this:

<staffDef oct="4"/>
<note pname="a"/>
<note pname="b"/>
<note pname="c"/>

We might represent a''4_\markup{\bold{NOTES}} as:

<note pname="a" oct="5" dur="4">
    <markup pos="_"><bold>NOTES</bold></markup>
</note>

And to remember source document locations, we might do this, which I think is self-explanatory:

<mei>
    <score lily.source.pathname="/home/crantila/whatever.ly">
        ...
        <note lily.source.pos="L423:68"/>
        ...
    </score>
</mei>

If we're going to experiment anyway, this might be worth trying out.

Connection to Source Documents

Just above, I proposed a very simple strategy to maintain connections to the source document. One of our goals is to generate a LilyPond document from the ly.music ElementTree, which shouldn't be particularly difficult once we decide on a data format. To know where the ElementTree elements ended up, at first we could just import the generated LilyPond document. The problem of whitespace remains. Ideally, we could import a LilyPond document to the internal format, then export the internal data into an identical LilyPond document. A more feasible, but much slower, approach to document generation (or more accurately, to modification of existing documents) will have us search for every modified token, one at a time, updating the LilyPond source file to reflect modifications to the tree. Yet even that's difficult, since our modifications to the source file may change the positions of other tokens, so we would also have to update the token positions after every modification.

I'm not sure this is the best approach, but it seems workable to convert a -\markup{ test! } b into this (NB: GitHub doesn't show all the space characters properly)

<note pname="a" lily.source.pos="L1:1">
    <space lily.source.pos="L1:2"/>
    <markup pos="-" lily.source.pos="L1:3">
        <space count="4" lily.source.pos="L1:12"/>
        <markupText lily.source.pos="L1:16">test!</markupText>
        <space lily.source.pos="L1:21"/>
    </markup>
</note>
<space count="2" lily.source.pos="L1:23"/>
<note pname="b" lily.source.pos="L1:25"/>

Tab, space, and newline characters will be thusly preserved, allowing us to (hopefully?) regenerate identical source files. Admittedly, it looks like a mess, but these files aren't meant for human consumption anyway.

Migrating or Rewriting?

My original idea was to "migrate" ly.music to the ElementTree API, bit by bit. First, Node would subclass Element. Then we'll get rid of the token-specific classes. Then we'll get rid of Node, and so on. Now it seems that, no matter which strategy we choose for internal representation, we're going to end up rewriting everything in ly.music anyway, so we should start from scratch? In any case, I'm going to hold off on preparing a sample by myself.

Let's agree on an internal data format, then maybe Wilbert can lay out a source-file skeleton, and we can work together to fill it out?

wbsoft commented 9 years ago

I might opt for a rewrite, but I really think I need to be able to express a LilyPond source file in the ly.music tree. There needs to be a transparent one-to-one relationship between the source file and the internal tree representation (i.e. \score { \music \layout { bla = 123 } } would translate to <score><music/><layout><assignment name="bla"><integer value="123"/></assignment></layout></score>).

There are some issues which spring into view when thinking in a Mei-like xml format:

in ly.music, one tree corresponds to a source file. The document node is capable of resolving include files and loading them (in an inheritable way; this is used in ly.music.Document and extended in frescobaldi's music.py to use caching). Traversing the music transparently loads and includes other files.
there is a distinction between the actual tree layout and the music it represents. Consider:
```
music = { c e g }
\relative { \music \music }
```
the second c is in a different octave. We only get the real music by traversing the tree (see https://github.com/wbsoft/python-ly/blob/master/ly/music/items.py#L938), not by simply accessing the nodes. ly.music handles this already (although understanding the relative pitches is not yet automatically handled while traversing), via the events method of Music objects. (But the smart logic is too much interconnected with the different objects and sprinkled through the code.)

I want the tree for the above example to have a layout like this:
```
<document>
<assignment name="music">
  <music type="sequentialmusic">
    ...notes...
  </music>
</assignment>
<music type="relative">
  <music type="sequentialmusic">
    <music type="ref" ref="music"/>
    <music type="ref" ref="music"/>
  </music>
</music>
</document>
```
So I can quickly see that there is an assignment in the toplevel and a music construct. I am not yet fully convinced that having a Mei-like setup for the internal LilyPond music tree will be helpful, but I do not know well enough how Mei is organized. I do want a close mapping between the LilyPond source file and the music tree.
regarding the whitespace/comments: it is not really needed to be able to reconstruct those exactly as they were in the source document. When we know the positions of the nodes we can edit the source document. When creating a new document from scratch, we will probably use default whitespace and indenting etc. It is no problem to edit a source document while tokens would change to other positions. When we have modified a tree and still have the original position of the modified elements we can use the editing features of ly.document to perform all the changes, it is no problem if that would move other tokens. While editing, all positions refer to the original position, only when an editing context ends, the changes on the source document take effect. See http://python-ly.readthedocs.org/en/latest/ly.html#module-ly.document. In https://github.com/wbsoft/python-ly/blob/master/ly/pitch/translate.py#L58 this can be seen in action.

regarding _, - and ^; I'd probably make what follows those direction tokens child elements (be it fingering, markup, articulation etc); ly.music converts { c^\markup "Hoi" } currently into:

<Document>
<MusicList u'{'>
  <Note u'c'>
  <Postfix u'^'>
    <Markup u'\\markup'>
      <String u'"'>

In xml, I would like:

<document>
<music type="sequential">
  <note>
    <direction type="^">
      <markup>
         <quotedstring>Hoi</quotedstring>
      </markup>
    </direction>
  </note>
</music>
</document>

Unfortunately, I am not yet well-versed in Mei, but I know LilyPond very well. I think the best way is probably that I design a meaningful tree representation that is as close as possible to the LilyPond input, very easily readable, and traversable in an event-handling way. Then building a Mei or MusicXML document could be done by iterating the tree or traversing its events (a structure defined by LilyPond) and building the MusicXML or Mei tree from scratch. The other way around (Mei->LilyPond or even MusicXML->LilyPond) is also done by understanding that data structure and converting it to LilyPond. When I write a helper module with functions that create the most important constructs immediately, it will not be difficult to write LilyPond documents from scratch. Also the score wizard in Frescobaldi would use it. (And then we can drop ly.dom).
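
To illustrate the difference between the stored tree and the music obtained by traversing it, here is a sketch of a separate traversal function that follows variable references in the \music \music example. The function name `events()` and the lookup table are hypothetical; include files, scoping, relative pitch resolution and circular references are all ignored:

```python
import xml.etree.ElementTree as ET

DOC = """
<document>
  <assignment name="music">
    <music type="sequentialmusic">
      <note name="c"/><note name="e"/><note name="g"/>
    </music>
  </assignment>
  <music type="relative">
    <music type="sequentialmusic">
      <music type="ref" ref="music"/>
      <music type="ref" ref="music"/>
    </music>
  </music>
</document>
"""


def events(elem, assignments):
    """Yield note elements in musical order, following variable references."""
    if elem.tag == "note":
        yield elem
    elif elem.tag == "music" and elem.get("type") == "ref":
        # Hypothetical: a reference is resolved through the assignments table.
        target = assignments.get(elem.get("ref"))
        if target is not None:
            yield from events(target, assignments)
    else:
        for child in elem:
            yield from events(child, assignments)


root = ET.fromstring(DOC.strip())
# Top-level assignments only; include files and scoping are ignored here.
assignments = {a.get("name"): a for a in root.findall("assignment")}
toplevel_music = root.find("music")
names = [n.get("name") for n in events(toplevel_music, assignments)]
```

The tree itself stores three notes, while the traversal yields six, which is the distinction between layout and music described above. The logic lives outside the nodes, so the tree can stay "dumb".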

A final note: we will never be able to correctly export really complicated LilyPond music, because we can't export music that's built by Scheme functions. The best LilyPond→MusicXML or Mei converter would need to access the Scheme music tree data structure inside LilyPond....

PeterBjuhr commented 9 years ago

@crantila great to learn more about MEI! But even if we can extend MEI with the LilyPond and Scheme specific stuff, I must agree with @wbsoft and say that I'm not convinced that will take us far enough. As stated before, the xml tree we're after must represent the LilyPond code without compromises in the form of premature conversions to another xml syntax. Surely MEI can't be that flexible!?

PeterBjuhr commented 9 years ago

To follow up on a previous posting, to learn more about MEI and to give a little more concrete example I've tried to create a tiny MEI example (perhaps it's too simplistic to be correct!?):

<score>
  <section>
    <measure>
      <staff>
        <note pname="e" oct="5" dur="8" tuplet="i1"/>
        <note pname="d" oct="5" dur="8" tuplet="m1"/>
        <note pname="c" oct="5" dur="8" tuplet="t1"/>
      </staff>
    </measure>
  </section>
</score>

The nearest conversion to our proposed LilyXML that I can conceive of would be:

<score>
  <section type="sequential">
    <new />
    <context name="Staff"/>
    <section type="sequential">
      <tuplet fraction="3/2">
        <section type="sequential">
          <note pname="e" oct="2" dur="8"/>
          <note pname="d" oct="2" dur="8"/>
          <note pname="c" oct="2" dur="8"/>
        </section>
      </tuplet>
    </section>
  </section>
</score>

And in pure LilyPond it would amount to:

\score {
  \new Staff {
    \tuplet 3/2 {
      e''8 d''8 c''8
    }
  }
}

The differences may be small, but to me they seem substantial even in this simple example. And I'm not convinced that it would be helpful to adapt to MEI naming conventions even where it can be done.

PeterBjuhr commented 9 years ago

Now I think we should seriously consider parsing LilyPond documents straight into MEI. (Still not convinced it's what we should do---but let's consider it).

Basically the problem is that the same music can be written in many different ways in LilyPond.

We've been focusing on the conclusion that this means that we need to retain a strong connection between the tree structure and how the document is actually written; which I think is reason enough.

But it could be added that this also means that exporting the music from the source code is quite complex. In fact it's impossible if you aren't willing to create yet another LilyPond/Scheme parser.

I think the intermediate step that ly.music provides should be seen as beneficial for the export instead of redundant. (And that the possible limitations in the information of the resulting music given by ly.music should be accepted.)

uliska commented 9 years ago

On 31.01.2015 at 09:38, Peter Bjuhr wrote:

Now I think we should seriously consider parsing LilyPond
documents straight into MEI. (Still not convinced it's what we
should do---but let's consider it).
Basically the problem is that the same music can be written in many different ways in LilyPond.

We've been focusing on the conclusion that this means that we need to retain a strong connection between the tree structure and how the document is actually written; which I think is reason enough.

I also think that we should agree with @wbsoft and make ly.music basically a tight representation of the LilyPond source document.

But it could be added that this also means that exporting the music from the source code is quite complex. In fact it's impossible if you aren't willing to create yet another LilyPond/Scheme parser.

I have to say that this is a subject I raised at the very beginning of that whole discussion ages ago (when we were still discussing "only" MusicXML export). It will not be possible to create a full conversion to any other format without evaluating what Scheme does. This is not only true for "really complex" (e.g. algorithmic) music but for any Scheme-based library functions. If I write \giveMeFive { a b } in a document how should ly.music know what that does? But writing a new parser seems like a very bad idea to me. I see two approaches to complete that goal:

somehow use Guile directly or talk LilyPond into parsing relevant code for us.
Collecting information through a LilyPond run and somehow merge that information with the parsed information.

The latter seems more straightforward. By now (with my recent \annotate experience and seeing what the edition-engraver is capable of) I'm quite sure that's possible. But it is of course not viable for any "live" action. But maybe it's acceptable to have such "white spots" when we really want to have live stuff, and use LilyPond for proper "exports".

I think the intermediate step that ly.music provides should be seen as beneficial for the export instead of redundant. (And that the possible limitations in the information of the resulting music given by ly.music should be accepted.)

I also think that when we want to use Frescobaldi as a live editor of a ly/MEI document it is acceptable to ignore anything that MEI can't express directly.

wbsoft commented 9 years ago

I think that the ultimate exporting backend should run in LilyPond. LilyPond builds a large Scheme data structure of the music. It should be possible to write a Scheme procedure (or LilyPond backend) that can read this data structure and convert it to something else. Here is an example:

music = \new Staff {
  \set Staff.instrumentName = "Trumpet"
  c2 d e
}

\displayMusic \music

Results in:

(make-music
  'ContextSpeccedMusic
  'create-new
  #t
  'property-operations
  '()
  'context-type
  'Staff
  'element
  (make-music
    'SequentialMusic
    'elements
    (list (make-music
            'ContextSpeccedMusic
            'context-type
            'Staff
            'element
            (make-music
              'PropertySet
              'value
              "Trumpet"
              'symbol
              'instrumentName))
          (make-music
            'NoteEvent
            'duration
            (ly:make-duration 1 0 1)
            'pitch
            (ly:make-pitch -1 0 0))
          (make-music
            'NoteEvent
            'pitch
            (ly:make-pitch -1 1 0)
            'duration
            (ly:make-duration 1 0 1))
          (make-music
            'NoteEvent
            'pitch
            (ly:make-pitch -1 2 0)
            'duration
            (ly:make-duration 1 0 1)))))

Note that every LilyPond (make-music) structure has the type (or name) as first symbol (e.g. 'NoteEvent) and that the rest is an associative array. Child elements which are also music live in the 'element or 'elements key.
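
To make that shape concrete, here is a minimal Python sketch that mirrors the first \displayMusic output as nested data and walks it. The Music class and the children and iter_music helpers are hypothetical names for illustration only; they are not part of ly.music or LilyPond, and pitches and durations are simplified to plain tuples.

```python
from dataclasses import dataclass, field


@dataclass
class Music:
    """Hypothetical mirror of one (make-music 'Name key value ...) form."""
    name: str                                  # leading symbol, e.g. NoteEvent
    props: dict = field(default_factory=dict)  # the associative part


def children(node):
    """Yield child music stored under the 'element and 'elements keys."""
    single = node.props.get("element")
    if isinstance(single, Music):
        yield single
    for child in node.props.get("elements", []):
        yield child


def iter_music(node):
    """Depth-first walk over a music expression, parents before children."""
    yield node
    for child in children(node):
        yield from iter_music(child)


# Hand-transcribed from the first example above; tuples stand in for the
# ly:make-pitch and ly:make-duration values.
staff = Music("ContextSpeccedMusic", {
    "create-new": True,
    "context-type": "Staff",
    "element": Music("SequentialMusic", {"elements": [
        Music("ContextSpeccedMusic", {
            "context-type": "Staff",
            "element": Music("PropertySet",
                             {"symbol": "instrumentName", "value": "Trumpet"}),
        }),
        Music("NoteEvent", {"pitch": (-1, 0, 0), "duration": (1, 0, 1)}),
        Music("NoteEvent", {"pitch": (-1, 1, 0), "duration": (1, 0, 1)}),
        Music("NoteEvent", {"pitch": (-1, 2, 0), "duration": (1, 0, 1)}),
    ]}),
})

names = [m.name for m in iter_music(staff)]
note_count = names.count("NoteEvent")
```

Because every node has the same two parts (a name and an associative list), one generic walker is enough; no class per music type is needed just to traverse the structure.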

Another example:

music = \relative {
  c2 d^\markup { Hoi }

}

\displayMusic \music

(make-music
  'RelativeOctaveMusic
  'element
  (make-music
    'SequentialMusic
    'elements
    (list (make-music
            'NoteEvent
            'duration
            (ly:make-duration 1 0 1)
            'pitch
            (ly:make-pitch -1 0 0))
          (make-music
            'NoteEvent
            'articulations
            (list (make-music
                    'TextScriptEvent
                    'direction
                    1
                    'text
                    (markup #:line (#:simple "Hoi"))))
            'pitch
            (ly:make-pitch -1 1 0)
            'duration
            (ly:make-duration 1 0 1)))))

I think it is overkill to add a parser for these Scheme procedures to ly; it would be better to write the exporting code in Scheme instead. We could then run such a Scheme procedure by injecting some code into the document we want to export. So the MusicXML or MEI exporter could be written in Scheme and run from within LilyPond, but it would not be necessary to add it to LilyPond. It could be included and we could make small changes to a document like converting

\new ChoirStaff <<
   \new Staff <<
    etc etc
>>

to

\exportMusicXML "filename.xml" \new ChoirStaff <<
   \new Staff <<
    etc etc
>>

And then run LilyPond, which will run our exporter. This then could be done by an editor. This way we can develop and use the exporter without bugging the Lily devs to include it. Only when it is mature, we could add it to vanilla LilyPond (like the articulate stuff some years ago).
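
The editor-side step described here could look roughly like the following Python sketch. It only rewrites text: \exportMusicXML is the hypothetical music function from the example above, and inject_export is an illustrative helper name, not an existing python-ly or Frescobaldi function. It assumes the expression to export starts with \new at the beginning of a line.

```python
import re


def inject_export(source, filename, command="exportMusicXML"):
    """Prefix the first top-level \\new expression with the export command.

    `command` names the hypothetical Scheme music function from the
    discussion; nothing here checks that LilyPond actually defines it.
    """
    prefix = '\\{} "{}" '.format(command, filename)
    # Zero-width match: only a \new at the very start of a line qualifies,
    # so indented (nested) \new expressions are left alone.
    pattern = re.compile(r"^(?=\\new\b)", re.MULTILINE)
    # A function replacement avoids backslash escape processing in `prefix`.
    return pattern.sub(lambda match: prefix, source, count=1)


document = "\\new ChoirStaff <<\n   \\new Staff <<\n    etc etc\n>>\n"
exported = inject_export(document, "filename.xml")
```

A real implementation would locate the expression through the tokenized document rather than a regular expression, so that comments and strings containing \new are not touched.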

wbsoft commented 9 years ago

Building upon this, I have yet another idea: write a small function in Scheme that maps a LilyPond music structure to XML directly. This structure (let's call it lilyxml) is then read by a different program which can convert it to anything else. The main manipulations are already done by LilyPond (like substituting variables, including files, setting durations etc.).

This lilyxml could be the base for ly.music. The LilyPond music structure has not changed much over the last years.

What is in element or elements becomes child elements. Properties that are markup need to be handled specially, because they are also tree structures.

I am willing to write such a scheme function. It might even exist already. Then we can experiment.
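
As a rough illustration of the mapping (written in Python rather than Scheme, and not the actual lilyxml format, which is not defined yet), the sketch below turns a (name, properties) pair into an ElementTree element. The to_xml helper and the property element are assumptions made for this example, and markup properties are not treated specially here.

```python
import xml.etree.ElementTree as ET


def to_xml(music):
    """Map a (name, props) pair to an element named after the music type.

    Music found under "element" or "elements" becomes child elements;
    every other property becomes a <property> child holding its text.
    """
    name, props = music
    node = ET.Element(name)
    for key, value in props.items():
        if key == "element":
            node.append(to_xml(value))
        elif key == "elements":
            for child in value:
                node.append(to_xml(child))
        else:
            prop = ET.SubElement(node, "property", {"name": key})
            prop.text = str(value)
    return node


# Simplified transcription of: \relative { c2 d }
music = ("RelativeOctaveMusic", {
    "element": ("SequentialMusic", {"elements": [
        ("NoteEvent", {"pitch": "-1 0 0", "duration": "1 0 1"}),
        ("NoteEvent", {"pitch": "-1 1 0", "duration": "1 0 1"}),
    ]}),
})

root = to_xml(music)
xml_text = ET.tostring(root, encoding="unicode")
```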

PeterBjuhr commented 9 years ago

Interesting thought! Will ly.music still be able to do all current stuff in terms of document manipulation?

I remember that we looked into the information \displayMusic can give us, but I don't remember it well. I think there was a conclusion that this wouldn't give us all information needed to represent the music.

Nevertheless to be able to parse that information could open up new possibilities!

uliska commented 9 years ago

This is something I just started to think about after your previous comment :-)

The only issue I see is that this structure will only be available after compilation.

But for exporting to whatever target I think this should be a very promising approach.

This could then be made a backend so one could simply select xml instead of pdf... which would of course skip the whole layout processing.

@crantila as an aside regarding mei2ly: if we could write a converter from MEI to such a Scheme structure we could feed this directly into LilyPond, completely skipping the stage of LilyPond code. I have the impression this could be significantly more straightforward to do. Maybe even in Scheme, which could lead to LilyPond reading MEI files directly.

Maybe we should move that discussion to another place.

uliska commented 9 years ago

On 31 January 2015 at 11:46:12 CET, Peter Bjuhr notifications@github.com wrote:

Interesting thought! Will ly.music still be able to do all current stuff in terms of document manipulation?

I remember that we looked into the information \displayMusic can give us, but I don't remember it well. I think there was a conclusion that this wouldn't give us all information needed to represent the music.

I think it wouldn't be plain \displayMusic but something else that is applied at a later stage of processing.

Nevertheless to be able to parse that information could open up new possibilities!


wbsoft commented 9 years ago

Note that this idea of converting the LilyPond music structure to XML is already some years old. But afaics, no one has yet stepped in to actually create such a Scheme function.

PeterBjuhr commented 9 years ago

But it could be added that this also means that exporting the music from the source code is quite complex. In fact it's impossible if you aren't willing to create yet another LilyPond/Scheme parser.

I have to say that this is a subject I raised at the very beginning of that whole discussion ages ago (when we were still discussing "only" musicXML export).

To put things into perspective, when the MusicXML exporter development started the goal was limited to being able to export only documents that had previously been imported (as a kind of exchange model). When ly.music was created and the exporter integrated with it, the limitations were greatly reduced!

wbsoft commented 9 years ago

Will ly.music still be able to do all current stuff in terms of document manipulation?

I think that is possible. That is somewhat the behaviour of \displayLilyMusic which also prints out plain LilyPond source:

\version "2.18.0"

music = \relative {
  \time 4/4
  c4 d e \times 2/3 { d8 e f }
}

\displayLilyMusic \music

{ \time 4/4
  c d e \tuplet 3/2 { d8 e f } }

It loses the \relative but that is present in \displayMusic.

crantila commented 9 years ago

I'm going to push a bit farther with the internally-held MEI idea, even though my own feeling is that a compromise between the full MEI spec and the existing node.py hierarchy will produce the best results for the long term. Yet although I have my doubts, MEI is very extensible, and we just might be able to make it work for us.

There seems to be general agreement that the representation must reflect the source document quite closely. I agree with this in principle, but I also think that many of the Lily-XML samples follow the source document too closely. This obscures the document's meaning, and will make it harder to work with for any purpose. Consider { c^\markup "Hoi" }, which Wilbert suggests should be encoded as

<document>
  <music type="sequential">
    <note>
      <direction type="^">
        <markup>
           <quotedstring>Hoi</quotedstring>
        </markup>
      </direction>
    </note>
  </music>
</document>

For me, this doesn't logically represent either the source document or its meaning. In Lily-XML I would rather write:

<document>
  <music type="sequential">
    <note>
      <markup grouping="dblquo" pos="^">Hoi</markup>
    </note>
  </music>
</document>

That's still a one-to-one mapping, but it collects everything about the \markup command into a single element, which is much easier to work with. And from here it's not a big leap to a valid subset of an MEI document:

<section>
  <staff>
    <note>
      <markup grouping="dblquo" pos="^">Hoi</markup>
    </note>
  </staff>
</section>

We will have to invent the markup element.
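
To show why a single markup element is easier to work with, here is a small Python sketch using the standard library ElementTree. The grouping and pos attributes are the ones proposed above; markup_to_ly is a hypothetical helper, and the brace form used for other grouping values is an assumption.

```python
import xml.etree.ElementTree as ET

LILY_XML = """
<document>
  <music type="sequential">
    <note>
      <markup grouping="dblquo" pos="^">Hoi</markup>
    </note>
  </music>
</document>
""".strip()


def markup_to_ly(markup):
    """Render a <markup> element back to LilyPond source text."""
    text = markup.text or ""
    if markup.get("grouping") == "dblquo":
        body = '"{}"'.format(text)
    else:
        # Assumed fallback: any other grouping is written with braces.
        body = "{{ {} }}".format(text)
    return "{}\\markup {}".format(markup.get("pos", ""), body)


root = ET.fromstring(LILY_XML)
# One path expression finds every markup attached above a note.
above = root.findall(".//note/markup[@pos='^']")
texts = [m.text for m in above]
rendered = markup_to_ly(above[0])
```

Everything needed to write the command back out lives on one element, so the round trip does not have to reassemble direction, markup and quoted string from three nested nodes.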

Here's an alternative to @PeterBjuhr's MEI example that more closely reflects the LilyPond code:

<mei><music><body><mdiv><score><section>
    <staff n="1">
        <layer n="1" lily.type="implied">
            <tuplet num="3" numbase="2">
                <note pname="e" oct="5" dur="8"/>
                <note pname="d" oct="5" dur="8"/>
                <note pname="c" oct="5" dur="8"/>
            </tuplet>
        </layer>
    </staff>
</section></score></mdiv></body></music></mei>

The mei, music, body, mdiv, score, and section elements are all required for valid MEI, but there's no reason we have to hold a fully-valid ElementTree internally. The layer element is also required by MEI. For us it represents a Voice context. Admittedly, adding the implied Voice takes us farther from the source document, so I invented the @lily.type attribute to indicate it's not written in the source document. Here's an example of where we could modify the MEI spec (to not require the layer element), or even just not bother to follow it exactly.

It's also possible to resolve include files and allow for context-sensitive interpretation of the same music. Consider the following LilyPond code:

music = { c e g }
\relative { \music \music }

In my LilyPond-centric MEI, this could be:

<section label="music" xml:id="55447">
    <note pname="c"/>
    <note pname="e"/>
    <note pname="g"/>
</section>
<section type="sequential" subtype="relative">
    <section xlink:show="embed" xlink:title="music" target="#55447"/>
    <section xlink:show="embed" xlink:title="music" target="#55447"/>
</section>

I'm not sure I'm using the cross-reference attributes correctly because I've never done it before, but the point is that it's possible. To me, this snippet reflects the LilyPond source quite closely and transparently, and it's almost valid MEI. (The note elements should be inside a layer inside a staff). To deal with \include, MEI does allow referring to an arbitrary URI, so we could import and parse those files as required.
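
Resolving such references when the music is needed could be sketched in Python as below. This is only an illustration under assumptions: the wrapping root element and its xmlns:xlink declaration are added so the snippet parses, expand_references is a hypothetical helper, and whether target is the right MEI attribute is left open, as noted above.

```python
import copy
import xml.etree.ElementTree as ET

# ElementTree reports the predefined xml: prefix with this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SOURCE = """
<root xmlns:xlink="http://www.w3.org/1999/xlink">
  <section label="music" xml:id="55447">
    <note pname="c"/>
    <note pname="e"/>
    <note pname="g"/>
  </section>
  <section type="sequential" subtype="relative">
    <section xlink:show="embed" xlink:title="music" target="#55447"/>
    <section xlink:show="embed" xlink:title="music" target="#55447"/>
  </section>
</root>
""".strip()


def expand_references(root):
    """Fill each referencing element with copies of its target's children."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    # Take a snapshot first, because the loop adds elements to the tree.
    for el in list(root.iter()):
        target = el.get("target")
        if target and target.startswith("#"):
            for child in by_id[target[1:]]:
                el.append(copy.deepcopy(child))
    return root


root = expand_references(ET.fromstring(SOURCE))
relative = root.find("section[@subtype='relative']")
pnames = [note.get("pname") for note in relative.iter("note")]
```

The copies are deep copies, so interpreting the second \music in its \relative context would not alter the variable definition itself.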

@PeterBjuhr: "Basically the problem is that the same music can be written in many different ways in LilyPond." That's a difficulty for parsing both LilyPond and MEI files, but I don't think it's that big of a problem. If anything, the ambiguity of both formats suggests they're well-suited for each other.

In the end we're going to have to make some compromises, no matter how we choose to parse the source file. The only true one-to-one representation of a LilyPond file is itself. However, we all already know this: "exporting the music from the source code is... impossible if you aren't willing to create yet another LilyPond/Scheme parser." (@PeterBjuhr)

Isn't that exactly what we're trying to do? Even though we're focussing on the LilyPond syntax first, I guess I took it for granted that we were going to have to parse Scheme too. Consider a simple example: \once \override NoteHead.color = #(x11-color 'LimeGreen) e, which could be represented in my MEI-plus-ly world as:

<override once="true" grob="NoteHead" property="color" value="#12345">
    <scheme xml:id="12345">(x11-color 'LimeGreen)</scheme>
    <note pname="e"/>
</override>

As development continues, we'll convert more and more of the Scheme into MEI, in this case by adding a @color attribute to the note. Since MEI isn't even sort of good at encoding Scheme programs, we should keep all the scheme elements in the document tree, even after their meaning is known and imported correctly.

But I don't understand why you're talking about using LilyPond's Scheme output. It is the best way to know how LilyPond will parse a file, but it doesn't really work for our goals with ly.music. In particular, it renders the source file nearly unrecoverable.

crantila commented 9 years ago

But wait! If LilyPond's hierarchic Scheme code is basically what we're planning to use ElementTree for, why don't we parse LilyPond files into an ElementTree based on the same hierarchy?

So that this in LilyPond...

music = \relative { c2 d^\markup { Hoi } }

... becomes something like this in ly.music:

<make-music RelativeOctaveMusic>
    <make-music SequentialMusic>
        <make-music NoteEvent>
            <make-duration>1 0 1</make-duration>
            <make-pitch>-1 0 0</make-pitch>
        </make-music>
        <make-music NoteEvent>
            <articulations>
                <make-music TextScriptEvent direction="1" text="#211">
                    <markup line><simple>Hoi</simple></markup>
                </make-music>
            </articulations>
            <make-pitch>-1 1 0</make-pitch>
            <make-duration>1 0 1</make-duration>
        </make-music>
    </make-music>
</make-music>

We could attach attributes with line numbers and character positions too. Just a thought.
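
A minimal Python sketch of that thought, assuming character offsets are stored in pos and end attributes (both names are made up for this example, as are the spanned and node_at helpers). It also shows how "find the node at a certain source position" could work on such a tree.

```python
import xml.etree.ElementTree as ET

SOURCE = r"music = \relative { c2 d^\markup { Hoi } }"


def spanned(parent, tag, fragment):
    """Create an element whose pos/end attributes locate fragment in SOURCE."""
    pos = SOURCE.index(fragment)
    attrs = {"pos": str(pos), "end": str(pos + len(fragment))}
    if parent is None:
        return ET.Element(tag, attrs)
    return ET.SubElement(parent, tag, attrs)


root = spanned(None, "RelativeOctaveMusic", r"\relative { c2 d^\markup { Hoi } }")
seq = spanned(root, "SequentialMusic", r"{ c2 d^\markup { Hoi } }")
spanned(seq, "NoteEvent", "c2")
note = spanned(seq, "NoteEvent", r"d^\markup { Hoi }")
spanned(note, "TextScriptEvent", r"^\markup { Hoi }")


def node_at(node, offset):
    """Return the innermost element whose [pos, end) range contains offset."""
    if not int(node.get("pos")) <= offset < int(node.get("end")):
        return None
    for child in node:
        found = node_at(child, offset)
        if found is not None:
            return found
    return node


first_note = node_at(root, SOURCE.index("c2"))
script = node_at(root, SOURCE.index("Hoi"))
```

In a tree built from tokens the offsets would come from each token's pos attribute instead of a string search.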

crantila commented 9 years ago

Or Scheme's XML, called SXML? https://www.gnu.org/software/guile/manual/html_node/SXML.html

wbsoft commented 9 years ago

That's exactly what I am currently trying :-) We could use SXML but currently I just write xml by hand.

There are some issues: a command like \voiceOne translates to a long list of override commands. I want to be able to express such clearly documented LilyPond commands in our tree as well in one clear way. Also, I do not store user values in xml attributes, because handwritten LilyPond source could have longer constructs as an argument. A property setting could be #3 but also #(+ (someproc bla) 4).

But the first step is to have a reliable XML conversion of the LilyPond scheme music and markup structures.

Let me experiment for a while; some premature code is in ly.xml.

PeterBjuhr commented 9 years ago

I follow the development in ly.xml with great interest! I think the idea is very promising!

But it was also very sudden, and I'm still trying to straighten out a few things. I'll do it here and see if there are still any questions remaining. Hopefully it can be helpful for the overall discussion:

The translation to XML now has to be triggered by an include and a LilyPond compilation.

I guess the include can be made in the background in the same way as for the layout control. And the compilation could be made in a similar way as the auto engrave when the tree structure is needed (here could be a speed issue though).
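
For the background run, a sketch in Python of how the command line might be assembled, without running anything. It assumes the -dinclude-settings and -dno-print-pages options of the lilypond command behave as in current LilyPond releases (please check against the installed version); xml-export-init.ly is a hypothetical helper file that would define the Scheme-to-XML conversion, and export_command is an illustrative name.

```python
def export_command(ly_file, include_file="xml-export-init.ly",
                   lilypond="lilypond"):
    """Build (but do not run) a LilyPond command line for a background export.

    include_file is a hypothetical file holding the Scheme-to-XML code;
    no such file ships with LilyPond.
    """
    return [
        lilypond,
        "-dinclude-settings=" + include_file,  # load the helper before the score
        "-dno-print-pages",                    # do not write page output (PDF)
        ly_file,
    ]


command = export_command("score.ly")
```

The list can be handed to subprocess.run(command, check=True). Layout is still computed in such a run, so the speed concern mentioned above remains.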

PeterBjuhr commented 9 years ago

If we take an example like this:

rhythm =
#(define-music-function (parser location p) (ly:pitch?)
  #{ \tuplet 3/2 { $p 8 8 8 }  #})

music = {
  \rhythm c'
}

the close connection with the document is lost.

It would suddenly be possible to export the music correctly, which is great. And perhaps ly.lex (and other tools not affected by the change) will be enough for the current document manipulation!?

wbsoft commented 9 years ago

Peter, actually I want both: build the xml by hand or by LilyPond. I will clearly specify the differences. But to settle on the format I'll first implement the xml mapping from within LilyPond.

PeterBjuhr commented 9 years ago

Ah, I see! So a tree structure could still be created by ly.music.read but the one created from Scheme will be the model.

I think that settles my questions!

crantila commented 9 years ago

If this is what we end up doing, it means we need to build, test, and optimize a Scheme-to-XML program, and test and optimize the existing lexer and tokenizer. Plus, a fully correct runtime parse into ly.music requires coordinating the input of both sources. Do I have this right? Or are you just building the Scheme-to-XML program in order to help determine what ly.music.read should be outputting?

wbsoft commented 9 years ago

I am experimenting (for fun also :-)

Or are you just building the Scheme-to-XML program in order to help determine what ly.music.read should be outputting?

That's indeed what I am trying first. Of course there are large differences: E.g. 'TransposedMusic in LilyPond already has been transposed. In Python-ly, it wouldn't, and the arguments to the transpose command would still be there. But I'll sort that out.

PeterBjuhr commented 9 years ago

If the road ahead for the redesign has become clearer, perhaps this issue can be closed?

crantila commented 9 years ago

Clearer, but not clear. Shouldn't we close the issue only when we're either finished, or we've created a milestone with a set of smaller issues as a "to-do" list?

@wbsoft: "I am experimenting (for fun also :-)" What is this "fun" you speak of?!

Two other ideas. The Contributor's Guide offers the LilyPond grammar as outputted "from the parser" (GNU Bison). Could we feed this into a Python parser generator?

Or, since the licences work, we could simply copy the LilyPond parser, bringing it into Python as a C extension module. Taking this to the next step, since libxml2 is a C library, is there a way to get data from LilyPond into libxml2 without going through Python first? I know LilyPond is written in C++, but it still might work.

If nobody knows whether either of these is possible, I'd be happy to investigate.

wbsoft commented 9 years ago

What is this "fun" you speak of?!

I write code for fun! :-) I am a professional musician (organist, choir conductor and carillonist). Experimenting and finding things out gives me a lot of fun, but many times there is no spare time at all and I can do less. I get paid incidentally to engrave music using LilyPond, and wrote the new Dutch church hymnbook that way (www.liedboek.nl).

If the road ahead for the redesign has become clearer, perhaps this issue can be closed?

Leave it open for brainstorming, etc. I am now just experimenting with what creating XML from LilyPond looks like. Then I'll have a look at how we can formalize things and make clear how we can manually build such trees so that they are understandable as well. XML generation from within LilyPond is quite promising. Once the format settles, it might become viable for further processing by other tools without needing to add those to LilyPond or write them entirely in Scheme.

Two other ideas. The Contributor's Guide offers the LilyPond grammar as outputted "from the parser" (GNU Bison). Could we feed this into a Python parser generator?

Or, since the licences work, we could simply copy the LilyPond parser, bringing it into Python as a C extension module. Taking this to the next step, since libxml2 is a C library, is there a way to get data from LilyPond into libxml2 without going through Python first? I know LilyPond is written in C++, but it still might work.

Two very nice ideas, esp. having a LilyPond parser as Python extension module.

The ly.lex parser has some niceties, e.g. it automatically switches between html, lilypond, scheme, and can also support latex, texinfo and docbook, although those are not yet completely developed. The LilyPond parser defers Scheme parsing to Guile, which we would also need to support. If we also added a Guile parser, we would have all the motor power of LilyPond in a Python module :-)

wbsoft commented 9 years ago

Shouldn't we close the issue only when we're either finished, or we've created a milestone with a set of smaller issues as a "to-do" list?

Yes, we should create a milestone. But currently it is not completely clear in my mind what exactly the new ly.music would become. But it will develop (my brain automatically grows ideas :-D )

crantila commented 9 years ago

Hi, I'm just wondering if there has been progress or decisions about the direction of this issue or #8 ? If it helps, I tried out my idea of using LilyPond's actual parser as a C extension module, and it seems far more complicated than hoped.

Also, I may be able to recruit someone with experience building parsers and lexers. We're both eager to start helping out!
