Include all data in xml tags in manual/opcodes

hlolli commented 6 years ago

Problem: none of the XML parsers I've tried in Java, Clojure and JavaScript can parse the XML files inside manual/opcodes. The reasons are twofold. First, the ampersand sign & causes a parse error, but that's easily fixable on the user side with a string replace.
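
A minimal sketch of that user-side fix in Python, assuming the parse errors come from entity references such as &nbsp; that are undefined without a DTD. The sample fragment and the function name are hypothetical, not taken from the manual:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment in the style of an opcode file. "&nbsp;" is not
# defined without a DTD, so a plain XML parser rejects it.
RAW = "<para>Reverberates a signal&nbsp;with a flat response &amp; more.</para>"

# The five entities every XML parser knows without a DTD.
PREDEFINED = "amp|lt|gt|quot|apos"

def escape_unknown_entities(text: str) -> str:
    """Escape every '&' that does not start a predefined or numeric reference."""
    pattern = r"&(?!(?:%s|#\d+|#x[0-9A-Fa-f]+);)" % PREDEFINED
    return re.sub(pattern, "&amp;", text)

cleaned = escape_unknown_entities(RAW)
root = ET.fromstring(cleaned)  # parses now; "&nbsp;" survives as literal text
print(root.text)
```

The trade-off is that unknown entities end up as literal text such as "&nbsp;" rather than the characters they stand for, which is fine for pulling out opcode names but not for rendering.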

The main problem is the synopsis tags:

<synopsis>ares <command>alpass</command> asig, xrvt, ilpt [, iskip] [, insmps]</synopsis>

In this example, ares and asig, xrvt, ilpt [, iskip] [, insmps] are ignored because they are not tagged. It should look something like this:

<synopsis><out>ares</out><command>alpass</command><in>asig, xrvt, ilpt [, iskip] [, insmps]</in></synopsis>

The DocBook parser would then need to change as well, I assume. But it should be easy to add these tags with a good regexp search over all these files.
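
As a rough illustration of such a regexp pass, here is a sketch in Python. It assumes each synopsis has exactly one <command> element and no other child tags. The <out> and <in> element names follow the proposal above and are not existing DocBook elements, and tag_synopsis is a hypothetical helper name:

```python
import re
import xml.etree.ElementTree as ET

SYNOPSIS = ("<synopsis>ares <command>alpass</command> "
            "asig, xrvt, ilpt [, iskip] [, insmps]</synopsis>")

# Capture the untagged text before and after the single <command> element.
PATTERN = re.compile(
    r"<synopsis>(?P<out>[^<]*)"
    r"<command>(?P<name>[^<]*)</command>"
    r"(?P<in>[^<]*)</synopsis>"
)

def tag_synopsis(xml_text: str) -> str:
    """Wrap the output and input parts of each synopsis in <out>/<in> tags."""
    def repl(match: re.Match) -> str:
        out = match.group("out").strip()
        name = match.group("name").strip()
        ins = match.group("in").strip()
        out_tag = f"<out>{out}</out>" if out else ""
        in_tag = f"<in>{ins}</in>" if ins else ""
        return f"<synopsis>{out_tag}<command>{name}</command>{in_tag}</synopsis>"
    return PATTERN.sub(repl, xml_text)

tagged = tag_synopsis(SYNOPSIS)
print(tagged)
# The tagged form can be queried by element name.
print(ET.fromstring(tagged).find("in").text)
```

Synopses with several commands or nested markup would not match this pattern and would need separate handling.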

rorywalsh commented 6 years ago

This might mess up some frontends that parse the manual manually for descriptions of opcodes. I do this in Cabbage, but I'm not against this change. It would probably make things easier in the long term.

On 8 December 2017 at 10:36, Hlöðver Sigurðsson notifications@github.com wrote:

hlolli commented 6 years ago

Yes, definitely. I'm already parsing this for the Emacs mode for Csound that I made, and I'm now looking at it again for programmatic access from a library I'm working on for Csound. It's just very painful when you can't look it up with normal XML parsers.

I want to add that we should reconsider how we handle optional args. How can a front-end developer parse the data from this

ares vco2 kamp, kcps [, imode] [, kpw] [, kphs] [, inyx]

when sometimes it looks like this

ar1 [, ar2 [, ar3 [, ... arN]]] diskin ifilcod[, kpitch[, iskiptim \
      [, iwraparound[, iformat[, iskipinit]]]]]
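
To make the difficulty concrete, here is one possible way to read both styles, sketched in Python. It treats any name inside at least one pair of square brackets as optional and joins backslash-continued lines. parse_args is a hypothetical helper, not something the manual or Csound provides:

```python
def parse_args(arg_text: str) -> list[tuple[str, bool]]:
    """Return (name, is_optional) pairs from a synopsis argument list."""
    text = arg_text.replace("\\\n", " ")  # join backslash-continued lines
    args, depth, token = [], 0, ""
    for ch in text + ",":  # the trailing comma flushes the last token
        if ch in "[],":
            name = token.strip()
            if name:
                # Optional if the name was read inside at least one bracket.
                args.append((name, depth > 0))
            token = ""
            depth += {"[": 1, "]": -1, ",": 0}[ch]
        else:
            token += ch
    return args

FLAT = "kamp, kcps [, imode] [, kpw] [, kphs] [, inyx]"
NESTED = ("ifilcod[, kpitch[, iskiptim \\\n"
          "      [, iwraparound[, iformat[, iskipinit]]]]]")

print(parse_args(FLAT))
print(parse_args(NESTED))
```

This flattens the nesting, so it loses the information that iskiptim can only be given when kpitch is given. Variadic forms such as "... arN" come through as a single literal token and would need their own rule.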

I'll post this on the mailing list for discussion :)

nwhetsell commented 6 years ago

For what it’s worth, it’s possible (but needlessly difficult, in my opinion) to parse the XML files in the opcodes directory.

In my view, the main complications are:

Most (possibly all) of the XML files need a header to be successfully parsed. This header must define a few character entities used by DocBook 4, and it also must define some entities used for names of Csound contributors.
Some of the XML files include a Unicode byte order mark, which must be removed.
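
A small Python sketch of those two steps, for illustration only. The header below is a made-up subset: "nameexample" is a hypothetical entity name, and a real header would have to declare every entity the files use. It also assumes the file does not begin with its own XML declaration or DOCTYPE:

```python
import xml.etree.ElementTree as ET

# Illustrative internal DTD subset; not the header used by any real script.
HEADER = (
    "<!DOCTYPE refentry [\n"
    '  <!ENTITY nbsp "&#160;">\n'
    '  <!ENTITY nameexample "Example Contributor">\n'
    "]>\n"
)

def parse_opcode_xml(raw: bytes) -> ET.Element:
    """Strip a UTF-8 byte order mark, prepend the entity header, and parse."""
    text = raw.decode("utf-8-sig")  # "utf-8-sig" drops a leading BOM if present
    return ET.fromstring(HEADER + text)

# Hypothetical file contents: BOM, then a fragment using both entities.
SAMPLE = b"\xef\xbb\xbf<refentry><para>By&nbsp;&nameexample;</para></refentry>"

root = parse_opcode_xml(SAMPLE)
print(repr(root.find("para").text))
```

With the entities declared, the parser expands them to their real values instead of failing on them.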

I attempted to address these issues (and a few others, like the stray forward slash in pull request https://github.com/csound/manual/pull/125) in pull requests https://github.com/csound/manual/pull/40 and https://github.com/csound/manual/pull/41. Pull request https://github.com/csound/manual/pull/41 was merged into a docbook5 branch, which never went anywhere and is now very stale.

I’m parsing the XML files in the opcodes directory as part of this script. Here’s the XML header I’m using:

https://github.com/nwhetsell/language-csound/blob/f872e298002b3aca3a8b8decb951f7788a049972/resources/update-opcode-completions.js#L329-L384

And here’s where the parsing actually happens:

https://github.com/nwhetsell/language-csound/blob/f872e298002b3aca3a8b8decb951f7788a049972/resources/update-opcode-completions.js#L413-L442

hlolli commented 6 years ago

@nwhetsell yup, this is dirty work, but I guess it's the best solution. I've finished my parser, so it's not a problem for me anymore. For the future, let's just cross our fingers and try to keep these XML files as clean as possible. Your parser and mine look very similar, with lots of regex replaces :) but one only does this once, and it does the job. So I'm closing this as wontfix.

csound / manual

Include all data in xml tags in manual/opcodes #124