Shinmera / plump

Practically Lenient and Unimpressive Markup Parser for Common Lisp
https://shinmera.github.io/plump
zlib License

Needs documentation #10

Closed MatthewRock closed 8 years ago

MatthewRock commented 8 years ago

I'm generally quite happy with the library, although I haven't had much opportunity to use it yet. Since there's (almost) no documentation (that I could easily find), I was forced to try things out myself. I've used Python and C++ before to traverse XML trees, and because the tools had documentation, the process was rather pleasant. However, no such luck with plump.

Something like 20 minutes of poking around left me with this:

(defvar file #P"path/to/test.xml")

(defun traverse-tree ()
  ;; let* so that *tag-dispatchers* is already bound when plump:parse runs
  (let* ((plump:*tag-dispatchers* plump:*xml-tags*)
         (root-node (plump:parse file)))
    (plump:get-elements-by-tag-name root-node "test")))

Where I have an xml file like this:

<rot>
    <testParent>
        <test>
            This is text value!
        </test>
        <test>
            Another text value!
        </test>
    </testParent>
</rot>

I can now use something like (mapcar #'plump:serialize (traverse-tree)) to get the XML notation of each tag, but I have yet to find out how I can extract the text from them without using external tools. Could you please add some examples (probably as tests), so other users would feel more at home? I'll probably write a tutorial when I'm done with it, but right now I'd be glad to have such a tutorial myself. It's nice that you provide a brief overview of some of the more advanced features, but some really basic code would be good too.

I am partially asking because documentation would be great, and partially because I would be glad if you could help me with understanding the basics.

And thanks for your work!

Edit: Now I've this:

(defun traverse-tree ()
  (let* ((plump:*tag-dispatchers* plump:*xml-tags*)
         (root-node (plump:parse file))
         (testparent-node (plump:children (aref (plump:children (aref (plump:children root-node) 0)) 1))))
    (plump:serialize (aref testparent-node 3) nil)))

So in theory I see how I could traverse the tree now (hello there, #'plump:children), but I have yet to understand how it all works... testparent-node looks like it returns some text node (text of testParent?), 2 <test> nodes, and 2 text nodes (text of each <test>?). Unfortunately each text node is empty, so again no luck in discovering how to access the text in there.

Edit2:

(plump:traverse (plump:parse file) #'plump:serialize :test (lambda (x) (not (plump:has-child-nodes x))))

So I found these beauties, and now I am kind of able to get what I want, but the question about the general way of parsing still remains.

Shinmera commented 8 years ago

Plump is generally a library intended to provide an interface to parse text into a DOM and then give a rather standard DOM API to look at it and work with it. These functions are documented and for the most part named after the specification's names. However, for anything beyond that, using other tools such as CLSS and lQuery is recommended as their intention is to actually provide the latter part: simple traversal and manipulation of the DOM itself. These tools are linked to in the documentation already.

I currently lack the time to write documentation of Plump to the extent that you're expecting it. However, I am open to pull requests if someone else were to step up and take the effort to do it.

I'm kind of failing to understand what exactly you want to extract out of the DOM to be honest. If you simply need the text, you can use plump:text to access that.
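As a rough sketch of that suggestion using only Plump: the snippet below collects the text of every <test> element by calling plump:text on each node returned by plump:get-elements-by-tag-name. It assumes Plump is already loaded (for example via Quicklisp). The helper name test-texts and the inline sample string are made up for illustration, and the order of the returned list should not be relied upon.

```lisp
;; Sketch only: assumes Plump is already loaded, e.g. (ql:quickload :plump).
;; TEST-TEXTS is a hypothetical helper name, not part of Plump.
(defun test-texts (input)
  "Parse INPUT (a string or pathname) as XML and return the trimmed text
of every <test> element as a list of strings."
  (let* ((plump:*tag-dispatchers* plump:*xml-tags*)
         (root (plump:parse input)))
    (mapcar (lambda (node)
              ;; PLUMP:TEXT gathers the text below NODE into one string;
              ;; trim the indentation whitespace around it.
              (string-trim '(#\Space #\Tab #\Newline #\Return)
                           (plump:text node)))
            (plump:get-elements-by-tag-name root "test"))))

;; Usage with a condensed version of the sample document:
(test-texts "<rot><testParent><test> This is text value! </test><test> Another text value! </test></testParent></rot>")
```

Replacing the string-trim call with any other function gives a way to do something per node rather than only on the whole document.<test> Another text value! </test></testParent></rot>")))
  (assert (= 2 (length texts)))
  (assert (null (set-difference '("This is text value!" "Another text value!") texts :test #'string=)))
  (assert (null (set-difference texts '("This is text value!" "Another text value!") :test #'string=))))
</test>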

MatthewRock commented 8 years ago

Sorry, I might have been unclear, it was a bit late. I want to extract text from each <test> node, so I would like to get some container with "This is text value!" and "Another text value!". So theoretically, I get close by doing something like (plump:text (plump:strip (plump:parse file))), but that doesn't allow me to do something for each node when I'm there. Thanks for the function, however. Proved useful.

Shinmera commented 8 years ago

Right. In lQuery you could do that with something like so:

(lquery:$ (initialize "<test>foo</test><test>bar</test>")
  "test"
  (combine (node) (text))
  (map-apply #'(lambda (node text)
                 (format T "~&~a: ~s" node text))))

Or

(lquery:$ (initialize "<test>foo</test><test>bar</test>")
  "test"
  (map #'(lambda (node)
           (format T "~&~a: ~s" node (lquery:$ node (text) (node))))))

Obviously substituting some more useful function for the lambda form.

MatthewRock commented 8 years ago

Am I wrong, or does it not use plump at all? If so, what's the purpose of plump? Provide back-end for libs like lquery? I am a bit confused right now.

Shinmera commented 8 years ago

As I've tried to explain before, Plump does essentially only one thing: parse text into a DOM. The DOM itself is a pain to use because it's rather low-level. In order to remedy this, there are libraries on top of it: CLSS implements a CSS selector search engine for the DOM, and lQuery implements a jQuery-like interface.

As such the situation is very comparable to that of Javascript where the browser provides a rudimentary DOM implementation to work with and libraries like jQuery make the handling thereof more comfortable and easy.
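To make the layering a bit more concrete, here is a minimal sketch of the CLSS route: Plump does the parsing, clss:select finds the nodes by CSS selector, and plump:text reads their text. It assumes both Plump and CLSS are loaded (for example via Quicklisp); select-texts is a hypothetical helper name used only for this example.

```lisp
;; Sketch only: assumes Plump and CLSS are loaded,
;; e.g. (ql:quickload '(:plump :clss)).
;; SELECT-TEXTS is a hypothetical helper name, not part of either library.
(defun select-texts (selector markup)
  "Parse MARKUP with Plump, select nodes matching the CSS SELECTOR with CLSS,
and return their text contents as a list of strings."
  (let ((root (plump:parse markup)))
    ;; CLSS:SELECT returns a vector of matching nodes; MAP turns the
    ;; per-node texts into a list.
    (map 'list #'plump:text (clss:select selector root))))

;; Usage with the same markup as the lQuery examples:
(select-texts "test" "<test>foo</test><test>bar</test>")
```<test>bar</test>")))
  (assert (= 2 (length texts)))
  (assert (null (set-difference '("foo" "bar") texts :test #'string=)))
  (assert (null (set-difference texts '("foo" "bar") :test #'string=))))
</test>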

MatthewRock commented 8 years ago

Makes sense. Thanks.