h1. smd_xml
Yank bits out of any hunk of XML and reformat it to your own needs. Great for pulling feed info into your Textpattern site, for example from delicious.com.
h2. Features
h2(#install). Installation / Uninstallation
p(required). Requires PHP 5.2+ (and the SOAP extension for SOAP data feeds)
Download the plugin from either "textpattern.org":http://textpattern.org/plugins/1138/smd_xml or the software page above, paste the code into the TXP Admin -> Plugins pane, then install and enable it. Visit the "forum thread":http://forum.textpattern.com/viewtopic.php?id=32718 for more info or to report on the success or otherwise of the plugin.
To remove the plugin, simply delete it from the Admin->Plugins tab.
h2. Tag: smd_xml
Place a @
bc(block).
Use the following attributes to configure the smd_xml plugin (attributes marked with a '*' are mandatory):
h3. Data import attributes
; %data%
: The XML data source. Most of the time this will be a URL, though you could hard-code the XML data to use another TXP tag here (e.g. @
h3. Manipulation attributes
; %kill_spaces%
: Remove all inter-tag whitespace, newlines and tabs, i.e. redundant spaces surrounding the tags in the stream. It does not touch spaces within nodes.
: Although optional, this attribute is highly recommended because it usually speeds up parsing as a side effect. It does, however, make the feed very difficult to read as it squishes everything onto one line, so consider turning it off while debugging. Options:
:: 0: no, keep inter-tag spaces in the feed
:: 1: yes, remove them
: Default: 1
; %transform%
: Perform transformations on the raw data stream. The transformations occur prior to the data being cached, so the results are cached as well. Specify as many transformations as you like, each separated by @delim@. Each transformation is broken down into a class (type) and a list of parameters for that class, all separated by @param_delim@. You can choose from the following classes of transform:
:: xsl: the second parameter is the URL of the XSL stylesheet to fetch, e.g. @transform="xsl|http://site.com/path/to/stylesheet.xslt"@.
:: replace: swap portions of the document that match the (full, including delimiters) regular expression given in the second parameter with the value given in the third. If the third parameter is omitted, the matching content is removed, e.g. @transform="replace|%<xs:schema.+?<\/xs:schema>%"@.
; %format%
: Alter the format of this list of fields. For each field, specify items separated by @param_delim@: the first is the name of the field you want to alter; the second is the type of alteration required; the third, fourth, fifth, etc. specify how you want to alter the data. The following data types are supported:
:: %case% : alter the case of the field. The items may be cumulative. Choose from four options as the third, fourth, etc. parameters:
::: upper
::: lower
::: ucfirst
::: ucwords
:: Example: to first convert the field to lower case then convert the first letter of each word to upper case, use @format="Country|case|lower|ucwords"@
:: %date% : takes one argument: the format string as detailed in "strftime":http://php.net/manual/en/function.strftime.php. Example: @format="pubDate|date|%d %B %Y %H:%M:%S"@ would reformat the pubDate field. Can also be used to reformat time strings.
:: %escape% : escape the field so special characters are encoded as their HTML entity values. Options:
::: double_quotes: encode only double quotes (default)
::: all_quotes: encode both double and single quotes
::: no_quotes: don't encode any double or single quotes
:: %fordb% : harden the field so it can be used in an SQL statement.
:: %link% : convert the URL in this field to an HTML anchor hyperlink. Example: @format="cat_url|link"@ (replaces the @linkify@ attribute from the v0.2x plugin versions).
:: %sanitize% : convert the field into one of three 'dumbed down' formats, as specified by the third parameter. Choose from:
::: url for creating simple, valid URL strings
::: file for creating valid file names
::: url_title for making TXP-style URL titles as governed by your prefs settings
:: Example: @format="Title|sanitize|url"@ to sanitize the Title field so it is suitable for use in a web address
: NOTE: format only applies to the form/container content. It is NOT applicable in @ontag@ Forms. If you wish to apply formatting to ontag attributes, or perform more complicated transformations, consider the smd_wrap plugin.
; %target_enc%
: Character encoding to apply to the parsed XML data. Choose from:
:: ISO-8859-1
:: US-ASCII
:: UTF-8
: Default: @UTF-8@.
; %uppercase%
: Set to 1 to force all XML tag names to be in upper case. You would then have to specify @fields="NAME, DEPT"@ in order to successfully extract those fields.
; %concat%
: Any duplicate nodes in the stream are usually concatenated together. If you wish to turn this feature off so that only the last tag's content remains, set @concat="0"@.
: Default: 1
; %convert%
: If your data stream contains data you don't want or data that you wish to translate (for example, character entities) you can list them here.
: Items are specified in pairs separated by @param_delim@; the first is the item to search for and the second is its replacement.
: For example: @convert="'|'"@ would replace all occurrences of @'@ with an apostrophe character. Note that the replacements are performed on the raw stream before it is parsed and after it is cached. Also take care when decoding double quotes; this is the correct method: @convert=""|"""@ (note the double quote is escaped by putting two double quote characters in)
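
To illustrate how a cumulative @format@ rule such as @Country|case|lower|ucwords@ reads, here is a minimal Python sketch. It is not the plugin's own code: the helper names (@parse_format@, @apply_case@, @ucwords@) are hypothetical, and it assumes the default @delim@ (comma) and @param_delim@ (pipe) and that the case options are applied left to right.

```python
def ucwords(text):
    """Upper-case the first character of each space-separated word."""
    return " ".join(word[:1].upper() + word[1:] for word in text.split(" "))


# The four case options named in the attribute description.
CASE_OPS = {
    "upper": str.upper,
    "lower": str.lower,
    "ucfirst": lambda text: text[:1].upper() + text[1:],
    "ucwords": ucwords,
}


def parse_format(spec, delim=",", param_delim="|"):
    """Split a format attribute into (field, type, options) tuples."""
    rules = []
    for item in spec.split(delim):
        parts = item.strip().split(param_delim)
        if len(parts) >= 2:
            rules.append((parts[0], parts[1], parts[2:]))
    return rules


def apply_case(value, options):
    """Apply each case option in turn (assumed order: left to right)."""
    for option in options:
        value = CASE_OPS[option](value)
    return value


field, kind, options = parse_format("Country|case|lower|ucwords")[0]
print(field, kind, apply_case("UNITED KINGDOM", options))
# prints: Country case United Kingdom
```

Because the options are cumulative, swapping them to @Country|case|ucwords|lower@ would leave the whole field in lower case, so the order matters.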
h3. Forms and paging attributes
; %form%
: The Txp Form with which to parse each record. You may use the plugin as a container instead if you prefer.
; %pageform%
: Optional Txp form used to specify the layout of any paging navigation and statistics such as page number, quantity of records per page, total number of records, etc. See "paging replacement tags":#pgreps.
; %pagepos%
: The position of the paging information. Options are @below@ (the default), @above@, or both of them separated by @delim@.
; %limit%
: Show this many records per page. Setting a @limit@ smaller than the total number of records switches paging on automatically so you can use the @
h4. Tag/class/formatting attributes
; %wraptag%
: The HTML tag, without brackets, to surround each record you output.
; %break%
: The HTML tag, without brackets, to surround each field you output.
; %class%
: The CSS class name to apply to the @wraptag@.
h4. Plugin customisation
; %delim%
: The delimiter to use between items in the plugin attributes.
: Default: @,@ (comma).
; %param_delim%
: The delimiter to use between items in XML and plugin data attributes.
: Default: @|@ (pipe).
; %concat_delim%
: The delimiter to use between identically-named tags in the XML data stream.
: Default: @ @ (space).
; %var_prefix%
: If you wish to embed an smd_xml tag inside the container of another, the replacement and paging variables might clash. Use this in one of your tags to help prevent this.
: It takes up to two values separated by a comma: the first is the prefix to apply to regular replacement tags; the second is the prefix to apply to page-based replacement tags.
: If only one value is specified, the same prefix will be applied to both tag and page replacements.
: Default: @, smdxml@ (i.e. no tag prefix, and @smdxml@ page prefix)
; %timeout%
: The time in seconds to wait for the remote server to respond before giving up.
: Default: 10
; %transport%
: (should not be needed) If you would like to force the plugin to use a particular HTTP transport mechanism to fetch your @data@ you can specify it here. Choose from:
:: fsock
:: curl
:: soap
: The @soap@ mechanism uses cURL internally so you must have that available.
: Default: @curl@ (if available), else @fsock@.
; %transport_opts%
: When using @soap@ transport you often need to pass additional parameters to the SOAP server. @transport_opts@ takes up to three parameters, separated by @delim@:
:: Client method: the name of a SOAP method to call
:: Data: a series of name-val pairs (separated by @param_delim@) or an XML document which will be passed to the client method, e.g. @type|table|user|Bloke|pass|wilecoyote@ passes three params (type, user, and pass) with corresponding values. Note that if you want to use XML here you need to declare your intention using the @transport_config@ attribute.
:: Result method: the name of a SOAP method to fetch the output. The first @param_delim@ option is the method name to call to obtain the result set, and the second is the portion of the results you want returned (e.g. @any@)
; %transport_config%
: Allows you to configure how the plugin interacts with the server. The following configuration parameters are available; separate each configuration item from its predecessor using @delim@ and separate any value from its parameter name using @param_delim@:
;; For soap:
:: soap_wrap : the data you pass to the SOAP server may not be encapsulated in its own unique element. If that's the case and the server requires this, you can specify the wrapper here. For example, some servers require @soap_wrap|Request@.
:: soap_delim : when retrieving multiple SOAP items, they will be concatenated together using this delimiter. Default: the same delimiter as set in @param_delim@.
:: soap_type_input : can be either @nvpairs@ (the default, as shown above) or @xml@ if you are passing in a complete XML document to configure the SOAP server. When using xml input format, the plugin automatically converts the given XML document into a SOAP array.
:: soap_type_output : SOAP data is normally returned as an XML document, but if for some reason the server sends back a raw SOAP array you can use this with an @xml@ parameter to ask the plugin to try to interpret the SOAP data into an XML stream for you. The success of this operation depends on how well formed the resulting data is. If using this you may (probably will) also need to specify @soap_numeric_wrap@.
:: soap_numeric_wrap : when converting a SOAP array back to XML, any repeating records are normally indexed starting from 0. Since raw numbers are invalid XML tag names they need to be altered somehow. By default, this is done by taking the parent class and appending a sequential number to it. If you wish to set any numeric records to a specific wrapper element, specify that element here.
;; For curl:
:: binary
:: cainfo
:: capath
:: certinfo
:: crlf
:: port
:: proxy
:: proxytunnel
:: proxyuserpwd
:: netrc
:: sslcert
:: useragent
:: verifypeer
:: verbose
;; For fsock:
:: accept
:: charset
:: date
:: lang
:: pragma
:: useragent
; %line_length%
: If you are using the @fsock@ transport mechanism, the plugin grabs the XML document line by line and uses a maximum line length of 8192 characters by default. This is usually good enough because most feeds contain newlines, but some (e.g. Google Spreadsheet) don't have any newlines in them.
: To successfully parse such documents you may need to increase the line length. In these situations, however, it is highly recommended to switch to @transport="curl"@ instead (if you can) because it does not have any line length restrictions.
; %hashsize%
: (should not be needed) When specifying a @cache_time@ the plugin assigns a 32-character, unique reference to the current smd_xml based on your import attributes. @hashsize@ governs the mechanism for making this long reference shorter.
: It comprises two numbers separated by a colon; the first is the length of the unique ID, the second is how many characters to skip past each time a character is chosen. For example, if the unique reference was @0cf285879bf9d6b812539eb748fbc8f6@ then @hashsize="6:5"@ would make a 6-character unique ID using every 5th character; in other words @05f898@. If at any time you "fall off" the end of the long string, the plugin wraps back to the beginning of the string and continues counting.
: Default: @6:5@.
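
The @hashsize@ shortening can be sketched in a few lines of Python. This is an illustration of the description above, not the plugin's own code: the function name @shorten_hash@ is hypothetical, and it assumes that counting starts at the first character and that wrapping is a simple modulo on the string length.

```python
def shorten_hash(reference, hashsize="6:5"):
    """Build a short ID by taking every `skip`-th character of `reference`.

    Assumption: selection starts at the first character and wraps back to
    the start of the string when it runs past the end.
    """
    length, skip = (int(number) for number in hashsize.split(":"))
    return "".join(
        reference[(index * skip) % len(reference)] for index in range(length)
    )


print(shorten_hash("0cf285879bf9d6b812539eb748fbc8f6"))  # prints: 05f898
```

With a 32-character reference and the default @6:5@, the chosen positions are 0, 5, 10, 15, 20 and 25, so no wrapping occurs; a larger length or skip value would trigger the wrap-around.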
h3(#reps). Replacement tags
Each XML field you extract from your data stream has an equivalently-named replacement tag available so you may use it anywhere you like in your Form/container. Although the examples here don't demonstrate this, the replacement names will be prefixed by whatever you have set in your @var_prefix@ attribute.
If you chose to extract @fields="name, job_title, quality"@ you would have the following replacement tags available during the first record:
And during the second record, the same replacement tag names would refer to the following items:
Note that the attribute called @id@ that is part of the @
The @{quality}@ tag appears more than once in the example document above and is thus concatenated by default. You can influence its output using the @concat@ and @concat_delim@ attributes, e.g. using @concat_delim="|"@ would render the following replacement variable on the first record:
while @concat="0"@ would render this (i.e. the value of the last node encountered):
There are also some special statistical tags available in each record:
h3(#pgreps). Paging replacement tags
In your @pageform@ you can employ any of the following replacement tags to build up a navigation system for stepping through your XML document. Note that they all show @smdxml@ as the prefix here, but that may be changed with the @var_prefix@ attribute:
h2(#smd_xif). Tags: @
Use these container tags to determine if there is a next or previous page and take action if so. Can only be used inside @pageform@, thus all "paging replacement variables":#pgreps are available inside these tags.
bc(block).