fitzscott / AirQuality

Air Quality study Utah

XML parser #2

Open fitzscott opened 10 years ago

fitzscott commented 10 years ago

At least one data source has an XML feed. I need a parser to convert it into a format I can store in HDFS. Alternatively, I can drop the raw XML into HDFS and see if there's a SerDe for XML files.

Got curl to work on the feed URL: http://www.airquality.utah.gov/aqp/xmlFeed.php?id=slc (handy).
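
For the "write a parser" option, here is a rough Python sketch that flattens each repeating XML element into one tab-separated line, which is an easy shape to load into HDFS. The element names (`reading`, `station`, `pm25`) and the sample document are made up for illustration; the real feed's schema is not shown in this thread and will likely differ.

```python
import xml.etree.ElementTree as ET


def xml_to_tsv(xml_text, record_tag, fields):
    """Flatten each <record_tag> element into one tab-separated line."""
    root = ET.fromstring(xml_text)
    lines = []
    for record in root.iter(record_tag):
        # A missing or empty child element becomes an empty column.
        values = [(record.findtext(f, default="") or "").strip() for f in fields]
        lines.append("\t".join(values))
    return lines


# Hypothetical sample; the actual feed's element names are an assumption.
SAMPLE = """<feed>
  <reading><station>slc</station><pm25>12.5</pm25></reading>
  <reading><station>slc</station><pm25>14.0</pm25></reading>
</feed>"""

rows = xml_to_tsv(SAMPLE, "reading", ["station", "pm25"])
print("\n".join(rows))
```

The saved curl output could be passed in as `xml_text`, and the resulting lines written to a file for upload to HDFS.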

fitzscott commented 10 years ago

Nifty: There's already a GitHub repository for an XML SerDe for Hive:

https://github.com/dvasilen/Hive-XML-SerDe/wiki/XML-data-sources

Hive also documents XPath UDFs, which could pull fields out of XML stored as strings:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+XPathUDF
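
To try out XPath-style expressions locally before running anything in Hive, a small Python sketch like the one below might help. Note the assumptions: Python's `ElementTree` supports only a subset of XPath, so an expression that works here is not guaranteed to behave the same way in Hive's UDFs (and vice versa), and the element names and station ids are hypothetical, not taken from the real feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element names and station ids are assumptions.
SAMPLE = """<feed>
  <reading><station>slc</station><pm25>12.5</pm25></reading>
  <reading><station>ogden</station><pm25>9.0</pm25></reading>
</feed>"""

root = ET.fromstring(SAMPLE)

# Every pm25 value under any reading element.
all_pm25 = [float(e.text) for e in root.findall(".//reading/pm25")]

# Only readings whose station child has the text "slc".
slc_pm25 = [float(e.text) for e in root.findall(".//reading[station='slc']/pm25")]

print(all_pm25, slc_pm25)
```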