code4lib / ruby-oai

a Ruby library for building OAI-PMH clients and servers
MIT License

illegal character '&' in raw string #32

Closed hapiben closed 10 years ago

hapiben commented 10 years ago

Hi,

I'm working on a custom harvester, and one of the XML responses I'm getting from a base URL contains illegal characters that raise a REXML exception:

illegal character '&' in raw string

The stack trace:

/usr/local/rbenv/versions/1.9.3-p385/lib/ruby/1.9.1/rexml/text.rb:154:in `block in check'
/usr/local/rbenv/versions/1.9.3-p385/lib/ruby/1.9.1/rexml/text.rb:152:in `scan'
/usr/local/rbenv/versions/1.9.3-p385/lib/ruby/1.9.1/rexml/text.rb:152:in `check'
/usr/local/rbenv/versions/1.9.3-p385/lib/ruby/1.9.1/rexml/text.rb:119:in `initialize'
/usr/local/rbenv/versions/1.9.3-p385/lib/ruby/1.9.1/rexml/parsers/treeparser.rb:46:in `new'
/usr/local/rbenv/versions/1.9.3-p385/lib/ruby/1.9.1/rexml/parsers/treeparser.rb:46:in `parse'
/usr/local/rbenv/versions/1.9.3-p385/lib/ruby/1.9.1/rexml/document.rb:231:in `build'
/usr/local/rbenv/versions/1.9.3-p385/lib/ruby/1.9.1/rexml/document.rb:43:in `initialize'
/data/sites/worker.digitalnz.org/rails/releases/20140130015812/vendor/bundle/ruby/1.9.1/bundler/gems/ruby-oai-d57a23250774/lib/oai/client.rb:252:in `new'
.../rails/releases/20140130015812/vendor/bundle/ruby/1.9.1/bundler/gems/ruby-oai-d57a23250774/lib/oai/client.rb:252:in `load_document'
.../rails/releases/20140130015812/vendor/bundle/ruby/1.9.1/bundler/gems/ruby-oai-d57a23250774/lib/oai/client.rb:212:in `do_request'
.../rails/releases/20140130015812/vendor/bundle/ruby/1.9.1/bundler/gems/ruby-oai-d57a23250774/lib/oai/client.rb:218:in `block in do_resumable'
.../rails/releases/20140130015812/vendor/bundle/ruby/1.9.1/bundler/gems/ruby-oai-d57a23250774/lib/oai/client/resumable.rb:15:in `call'
.../rails/releases/20140130015812/vendor/bundle/ruby/1.9.1/bundler/gems/ruby-oai-d57a23250774/lib/oai/client/resumable.rb:15:in `each'
.../rails/releases/20140130015812/vendor/bundle/ruby/1.9.1/bundler/gems/core-7a79f2ffa4d4/lib/harvester_core/oai/paginated_collection.rb:18:in `each'

Adding the code below before this line will fix the issue.

xml.gsub!(/&(?!(?:amp|lt|gt|quot|apos);)/, '&amp;')

Should I create a new pull request, or is there a good workaround? Thank you.
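
For reference, here is a minimal standalone sketch of the same idea, so the substitution can be tried outside the gem. The helper name `escape_bare_ampersands` and the constant `BARE_AMPERSAND` are hypothetical and not part of ruby-oai. As an added assumption, the pattern also leaves numeric character references such as `&#233;` alone, which the one-liner above would escape.

```ruby
# Hypothetical helper; the names here are illustrative and not part of ruby-oai.
# Matches an '&' that does not begin one of the five predefined XML entities
# or a numeric character reference (the numeric part is an added assumption).
BARE_AMPERSAND = /&(?!(?:amp|lt|gt|quot|apos|#\d+|#x\h+);)/

def escape_bare_ampersands(xml)
  # Non-destructive gsub, so the caller's string is left unchanged.
  xml.gsub(BARE_AMPERSAND, '&amp;')
end

raw = '<dc:title>Research & Development &amp; Design &#233;</dc:title>'
puts escape_bare_ampersands(raw)
# prints: <dc:title>Research &amp; Development &amp; Design &#233;</dc:title>
```

One caveat with this approach: named entities declared in a DTD (for example `&nbsp;`) are not in the allowed list, so they would be escaped as well. It may also rewrite ampersands inside CDATA sections, where they are legal as-is.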