code4lib / ruby-oai

a Ruby library for building OAI-PMH clients and servers
MIT License
62 stars 43 forks source link

Strip invalid XML characters. #33

Closed hapiben closed 2 years ago

hapiben commented 10 years ago

Fixing illegal character '&' in raw string issue.

jrochkind commented 4 years ago

Hi, I know this is really old. I am now a committer/owner of this project. Are you still interested in this @hapiben ?

hapiben commented 4 years ago

Hi @jrochkind. Yes. We still have the forked version for this fix.

jrochkind commented 4 years ago

OK, I'm realizing I may be a bit out of my league evaluating this one.

I'm definitely in minimal caretaker mode, just trying to do a bit of cleanup while I'm stopping in.

Firstly, to merge, we'd need some kind of test in the PR. Doesn't need to be perfect, but at least one test that fails without the code change but passes with it. Especially with this really old creaky code without anyone still around maintaining it, we need tests to keep us safe.

Can you add a test?

Secondly though: I have some concerns about code:

a) This appears to be adding a second REXML parse, when in 'rexml' mode. Your change does a REXML parse to check for validity, but then just returns the raw string again... and then the existing code at line 253 will do a REXML parse again. It does this whether or not the XML was valid. This seems needlessly inefficient, can it be refactored to use the initial REXML parse, rescue on invalid, and only do a second parse if it needed to in order to fix invalid chars? So the extra parse is only done for cleanup from invalid input. If there was a test in here, it would make it easier to refactor, making sure the test still passed.

b) REXML in general is a very old unsupported library we don't see much in ruby for the past.... well, like 8 years honestly. This patch only effects the 'rexml' branch. Have you considered installing nokogiri and using the nokogiri branch instead? And see how nokogiri branch does with invalid input, and if it also needs a fix? Having a test, could be easily adopted to test under nokogiri mode too.

hapiben commented 4 years ago

Hi @jrochkind,

Thanks. I'll update the PR. It's been a while since I worked on the project here that uses this gem. :-)

barmintor commented 2 years ago

I've rebased/refactored and tested this in #95 - please review there.