bearded / ruby-ldap

Ruby/LDAP is an extension library for Ruby. It provides the interface to some LDAP libraries (e.g. OpenLDAP, Netscape SDK and Active Directory). The common API for application development is described in RFC1823 and is supported by Ruby/LDAP.
http://rubyforge.org/projects/ruby-ldap/

:< separator in LDIF not being parsed correctly #16

Open twiz718 opened 11 years ago

twiz718 commented 11 years ago
dn: MYDNHERE
sn: Khanin
givenName: Alex
whenCreated: 20080910232037.0Z
displayName: Khanin, Alex
department: MYDEPTHERE
sAMAccountName: myloginhere
mail: MYEMAILHERE
manager: MYMGRDNHERE
thumbnailPhoto:< file:///var/tmp/ldapsearch-thumbnailPhoto-S8oDGY

The file referenced by file:///var/tmp/ldapsearch-thumbnailPhoto-S8oDGY exists and is readable (it contains JPEG data).

If you run LDAP::LDIF.parse_file() on this LDIF, you get the following error:

irb(main):004:0> LDAP::LDIF.parse_file("/var/tmp/akhanin.ldif")
ArgumentError: invalid byte sequence in UTF-8
from org/jruby/RubyRegexp.java:1487:in `=~'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:105:in `unsafe_char?'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:323:in `parse_entry'
from org/jruby/RubyArray.java:1613:in `each'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:184:in `parse_entry'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:481:in `parse_file'
from org/jruby/RubyIO.java:1183:in `open'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:439:in `parse_file'
from (irb):4:in `evaluate'
from org/jruby/RubyKernel.java:1066:in `eval'
from org/jruby/RubyKernel.java:1392:in `loop'
from org/jruby/RubyKernel.java:1174:in `catch'
from org/jruby/RubyKernel.java:1174:in `catch'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/railties-3.2.11/lib/rails/commands/console.rb:47:in `start'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/railties-3.2.11/lib/rails/commands/console.rb:8:in `start'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/railties-3.2.11/lib/rails/commands.rb:41:in `(root)'
from org/jruby/RubyKernel.java:1027:in `require'
from script/rails:6:in `(root)'

When I run "file" on that thumbnailPhoto I get the following: ldapsearch-thumbnailPhoto-S8oDGY: JPEG image data, JFIF standard 1.01

Now if I remove the last line in the ldif (with the thumbnail ":<" reference), it parses just fine.
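
For context, RFC 2849 defines three value separators: `:` for a plain value, `::` for a base64-encoded value, and `:<` for a value that is loaded from a URL. The sketch below shows one way a single line could be decoded under those rules. `decode_ldif_line` is a hypothetical helper, not part of ruby-ldap, and it assumes only `file://` URLs need to be supported.

```ruby
require "uri"

# Hypothetical helper (not part of ruby-ldap): split one LDIF line into
# its attribute name and decoded value, following the RFC 2849 separators.
def decode_ldif_line(line)
  match = line.chomp.match(/\A([^:]+)(:<|::|:) *(.*)\z/)
  raise ArgumentError, "not an attribute line: #{line.inspect}" unless match

  attr, sep, raw = match.captures
  value =
    case sep
    when ":<" # value is the content of the referenced URL
      uri = URI.parse(raw)
      # Assumption: only file:// URLs are handled in this sketch.
      raise ArgumentError, "unsupported URL scheme: #{raw}" unless uri.scheme == "file"
      File.binread(uri.path) # binary read, since JPEG bytes are not valid UTF-8
    when "::" # value is base64-encoded
      raw.unpack("m").first
    else      # plain value
      raw
    end
  [attr, value]
end
```

Reading the referenced file with `File.binread` returns a binary (ASCII-8BIT) string, so later checks on the value have to work with raw bytes rather than assume valid UTF-8.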

ghost commented 11 years ago

The problem is that ruby-ldap was not written to work with UTF-8, and the method unsafe_char? fails when parsing the file:

# return *true* if +str+ contains a character with an ASCII value > 127 or
# a NUL, LF or CR. Otherwise, *false* is returned.
#
def LDIF.unsafe_char?( str )
  # This could be written as a single regex, but this is faster.
  str =~ /^[ :]/ || str =~ /[\x00-\x1f\x7f-\xff]/
end

Wikipedia:

ASCII was incorporated into the Unicode character set as the first 128 symbols, so the ASCII characters have the same numeric codes in both sets. This allows UTF-8 to be backward compatible with ASCII, a significant advantage.

So the range \x00-\x1f is valid and passes, but \x7f-\xff is invalid in UTF-8 and should be replaced with another range, or perhaps several, though I do not know exactly which.
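
One possible direction, offered as a sketch rather than as the library's fix: the traceback shows the exception coming from the regexp match itself, so the check can be done on raw bytes instead, which sidesteps the string's encoding entirely. The module name `LDIFSketch` is hypothetical. The method returns `true` or `false` rather than a match index, and it tests only the first byte for a leading space or colon, which should be equivalent to the original `^` anchor because any string containing a line feed is already flagged by the control-character range.

```ruby
# Hypothetical replacement for LDIF.unsafe_char?; the module name
# LDIFSketch is illustrative and not part of ruby-ldap.
module LDIFSketch
  SPACE = 0x20
  COLON = 0x3a

  # Returns true if +str+ starts with a space or colon, or contains any
  # byte in 0x00-0x1f or 0x7f-0xff. It inspects raw bytes, so the string's
  # encoding tag (and whether its bytes are valid in it) does not matter.
  def self.unsafe_char?(str)
    first = str.getbyte(0)
    return true if first == SPACE || first == COLON

    str.each_byte.any? { |b| b < 0x20 || b >= 0x7f }
  end
end

# JPEG header bytes tagged as UTF-8, similar to the thumbnailPhoto value.
jpeg_header = "\xFF\xD8\xFF\xE0".dup.force_encoding(Encoding::UTF_8)
LDIFSketch.unsafe_char?(jpeg_header) # => true, without raising
LDIFSketch.unsafe_char?("Khanin")    # => false
```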

Patches are welcome.