ckruse / CFPropertyList

Read, write and manipulate both binary and XML property lists as defined by Apple
MIT License

Dictionaries are nil #31

Closed paulyoung closed 10 years ago

paulyoung commented 10 years ago

Given the following plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>foo</key>
    <array>
        <dict>
            <key>bar</key>
            <dict>
                <key>baz</key>
                <integer>42</integer>
            </dict>
        <dict>
    </array>
</dict>
</plist>

and this ruby file:

require 'cfpropertylist'

filepath = ARGV[0]
plist = CFPropertyList::List.new(:file => filepath)
data = CFPropertyList.native_types(plist.value)

data['foo'].each do |item|
  puts item['bar'] if item.is_a? Hash
end

I get the following output:

{"bar"=>nil}

when running:

ruby test.rb path/to/file.plist

paulyoung commented 10 years ago

Here's a simpler example.

Plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>foo</key>
    <dict>
        <key>bar</key>
        <dict>
            <key>baz</key>
            <integer>42</integer>
        </dict>
    </dict>
</dict>
</plist>

Ruby file:

require 'cfpropertylist'

filepath = ARGV[0]
plist = CFPropertyList::List.new(:file => filepath)
data = CFPropertyList.native_types(plist.value)

puts data['foo']

Output:

{"bar"=>nil}

paulyoung commented 10 years ago

FWIW, I've tried several libraries of this type now and all have the same issue.

ckruse commented 10 years ago

I can't reproduce this problem. With this corrected XML file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>foo</key>
    <array>
        <dict>
            <key>bar</key>
            <dict>
                <key>baz</key>
                <integer>42</integer>
            </dict>
        </dict>
    </array>
</dict>
</plist>

and this Ruby code:

require 'cfpropertylist'

filepath = 'test.xml'
plist = CFPropertyList::List.new(:file => filepath)
puts plist.value.inspect

data = CFPropertyList.native_types(plist.value)
puts data.inspect

I get this output:

#<CFPropertyList::CFDictionary:0x00000001129908 @value={"foo"=>#<CFPropertyList::CFArray:0x00000001129958 @value=[#<CFPropertyList::CFDictionary:0x00000001129980 @value={"bar"=>#<CFPropertyList::CFDictionary:0x000000011299d0 @value={"baz"=>#<CFPropertyList::CFInteger:0x00000001129a20 @value=42>}>}>]>}>
{"foo"=>[{"bar"=>{"baz"=>42}}]}

This is exactly what I was expecting: a hash with a key 'foo' containing an array with a single hash as its only value. That hash contains a key 'bar', which holds another hash. That last hash has a key 'baz' containing 42. Exactly what the XML describes.

Can you paste the output of this Ruby code?

puts CFPropertyList.xml_parser_interface.inspect

CFPropertyList can use three different XML parsers: libxml-ruby, Nokogiri, and REXML. It tries to load libxml first since it is the fastest parser, then falls back to Nokogiri, and finally tries REXML (since it is the slowest). Maybe one of the parser backends has a bug?
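
The fallback order described above can be sketched roughly as follows. This is not CFPropertyList's actual loader; `CANDIDATES` and `pick_xml_backend` are hypothetical names, and the require paths are the ones the respective gems normally use.

```ruby
# Hypothetical "first parser that loads wins" fallback.
# The order follows the comment above; this is NOT CFPropertyList's real code.
CANDIDATES = [
  ['libxml',   'libxml'],         # libxml-ruby gem
  ['nokogiri', 'nokogiri'],
  ['rexml',    'rexml/document']
].freeze

def pick_xml_backend(candidates = CANDIDATES)
  candidates.each do |name, path|
    begin
      require path
      return name
    rescue LoadError
      next # backend not installed, try the next one
    end
  end
  nil # none of the candidates could be loaded
end

backend = pick_xml_backend
puts "XML backend: #{backend.inspect}"
```

Which backend gets picked depends on the gems installed locally, which is why the output of `xml_parser_interface` is useful when a bug cannot be reproduced.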

paulyoung commented 10 years ago

I'm still experiencing the issue. Here is the exact file I'm using: https://www.dropbox.com/s/v6r9n64yi9pl6h0/Data (binary plist)

Output of puts plist.value.inspect: https://www.dropbox.com/s/gthw3484pu43o10/plist.value.inspect.rb

Output of puts data.inspect: https://www.dropbox.com/s/sdblzv9xfdmdpqn/data.inspect.rb

paulyoung commented 10 years ago

FYI - adding the .plist extension to the file allows you to view it as XML using QuickLook on OS X. I can provide the XML if needed.

ckruse commented 10 years ago

What value exactly do you expect? There is no foo value in this plist. Can you elaborate?

ckruse commented 10 years ago

XML is not needed, I can see the structure via Xcode :)

paulyoung commented 10 years ago

The foo, bar, baz code was a contrived example.

The beginning of the plist looks like this:

plist (truncated):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>$archiver</key>
    <string>MSArchiver</string>
    <key>$objects</key>
    <array>
        <string>$null</string>
        <dict>
            <key>$class</key>
            <dict>
                <key>CF$UID</key>
                <integer>4129</integer>
            </dict>

However as you can see from the output, the value of the "$class" keys (and any other dictionaries) is always nil:

output (truncated):

{
  "$version"=>100000,
  "$objects"=>[
    "$null",
    {
      "currentPageIndex"=>0,
      "pages"=>nil,
      "do_objectID"=>nil,
      "layerStyles"=>nil,
      "images"=>nil,
      "$class"=>nil,
      "layerTextStyles"=>nil,
      "layerSymbols"=>nil
    },
    {
      "images"=>nil,
      "$class"=>nil
    }, {
      "$class"=>nil
    }

ckruse commented 10 years ago

Interesting. In XML plists, this seems to be correct:

["$null",
 {"$class"=>{"CF$UID"=>4129},
  "currentPageIndex"=>0,
  "do_objectID"=>{"CF$UID"=>4128},
  "images"=>{"CF$UID"=>2},
  "layerStyles"=>{"CF$UID"=>6},
  "layerSymbols"=>{"CF$UID"=>12},
  "layerTextStyles"=>{"CF$UID"=>16},
  "pages"=>{"CF$UID"=>20}},
 {"$class"=>{"CF$UID"=>5}, "images"=>{"CF$UID"=>3}},

while in binary plists this seems to be a problem:

["$null",
 {"currentPageIndex"=>0,
  "pages"=>nil,
  "do_objectID"=>nil,
  "layerStyles"=>nil,
  "images"=>nil,
  "$class"=>nil,
  "layerTextStyles"=>nil,
  "layerSymbols"=>nil},
 {"images"=>nil, "$class"=>nil},
 {"$class"=>nil},

This wouldn't be so interesting if Apple's parser didn't seem to have this bug as well. Opening the XML plist in Xcode works fine; opening the binary plist in Xcode shows lots of entries that seem to be empty.

On the other hand: plutil -convert xml1 produces the correct value.

Still looking for more…

paulyoung commented 10 years ago

Thanks. Sounds like I can work around the issue by converting the binary plist to XML.

ckruse commented 10 years ago

Seems like it, yes.
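
A minimal sketch of that workaround from Ruby, assuming macOS with `plutil` on the PATH. The file names `Data` and `Data.xml` and both helper methods are made up for illustration; `-convert xml1` and `-o` are standard `plutil` options.

```ruby
# Builds the plutil invocation that rewrites a binary plist as XML.
# `-o` writes to a separate file, so the original stays untouched.
def plutil_to_xml_command(input, output)
  ['plutil', '-convert', 'xml1', '-o', output, input]
end

# True if an executable named plutil is found on the PATH.
def plutil_available?
  ENV.fetch('PATH', '').split(File::PATH_SEPARATOR).any? do |dir|
    File.executable?(File.join(dir, 'plutil'))
  end
end

cmd = plutil_to_xml_command('Data', 'Data.xml') # hypothetical file names
if plutil_available? && File.exist?('Data')
  system(*cmd) || warn('plutil failed')
else
  puts "would run: #{cmd.join(' ')}"
end
```

The resulting XML file can then be passed to `CFPropertyList::List.new(:file => ...)` as in the examples earlier in the thread.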

ckruse commented 10 years ago

I can limit the problem to a very basic test case:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>CF$UID</key>
  <integer>4129</integer>
</dict>
</plist>

Change this to a binary plist with plutil -convert binary1 and have a look at it via Xcode or my library. If I change the CF$UID to something different (e.g. DF$UID) it works. This is confusing…

paulyoung commented 10 years ago

Maybe CF$UID is treated differently because it's a reference to another entity in the $objects array.

ckruse commented 10 years ago

No. It is special because of the name of the key.

When I reverse engineered the plist format, Apple's C implementation had some references to object types for which I could find neither a meaning nor an existing value: set and uid. When I change the key name to something different, the object type is integer. When I change the key name to CF$UID, the object type is uid.

It seems that Apple implemented some weird special handling for some keys.

paulyoung commented 10 years ago

In case this helps in any way: http://www.cclgroupltd.com/geek-post-nskeyedarchiver-files-what-are-they-and-how-can-i-use-them/

ckruse commented 10 years ago

Nice, this helps. Thanks.

ckruse commented 10 years ago

I pushed a new commit. Can you test and tell me if this solves your problem?

Currently I create a dictionary with just one key in binary as well, but I'm absolutely not sure if we should do it this way; IMHO both binary and XML should replace the dict by the pure UID number.

What do you think?

paulyoung commented 10 years ago

I'll try it and let you know.

I think it's important to match the structure of the XML. When there's a dictionary with the key "CF$UID" I know that its value is the index of another element in the array.

If it was just an integer I wouldn't be able to get the element it's referring to.
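
To illustrate the point, here is a small sketch of following such a reference. The `archive` hash is hand-written stand-in data shaped like the XML output shown earlier, not output from the library, and `dereference` is a hypothetical helper.

```ruby
# Hand-written stand-in for the hash returned by native_types;
# the object contents are made up for illustration.
archive = {
  '$objects' => [
    '$null',
    { '$class' => { 'CF$UID' => 2 }, 'currentPageIndex' => 0 },
    { '$classname' => 'ExampleDocument' }
  ]
}

# Returns the referenced element if `value` is a one-key CF$UID dict,
# otherwise returns `value` unchanged.
def dereference(value, objects)
  if value.is_a?(Hash) && value.keys == ['CF$UID']
    objects[value['CF$UID']]
  else
    value
  end
end

objects = archive['$objects']
root    = objects[1]
klass   = dereference(root['$class'], objects)
puts klass['$classname'] # prints "ExampleDocument"
```

The `value.keys == ['CF$UID']` check is exactly what would be lost if the reference were collapsed into a bare integer: `currentPageIndex` and a reference would look the same.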

paulyoung commented 10 years ago

I want to try what's on master so I updated my Gemfile to this:

source 'https://rubygems.org'

# gem 'CFPropertyList', '~> 2.2.7'
gem 'CFPropertyList', :git => 'git@github.com:ckruse/CFPropertyList.git'

and got this output from bundle install:

Updating git@github.com:ckruse/CFPropertyList.git
Fetching gem metadata from https://rubygems.org/...
Resolving dependencies...
Could not find gem 'CFPropertyList (>= 0) ruby' in git@github.com:ckruse/CFPropertyList.git (at master).
Source does not contain any versions of 'CFPropertyList (>= 0) ruby'

paulyoung commented 10 years ago

This could be because there's no .gemspec file.

paulyoung commented 10 years ago

I made the same change to my local copy of the gem and the binary output looks correct now!

paulyoung commented 10 years ago

Do you understand why it needs to be a dictionary?

ckruse commented 10 years ago

It isn't a dictionary in binary plists; in bplists it is basically an integer with a special type byte. I think they're trying to get rid of it, so they didn't invent a <uid> tag in XML (nor a <set> tag).
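
As a rough illustration of "an integer with a special type byte": in the binary plist object encoding, an integer starts with a marker byte `0x1n` (payload of 2**n bytes) and a UID with a marker byte `0x8n` (payload of n + 1 bytes). The sketch below encodes the same number both ways. The helper names are hypothetical, it only handles non-negative values below 2**32, and it is not the library's generator code.

```ruby
# Smallest width in bytes (1, 2 or 4) that can hold the non-negative integer n.
def byte_width(n)
  raise ArgumentError, 'negative values not handled here' if n < 0
  [1, 2, 4].find { |w| n < (1 << (8 * w)) } || raise(ArgumentError, 'value too large')
end

# Big-endian bytes of n, padded to `width` bytes.
def be_bytes(n, width)
  (width - 1).downto(0).map { |i| (n >> (8 * i)) & 0xFF }
end

# Integer object: marker 0x1n, where the payload is 2**n bytes long.
def encode_int(n)
  width = byte_width(n)
  [0x10 | (width.bit_length - 1), *be_bytes(n, width)].pack('C*')
end

# UID object: marker 0x8n, where the payload is n + 1 bytes long.
def encode_uid(n)
  width = byte_width(n)
  [0x80 | (width - 1), *be_bytes(n, width)].pack('C*')
end

puts encode_int(4129).unpack('H*').first # prints "111021"
puts encode_uid(4129).unpack('H*').first # prints "811021"
```

Only the marker byte differs between the two encodings here, which fits the observation that renaming the key turns the object back into a plain integer.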

That's also why I don't think using a dict for this in Ruby is a good idea either. We should replace the dict with a number, IMHO. The argument "when I see a CF$UID key I know that it is meant as an index" seems problematic to me: normally one knows the structure one saves in a plist.

paulyoung commented 10 years ago

normally one knows the structure one saved in a plist.

These binary plists are generated by NSKeyedArchiver, so the structure is not known.

we should replace the dict by a number, IMHO

What would the type of that number be in Ruby? If it's an integer I'd have no way to determine if it's a reference or just a number.

As the article I linked to says:

in most real-life cases the complex data held in these files contains many repeating values which, when arranged this way, only have to be stored once but can be referenced in the “$objects” array multiple times.

ckruse commented 10 years ago

Well, I guess we could do something like we did for blobs. So it would be a class derived from Fixnum or something like that.

paulyoung commented 10 years ago

Here's how I'm using it: https://github.com/paulyoung/keyed_archive

ckruse commented 10 years ago

I see. I don't see any argument against CFPropertyList.native_types returning a UidFixnum (which is just a class UidFixnum < Fixnum), do you? It would then still be distinguishable from "normal" numbers.
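
A sketch of the "distinguishable number" idea. Ruby integers cannot be created with `new`, so a literal `class UidFixnum < Fixnum` could not be instantiated directly; this example uses a delegating wrapper from the standard library instead. `UidNumber` is a hypothetical name and not what the library ships.

```ruby
require 'delegate'

# Hypothetical wrapper: forwards arithmetic and conversions to the wrapped
# Integer, but has its own class so it can be told apart from plain numbers.
class UidNumber < DelegateClass(Integer)
  def inspect
    "#<UidNumber #{__getobj__}>"
  end
end

objects = ['$null', 'first', 'second']
values  = { 'count' => 2, 'ref' => UidNumber.new(2) }

values.each do |key, value|
  if value.is_a?(UidNumber)
    puts "#{key}: reference to #{objects[value.to_i].inspect}"
  else
    puts "#{key}: plain number #{value}"
  end
end
```

Both entries hold the number 2, but only `ref` is treated as an index into the objects array.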

paulyoung commented 10 years ago

That sounds fine to me.

ckruse commented 10 years ago

Good. Will implement that this evening. :)

ckruse commented 10 years ago

Just FYI: I haven't forgotten about this. I'm still struggling with generating a valid binary UID entry…

ckruse commented 10 years ago

Ok, should be fixed now. I had overlooked a really silly bug in my binary generation code for two days… ;) Can you confirm that it works for you?