Closed paulyoung closed 10 years ago
Here's a simpler example.
Plist:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>foo</key>
<dict>
<key>bar</key>
<dict>
<key>baz</key>
<integer>42</integer>
</dict>
</dict>
</dict>
</plist>
Ruby file:
require 'cfpropertylist'
filepath = ARGV[0]
plist = CFPropertyList::List.new(:file => filepath)
data = CFPropertyList.native_types(plist.value)
puts data['foo']
Output:
{"bar"=>nil}
FWIW, I've tried several libraries of this type now and all have the same issue.
I can't reproduce this problem. With this corrected XML file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>foo</key>
<array>
<dict>
<key>bar</key>
<dict>
<key>baz</key>
<integer>42</integer>
</dict>
</dict>
</array>
</dict>
</plist>
and this Ruby code:
require 'cfpropertylist'
filepath = 'test.xml'
plist = CFPropertyList::List.new(:file => filepath)
puts plist.value.inspect
data = CFPropertyList.native_types(plist.value)
puts data.inspect
I get this output:
#<CFPropertyList::CFDictionary:0x00000001129908 @value={"foo"=>#<CFPropertyList::CFArray:0x00000001129958 @value=[#<CFPropertyList::CFDictionary:0x00000001129980 @value={"bar"=>#<CFPropertyList::CFDictionary:0x000000011299d0 @value={"baz"=>#<CFPropertyList::CFInteger:0x00000001129a20 @value=42>}>}>]>}>
{"foo"=>[{"bar"=>{"baz"=>42}}]}
This is exactly what I was expecting: a hash with a key 'foo' containing an array with just another hash as its only value. That hash contains a key 'bar', which contains another hash. That last hash has a key 'baz' containing 42. Exactly what the XML was describing.
Can you paste the output of this Ruby code?
puts CFPropertyList.xml_parser_interface.inspect
CFPropertyList can use three different XML parsers: libxml-ruby, Nokogiri, and REXML. It tries to load libxml first since it is the fastest parser, then falls back to Nokogiri, and tries REXML last (since it is the slowest). Maybe one of the parser backends has a bug?
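The fallback order described above can be sketched roughly as follows. This only illustrates the "try the fastest library first" pattern and is not CFPropertyList's actual loading code; the helper name first_available is made up, and the require names ('libxml', 'nokogiri', 'rexml/document') are assumptions about how each library is loaded.

```ruby
# Hypothetical helper: return the first library name that can be required,
# or nil if none of them is installed.
def first_available(candidates)
  candidates.find do |name|
    begin
      require name
      true
    rescue LoadError
      false
    end
  end
end

# Assumed require names, fastest parser first.
backend = first_available(%w[libxml nokogiri rexml/document])
puts "XML backend: #{backend.inspect}"
```

Comparing this with the output of puts CFPropertyList.xml_parser_interface.inspect on both machines should show whether the two of us are running different backends.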
I'm still experiencing the issue. Here is the exact file I'm using: https://www.dropbox.com/s/v6r9n64yi9pl6h0/Data (binary plist)
Output of puts plist.value.inspect: https://www.dropbox.com/s/gthw3484pu43o10/plist.value.inspect.rb
Output of puts data.inspect: https://www.dropbox.com/s/sdblzv9xfdmdpqn/data.inspect.rb
FYI - adding the .plist extension to the file allows you to view it as XML using QuickLook on OS X. I can provide the XML if needed.
What value exactly do you expect? There is no foo value in this plist. Can you elaborate?
XML is not needed, I can see the structure via Xcode :)
The foo, bar, baz code was a contrived example.
The beginning of the plist looks like this:
plist (truncated):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>$archiver</key>
<string>MSArchiver</string>
<key>$objects</key>
<array>
<string>$null</string>
<dict>
<key>$class</key>
<dict>
<key>CF$UID</key>
<integer>4129</integer>
</dict>
However, as you can see from the output, the value of the "$class" keys (and any other dictionaries) is always nil:
output (truncated):
{
"$version"=>100000,
"$objects"=>[
"$null",
{
"currentPageIndex"=>0,
"pages"=>nil,
"do_objectID"=>nil,
"layerStyles"=>nil,
"images"=>nil,
"$class"=>nil,
"layerTextStyles"=>nil,
"layerSymbols"=>nil
},
{
"images"=>nil,
"$class"=>nil
}, {
"$class"=>nil
}
Interesting. In XML plists, this seems to be correct:
["$null",
{"$class"=>{"CF$UID"=>4129},
"currentPageIndex"=>0,
"do_objectID"=>{"CF$UID"=>4128},
"images"=>{"CF$UID"=>2},
"layerStyles"=>{"CF$UID"=>6},
"layerSymbols"=>{"CF$UID"=>12},
"layerTextStyles"=>{"CF$UID"=>16},
"pages"=>{"CF$UID"=>20}},
{"$class"=>{"CF$UID"=>5}, "images"=>{"CF$UID"=>3}},
while in binary plists this seems to be a problem:
["$null",
{"currentPageIndex"=>0,
"pages"=>nil,
"do_objectID"=>nil,
"layerStyles"=>nil,
"images"=>nil,
"$class"=>nil,
"layerTextStyles"=>nil,
"layerSymbols"=>nil},
{"images"=>nil, "$class"=>nil},
{"$class"=>nil},
This wouldn't be so interesting if Apple's parser didn't seem to have this bug as well. Opening the XML plist in Xcode, everything is fine; opening the binary plist in Xcode, there are lots of entries which seem to be empty.
On the other hand, plutil -convert xml1 produces the correct value.
Still looking for more…
Thanks. Sounds like I can work around the issue by converting the binary plist to XML.
Seems like it, yes.
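For reference, a minimal sketch of that workaround from Ruby, assuming macOS with plutil on the PATH. The helper names plutil_xml_command and convert_to_xml are made up for illustration; the -o option is used so the original binary file is left untouched.

```ruby
# Hypothetical helper: build the plutil invocation that writes an XML copy
# of +source+ to +target+ and leaves the original file untouched.
def plutil_xml_command(source, target)
  ['plutil', '-convert', 'xml1', '-o', target, source]
end

# Runs plutil (macOS only). Returns the target path on success,
# raises if plutil is missing or exits with an error.
def convert_to_xml(source, target = "#{source}.xml")
  ok = system(*plutil_xml_command(source, target))
  raise "plutil failed for #{source}" unless ok
  target
end

# Usage on macOS, followed by the same parsing code as before:
#   xml_path = convert_to_xml('Data')
#   plist = CFPropertyList::List.new(:file => xml_path)
```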
I can limit the problem to a very basic test case:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>CF$UID</key>
<integer>4129</integer>
</dict>
</plist>
Change this to a binary plist with plutil -convert binary1 and have a look at it via Xcode or my library. If I change the CF$UID key to something different (e.g. DF$UID) it works. This is confusing…
Maybe CF$UID is treated differently because it's a reference to another entity in the $objects array.
No. It is special because of the name of the key.
When I reverse engineered the plist format, Apple's C implementation had some references to object types for which I could find neither a meaning nor an existing value: set and uid. When I change the key name to something different, the object type is integer. When I change the key name to CF$UID, the object type is uid.
It seems that Apple implemented some weird special handling for some keys.
In case this helps in any way: http://www.cclgroupltd.com/geek-post-nskeyedarchiver-files-what-are-they-and-how-can-i-use-them/
Nice, this helps. Thanks.
I pushed a new commit. Can you test and tell me if this solves your problem?
Currently I create a dictionary with just one key in binary as well, but I'm absolutely not sure if we should do it this way; IMHO both binary and XML should replace the dict by the pure UID number.
What do you think?
I'll try it and let you know.
I think it's important to match the structure of the XML. When there's a dictionary with the key "CF$UID" I know that its value is the index of another element in the array.
If it was just an integer I wouldn't be able to get the element it's referring to.
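To illustrate what I mean, here is a rough sketch of following such a reference. The data is made up but has the same shape as an NSKeyedArchiver plist, and the helper names uid_reference? and deref are hypothetical. It only follows one level, since archives can contain cycles.

```ruby
# Hypothetical helper: true if +value+ looks like a {"CF$UID" => n} reference.
def uid_reference?(value)
  value.is_a?(Hash) && value.keys == ['CF$UID'] && value['CF$UID'].is_a?(Integer)
end

# Follow one reference into the $objects array; return other values unchanged.
def deref(objects, value)
  uid_reference?(value) ? objects[value['CF$UID']] : value
end

# Made-up data in the shape of an NSKeyedArchiver plist.
archive = {
  '$objects' => [
    '$null',
    { 'title' => { 'CF$UID' => 2 }, 'count' => 3 },
    'Hello'
  ]
}

objects = archive['$objects']
root = objects[1]
puts deref(objects, root['title'])  # the referenced string
puts deref(objects, root['count'])  # a plain integer, returned as-is
```

With a bare integer in place of the dictionary, 'title' and 'count' would be indistinguishable here.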
I want to try what's on master so I updated my Gemfile to this:
source 'https://rubygems.org'
# gem 'CFPropertyList', '~> 2.2.7'
gem 'CFPropertyList', :git => 'git@github.com:ckruse/CFPropertyList.git'
and got this output from bundle install:
Updating git@github.com:ckruse/CFPropertyList.git
Fetching gem metadata from https://rubygems.org/...
Resolving dependencies...
Could not find gem 'CFPropertyList (>= 0) ruby' in git@github.com:ckruse/CFPropertyList.git (at master).
Source does not contain any versions of 'CFPropertyList (>= 0) ruby'
This could be because there's no .gemspec file.
I made the same change to my local copy of the gem and the binary output looks correct now!
Do you understand why it needs to be a dictionary?
It isn't a dictionary in binary plists; in bplists it is basically an integer with a special type byte. I think they're trying to get rid of it, so they didn't invent a <uid> tag in XML (nor a <set> tag).
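A small sketch of what "special type byte" means here. The marker values are an assumption based on my reading of the comments in Apple's open-source CFBinaryPList.c (high nibble 0x1 for integers, 0x8 for UIDs), so treat the exact numbers with care; the helper names marker_type and payload_size are made up.

```ruby
# Assumed marker layout: high nibble 0x1 = integer, high nibble 0x8 = UID.
def marker_type(byte)
  case byte >> 4
  when 0x1 then :int
  when 0x8 then :uid
  else :other
  end
end

# Payload length in bytes for the two types above (nil for anything else).
def payload_size(byte)
  low = byte & 0x0F
  case marker_type(byte)
  when :int then 2**low   # integers store the byte count as a power of two
  when :uid then low + 1  # UIDs store (byte count - 1)
  end
end

# Under these assumptions, a two-byte value such as 4129 would be tagged
# 0x11 as an integer and 0x81 as a UID.
[0x11, 0x81].each do |marker|
  puts format('0x%02X: %s, %d bytes', marker, marker_type(marker), payload_size(marker))
end
```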
That's also why I don't think using a dict for this in Ruby is a good idea. We should replace the dict by a number, IMHO. The argument "when I see a CF$UID key I know that it is meant as an index" seems problematic to me: normally one knows the structure one saves in a plist.
normally one knows the structure one saved in a plist.
These binary plists are generated by NSKeyedArchiver, so the structure is not known.
we should replace the dict by a number, IMHO
What would the type of that number be in Ruby? If it's an integer I'd have no way to determine if it's a reference or just a number.
As the article I linked to says:
in most real-life cases the complex data held in these files contains many repeating values which, when arranged this way, only have to be stored once but can be referenced in the “$objects” array multiple times.
Well, I guess we could do something like we did for blobs. So it would be a class derived from Fixnum or something like that.
Here's how I'm using it: https://github.com/paulyoung/keyed_archive
I see. I don't think there's anything speaking against CFPropertyList.native_types returning a UidFixnum (which is just a class UidFixnum < Fixnum), do you? It is then still distinguishable from "normal" numbers.
That sounds fine to me.
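A minimal sketch of the "distinguishable number" idea, for illustration only. It uses a small wrapper class rather than a Fixnum subclass, because Ruby's integer classes cannot be instantiated with new (and Fixnum was later folded into Integer). The class name UidNumber and the helper reference? are made up, and this is not necessarily what the library ships.

```ruby
# Hypothetical wrapper: holds an index into $objects and stays
# distinguishable from a plain Integer.
class UidNumber
  attr_reader :value

  def initialize(value)
    @value = Integer(value)
  end

  # Convert back to a plain integer, e.g. for indexing into an array.
  def to_i
    @value
  end

  # Equal only to another UidNumber with the same value, never to a bare Integer.
  def ==(other)
    other.is_a?(UidNumber) && other.value == value
  end
  alias_method :eql?, :==

  def hash
    [self.class, value].hash
  end
end

# Hypothetical check a caller could use on converted plist values.
def reference?(value)
  value.is_a?(UidNumber)
end

uid = UidNumber.new(4129)
puts reference?(uid)   # a UID wrapper
puts reference?(4129)  # a plain number
```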
Good. Will implement that this evening. :)
Just for interest: I didn't forget it. I'm still struggling with generating a valid binary UID entry…
Ok, should be fixed now. I had overlooked a really silly bug in my binary generation code for two days… ;) Can you confirm that it works for you?
Given the following plist:
and this Ruby file:
I get the following output:
when running: