ebarnard / rust-plist

A rusty plist parser.
MIT License

Continuation of the ASCII reader PR #136

Closed: steven-joruk closed this 3 months ago

steven-joruk commented 7 months ago

This continues from #44

Some comments brought over from there:

  1. ~The serde tests highlighted that documents that don't begin with exactly <?xml, with no preceding whitespace, will be treated as ASCII, which might not be desirable.~
  2. ~The master branch is broken due to denying warnings and a deprecation; I switched to using swap_remove.~
  3. I had to allow escaping \ (\\) because the test that parses netnewswire.pbxproj fails without it.
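To illustrate item 3, here is a minimal sketch of unescaping the body of a quoted ASCII plist string so that `\\` yields a single backslash. The function name `unescape` and the set of accepted escapes (`\\`, `\"`, `\n`, `\t`) are assumptions made for this example; they are not taken from the PR or from the crate's API.

```rust
/// Hypothetical helper: expand backslash escapes in the body of a quoted
/// string. Returns `None` on an unknown escape or a trailing backslash.
pub fn unescape(raw: &str) -> Option<String> {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // A backslash must be followed by a recognised escape character.
        match chars.next()? {
            '\\' => out.push('\\'),
            '"' => out.push('"'),
            'n' => out.push('\n'),
            't' => out.push('\t'),
            _ => return None,
        }
    }
    Some(out)
}
```

A reader that rejects `\\` would fail on any project file containing a literal backslash inside a quoted value, which is consistent with the netnewswire.pbxproj failure described above.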

The fuzzer quickly found an infinite loop in handling block comments. I let it run for another 10 hours; it tried 510 million inputs without finding anything else.
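The thread does not show the offending loop. One common cause of this kind of hang is a comment scanner that keeps waiting for `*/` after the input has ended. The sketch below, with the hypothetical name `skip_block_comment`, shows one way to guarantee termination: search the remaining bytes once and report an unterminated comment as `None` rather than looping.

```rust
/// Hypothetical helper: skip a `/* ... */` comment that starts at `pos`.
/// Returns the index just past the closing `*/`, or `None` if `pos` does
/// not point at `/*` or the comment is never closed.
pub fn skip_block_comment(input: &[u8], pos: usize) -> Option<usize> {
    // `get` returns `None` when `pos` is past the end of the input.
    if !input.get(pos..)?.starts_with(b"/*") {
        return None;
    }
    // Start searching after the opener so "/*/" is not treated as closed.
    let body = pos + 2;
    input[body..]
        .windows(2)
        .position(|w| w == b"*/")
        .map(|offset| body + offset + 2)
}
```

Because the search is a single bounded pass over the remaining slice, malformed input such as `/* never closed` produces an error value instead of spinning.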

The related issue (#42) contains a suggestion that it should be renamed to OpenStepReader or similar. I don't know the full history of the format (Wikipedia discusses it). If I'm understanding it correctly, NeXTSTEP read integers as strings, OpenStep supported integers and real numbers, and GNUstep added support for NSValue and NSDate. This PR is missing support for floats.

ebarnard commented 7 months ago

Item 1 is the biggest issue: a well-formed UTF-8 XML document can start with a BOM, which we must support, and ideally we would also support XML plists that have whitespace before the leading < character.

Can the first character of a reasonable ASCII plist file be a "<"?

steven-joruk commented 7 months ago

> Item 1 is the biggest issue: a well-formed UTF-8 XML document can start with a BOM, which we must support, and ideally we would also support XML plists that have whitespace before the leading < character.

I agree; I've pushed a fix. If there's any Unicode byte order mark, or if the first non-whitespace string is "<?xml", then the document is considered XML.
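The rule described in this comment can be sketched roughly as follows. This is not the code from the PR: the `Format` enum and `detect_format` function are hypothetical names, only the UTF-8 and UTF-16 byte order marks are checked, and binary plists are ignored for brevity.

```rust
/// Hypothetical classification of a plist byte stream; not the crate's API.
#[derive(Debug, PartialEq, Eq)]
pub enum Format {
    Xml,
    Ascii,
}

/// Guess the format from the first bytes of a document.
pub fn detect_format(bytes: &[u8]) -> Format {
    // Byte order marks for UTF-8, UTF-16 BE and UTF-16 LE (assumed set).
    const BOMS: [&[u8]; 3] = [&[0xEF, 0xBB, 0xBF], &[0xFE, 0xFF], &[0xFF, 0xFE]];
    if BOMS.iter().any(|bom| bytes.starts_with(bom)) {
        return Format::Xml;
    }
    // Skip leading ASCII whitespace, then look for the XML declaration.
    let start = bytes
        .iter()
        .position(|b| !b.is_ascii_whitespace())
        .unwrap_or(bytes.len());
    if bytes[start..].starts_with(b"<?xml") {
        Format::Xml
    } else {
        Format::Ascii
    }
}
```

Under this rule an input beginning with `<` but not `<?xml`, such as an ASCII data literal like `<0fbd77>`, still falls through to the ASCII reader.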