folz / bento

:bento: A fast, correct, pure-Elixir library for reading and writing Bencoded metainfo (.torrent) files.
Mozilla Public License 2.0

Torrents with additional "<blah>.utf-8" fields cannot be parsed. #14

Open wkpatrick opened 3 years ago

wkpatrick commented 3 years ago

This happens because :erlang.binary_to_existing_atom cannot turn "name.utf-8" into an atom (the period seems to be what trips it up).
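For reference, here is a minimal sketch of the failure mode using only the Elixir standard library. `String.to_existing_atom/1` delegates to `:erlang.binary_to_existing_atom/2`, which raises `ArgumentError` whenever no atom with that name exists yet, so the period may not be the only factor. The module name `AtomLookupSketch` is hypothetical, and the unseen key is built at runtime (an assumption of this sketch) so that the example itself does not create the atom.

```elixir
defmodule AtomLookupSketch do
  # Hypothetical helper: returns {:ok, atom} when an atom with this name
  # already exists in the VM, or :error when the lookup raises.
  def lookup(key) when is_binary(key) do
    {:ok, String.to_existing_atom(key)}
  rescue
    ArgumentError -> :error
  end
end

# :name is already in the atom table because the literal appears here.
known = :name
{:ok, ^known} = AtomLookupSketch.lookup("name")

# Build a key at runtime so that no matching atom exists yet.
unseen_key = "name.utf-8." <> Integer.to_string(System.unique_integer([:positive]))
:error = AtomLookupSketch.lookup(unseen_key)
```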

Stacktrace

(exit) an exception was raised: (ArgumentError) errors were found at the given arguments:

"name.utf-8" appears to come from clients such as Vuze/Azureus, which use it to handle old torrents whose names and paths were not written in UTF-8 (see here).

I am not sure how to "properly" handle this, but for now I will just filter out any keys containing .utf-8 from the metainfo data before parsing. I'd be happy to merge those changes in, or any different change you would suggest.
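A rough sketch of that workaround, assuming the bencoded data has already been decoded into plain maps with string keys, lists, binaries and integers. The module name `Utf8KeyFilter` is hypothetical and is not part of bento; how the filtered data is then handed to the library is not shown.

```elixir
defmodule Utf8KeyFilter do
  @suffix ".utf-8"

  # Recursively drop every map key ending in ".utf-8" from a decoded
  # bencode term. Assumes plain maps (no structs) with binary keys.
  def strip(term) when is_map(term) do
    term
    |> Enum.reject(fn {key, _value} ->
      is_binary(key) and String.ends_with?(key, @suffix)
    end)
    |> Map.new(fn {key, value} -> {key, strip(value)} end)
  end

  # Lists may contain nested dictionaries (e.g. the "files" list).
  def strip(term) when is_list(term), do: Enum.map(term, &strip/1)

  # Binaries and integers pass through untouched.
  def strip(term), do: term
end

decoded = %{
  "announce" => "http://tracker.example/announce",
  "info" => %{
    "name" => "file.txt",
    "name.utf-8" => "file.txt",
    "files" => [%{"path" => ["a"], "path.utf-8" => ["a"]}]
  }
}

filtered = Utf8KeyFilter.strip(decoded)
```

This discards the UTF-8 variants entirely, which is a stopgap rather than real support for them.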

Thanks for the useful library!

folz commented 3 years ago

Oh interesting, thanks for the report. I hadn't come across the nonstandard .utf-8 naming before. This seems reasonable to support in bento.

From the issue you linked, it's true that nonstandard key names aren't disallowed by the spec (they aren't mentioned). I guess that implies we should not be converting to atoms, because keys could be named anything - so we should represent key names as strings internally.

I am not presently doing anything with this library, so I would appreciate a PR if you want to put one up. I think we should make sure that .torrent() continues to validate that the metainfo has a torrent-compliant shape, but now will allow other keys. Please make sure it also includes (a) test case(s) covering this issue, and the results of benchmarking before and after the change.
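One possible shape for such a check, as a sketch only: keep keys as strings, require the usual info-dictionary fields, and pass any other keys through untouched. This is not bento's implementation; the module name `MetainfoShapeSketch`, the error tuples, and the choice of required keys ("name", "piece length", "pieces", plus exactly one of "length" or "files") are assumptions made for illustration.

```elixir
defmodule MetainfoShapeSketch do
  @required_info_keys ["name", "piece length", "pieces"]

  # Returns {:ok, metainfo} unchanged (extra keys preserved) when the
  # required keys are present, otherwise {:error, reason}.
  def validate(%{"info" => info} = metainfo) when is_map(info) do
    missing = Enum.reject(@required_info_keys, &Map.has_key?(info, &1))
    single? = Map.has_key?(info, "length")
    multi? = Map.has_key?(info, "files")

    cond do
      missing != [] -> {:error, {:missing_keys, missing}}
      # Exactly one of "length" (single file) or "files" (multi file).
      single? == multi? -> {:error, :length_xor_files}
      true -> {:ok, metainfo}
    end
  end

  def validate(_other), do: {:error, :missing_info}
end

info = %{
  "name" => "file.txt",
  "name.utf-8" => "file.txt",
  "piece length" => 16_384,
  "pieces" => <<0::160>>,
  "length" => 1
}

# The nonstandard "name.utf-8" key is accepted and kept as-is.
{:ok, _metainfo} = MetainfoShapeSketch.validate(%{"info" => info})
```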