Non utf-8 encoding torrent

Krusen / BencodeNET

.NET library for encoding/decoding bencode and reading/writing torrent files

The Unlicense


Non utf-8 encoding torrent #19

Closed tautcony closed 8 years ago

tautcony commented 8 years ago

Although BencodeNET supports multiple encodings, it does not use the encoding declared in the torrent when loading a non-UTF-8 torrent. During parsing (for example, the ToString call in the ParseSingleFileInfo function), each BString is converted to a string with the default UTF-8 encoding. The encoding is only set on each field afterwards, and by then some of the string data has already been lost irreversibly.
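The data loss described above can be reproduced outside the library. The following is a minimal sketch in Python, not BencodeNET code; it assumes the decoder substitutes U+FFFD for invalid byte sequences, as many UTF-8 decoders do, and the sample name is made up for illustration.

```python
# Illustration only: plain Python, not BencodeNET code.
name = "中文种子"                    # made-up sample name
raw = name.encode("gbk")            # bytes as a GBK-encoded torrent would store them

# Decode with the wrong encoding; invalid sequences are replaced with U+FFFD.
wrong = raw.decode("utf-8", errors="replace")

# Re-encoding the damaged string does not give the original bytes back.
round_tripped = wrong.encode("utf-8")

# Decoding the untouched bytes with the correct encoding still works.
right = raw.decode("gbk")

print(repr(wrong))
print(round_tripped == raw)
print(right == name)
```

Once the replacement characters are in the string, no later change of encoding can restore the original bytes, which is why the encoding has to be chosen before the bytes are converted.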

Another point: the fields in ExtraFields are not marked with the torrent's encoding either, so they are all treated as UTF-8.

I have uploaded a sample: GBK.torrent.zip

Krusen commented 8 years ago

Currently it uses the encoding from the BencodeParser.

I agree, though, that it would probably make sense to try and use the encoding specified in the torrent file itself. I'll look into it.

tautcony commented 8 years ago

Well, I thought I could pass the encoding to the BencodeParser, but some old torrents (e.g. the one I uploaded) may use different encodings in different fields, such as name in GBK and name.utf-8 in UTF-8. Maybe keeping the original data as a BString would be better?
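To make the mixed-encoding case concrete, here is a rough Python sketch (not BencodeNET code) of keeping the raw bytes and decoding each field only on request. The dictionary contents and the `decode_field` helper are hypothetical and exist only for illustration.

```python
# Illustration only: plain Python, not BencodeNET code.
# Hypothetical "info" dictionary whose values are kept as raw bytes.
fields = {
    b"name": "种子".encode("gbk"),          # legacy field, GBK bytes
    b"name.utf-8": "种子".encode("utf-8"),  # companion field, UTF-8 bytes
}

def decode_field(fields, key, encoding):
    """Decode one field on demand; the stored bytes are never modified."""
    return fields[key].decode(encoding)

print(decode_field(fields, b"name", "gbk"))
print(decode_field(fields, b"name.utf-8", "utf-8"))
```

Because the bytes stay untouched, a wrong guess at the encoding can be corrected later by decoding the same field again.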

Krusen commented 8 years ago

Please try out v2.2.1 from NuGet and see if you still have any issues.

It will now try to use the encoding from the torrent file itself, except when a dictionary key ends with utf-8, in which case it will use UTF-8.

If the torrent does not specify an encoding it will fall back to the parser's encoding.
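The selection order described in this comment can be sketched as follows. This is a Python illustration of the rule as stated here, not the library's actual implementation; `pick_encoding` and its parameter names are hypothetical.

```python
# Illustration only: plain Python, not BencodeNET code.
def pick_encoding(key, torrent_encoding=None, parser_encoding="utf-8"):
    """Return the encoding to use for the value stored under `key`."""
    # 1. Keys ending with "utf-8" (e.g. "name.utf-8") always use UTF-8.
    if key.endswith("utf-8"):
        return "utf-8"
    # 2. Otherwise prefer the encoding declared by the torrent itself.
    if torrent_encoding:
        return torrent_encoding
    # 3. Fall back to the parser's encoding when the torrent declares none.
    return parser_encoding

print(pick_encoding("name", torrent_encoding="gbk"))
print(pick_encoding("name.utf-8", torrent_encoding="gbk"))
print(pick_encoding("name", parser_encoding="latin-1"))
```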