Open AllSeeingEyeTolledEweSew opened 3 years ago
there should be unit tests for these in test.py
. when you say "unexpected output" what do you mean exactly?
Is the string not treated as unicode?
in .torrent files, strings are supposed to be encoded as utf-8 if they are meant to be human readable, and raw-bytes otherwise.
So, piece hashes are binary, name, filenames, creator and comment are utf-8.
It's not obvious to me that the character strings functions should accept bytes as well, but perhaps there's something I'm not considering.
there should be unit tests for these in
test.py
. when you say "unexpected output" what do you mean exactly?
In master on Linux:
>>> import libtorrent as lt
>>> fs = lt.file_storage()
>>> fs.add_file("test.txt", 1024)
>>> ct = lt.create_torrent(fs)
>>> ct.set_hash(0, b"a" * 1024)
>>> ct.add_url_seed("http://example.com")
>>> ct.generate()
{b'creation date': 1617038247, b'info': {b'length': 1024, b'name': b'test.txt', b'piece length': 16384, b'pieces': b'aaaaaaaaaaaaaaaaaaaa'}, b'url-list': b'h\x00\x00\x00t\x00\x00\x00t\x00\x00\x00p\x00\x00\x00:\x00\x00\x00/\x00\x00\x00/\x00\x00\x00e\x00\x00\x00x\x00\x00\x00a\x00\x00\x00m\x00\x00\x00p\x00\x00\x00l\x00\x00\x00e\x00\x00\x00.\x00\x00\x00c\x00\x00\x00o\x00\x00\x00m\x00\x00\x00'}
in .torrent files, strings are supposed to be encoded as utf-8 if they are meant to be human readable, and raw-bytes otherwise.
So, piece hashes are binary, name, filenames, creator and comment are utf-8.
It's not obvious to me that the character strings functions should accept bytes as well, but perhaps there's something I'm not considering.
utf-8 is ideal, but not enforced. In the wild I find a lot of cp1252 (or subsets or lookalikes, it's impossible to know), including a significant portion of "blessed" torrents on private trackers.
Even a careful, BEP-aware torrent creator may want to use strings of non-utf-8 or unknown encoding:
collections
I think strings with unknown encoding are here to stay, because
collections
or creator
may be binary hash values. In the real world, filenames are read by software as well as humans, and "human-readabe" is not the lowest common denominator between the two.name.utf-8
and path.utf-8
exist, but appear to be nonstandard, and I can't see that the pattern has been used for other strings like comment
. When the knowledge of a string's encoding is lost, a new string of unknown encoding is created.Zooming out, I've tried to research prior art to follow patterns for how the python bindings should deal with string encodings here, but I think correct solutions are always going to be awkward because the BEPs have impedence mismatch with python.
Since 3.0 python brightened the line between str
and bytes
, i.e. character strings of explicit encoding vs opaque binary data. The BEPs' approach to encodings seems to make more sense in the C/C++ world, where string decoding is more rare, more ad-hoc, and is often done just for user presentation, so the user can correct errors visually or specify a different encoding.
Unfortunately I think the BEPs are so loosely written that it's pretty inevitable that implementations will continue to create a lot of weird data, whether for good reasons or ambiguity. So a feature-complete bittorrent implementation has a lot of work to do to support the lowest common denominator.
>>> import libtorrent as lt
>>> fs = lt.file_storage()
>>> fs.add_file("test.txt", 1024)
>>> ct = lt.create_torrent(fs)
>>> ct.set_hash(0, b"a" * 1024)
>>> ct.add_url_seed("http://example.com")
>>> ct.generate()
The unexpected part is that the strings are bytes? I think that makes sense, especially given all the arguments you made in favor of being encoding agnostic. But also, at the bencoding layer, you don't know whether a string of bytes should be interpreted as bytes are a utf-8 string.
It's this part: b'url-list': b'h\x00\x00\x00t\x00\x00\x00t\x00\x00\x00p...
. It's like each character got cast to an int32
or something.
ah, yes. That doesn't look right
I believe this would fix most some of these issues. All the ones where a function takes a string_view
at least.
torrent_info.file_at_offset()
is deprecated and returns an internal (also deprecated) type. I don't think it would make sense to make this work. Since it doesn't work now, likely, barely anybody is using it. I'd like to keep it that way so I think just removing it from the binding is the best option.
same thing goes for add_http_seed()
actually. It's also deprecated in recent versions of libtorrent, and if it isn't working now, I don't think it should be fixed.
torrent_info.file_at_offset()
is deprecated and returns an internal (also deprecated) type. I don't think it would make sense to make this work. Since it doesn't work now, likely, barely anybody is using it. I'd like to keep it that way so I think just removing it from the binding is the best option.same thing goes for
add_http_seed()
actually. It's also deprecated in recent versions of libtorrent, and if it isn't working now, I don't think it should be fixed.
Seems good, especially if these were also deprecated in RC_1_2
https://github.com/arvidn/libtorrent/pull/6129 fixes the ip_filter issue
fingerprint.name
is deprecated in 1.2.x, so rather than fixing it I will remove it
I'm not convinced about set_creator()
and set_comment()
taking bytes. that seems wrong to me.
I suggested it to be consistent with add_collection()
which takes bytes
.
I think it's also reasonable to deprecate add_collection()
taking bytes
for consistency, if the intent of create_torrent
is to enforce standards like utf-8 everywhere.
Or we could do neither, but even in this case I thought it's a good idea to have an intentional plan, and some unit tests
@arvidn I realized that #6126 may break more than you intend. torrent_info(bytes)
had previously treated the input as a filename, but now it treats it as bencoded data.
I actually think it's a really good feature (it's good for bdecoding to happen in C++, rather than python).
But this may break some valid usage today. Python has good support of using bytes
pathnames. Pathnames always can be converted to str with os.fsdecode
but callers would need to do this explicitly. (See #5984 for more info)
Options I can think of:
torrent_info(bytes)
as a constructor from pathname, and add a factory function. A common pattern in python is torrent_info.from_bencoded(bytes)
, not sure if boost.python makes that easy or not. Otherwise a top-level function would be fine. It would also be nice if torrent_info.from_bencoded(memoryview)
worked: memoryview
is like void*
in python, and allows zero-copy data passing. On the cpython side, there's a "buffer protocol" for zero-copy access of a buffer-like python object. It would be nice if torrent_info.from_bencoded(...)
took any argument that supported the buffer protocoltorrent_info(bytearray)
and torrent_info(memoryview)
as constructors from bencoded data, and torrent_info(bytes)
as a constructor from pathname.Also, #6126 (or something else) broke create_torrent.set_root_cert(bytes)
. Was this intended? I haven't figured out what the right test case is meant to be
create_torrent.add_collection(bytes)
is also broken. Maybe it should be removed per https://github.com/arvidn/libtorrent/issues/5984#issuecomment-820767114 , but let's deprecate it first
I noticed in #6129, export_filter()
doesn't emit the flags. Is this intended?
there is no session_handle::find_torrent()
that takes an info_hash_t
. Maybe there should be one, but I don't think it belongs on this list
This is an attempt to fix the DHT alert fields. It's not so easy to write a test for these, since you can't construct the alerts. Could you test it?
I don't know what's going on with dht_mutable_item_alert.item
there is no
session_handle::find_torrent()
that takes aninfo_hash_t
. Maybe there should be one, but I don't think it belongs on this list
ahh, I'd mistaken session_interface
for session_handle
. Removed from the list
This is an attempt to fix the DHT alert fields. It's not so easy to write a test for these, since you can't construct the alerts. Could you test it?
6166
Responded in the PR; some stuff is still broken. Updated the list.
FYI I'm done writing new unit tests for now. I'll wrap them up into a PR, or several. I won't add any more items to this issue.
deluge files tab is affected by torrent_handle.file_progress
Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/deluge/ui/gtk3/files_tab.py", line 296, in update
component.get('SessionProxy').get_torrent_status(
File "/usr/lib/python3.9/site-packages/deluge/ui/sessionproxy.py", line 147, in get_torrent_status
d = client.core.get_torrent_status(torrent_id, keys_to_get, True)
File "/usr/lib/python3.9/site-packages/deluge/ui/client.py", line 551, in __call__
return self.daemon.call(self.base, *args, **kwargs)
File "/usr/lib/python3.9/site-packages/deluge/ui/client.py", line 500, in call
return defer.maybeDeferred(m, *copy.deepcopy(args), **copy.deepcopy(kwargs))
--- <exception caught here> ---
File "/usr/lib/python3.9/site-packages/twisted/internet/defer.py", line 167, in maybeDeferred
result = f(*args, **kw)
File "/usr/lib/python3.9/site-packages/deluge/core/core.py", line 762, in get_torrent_status
return self.create_torrent_status(
File "/usr/lib/python3.9/site-packages/deluge/core/core.py", line 742, in create_torrent_status
status = self.torrentmanager[torrent_id].get_status(
File "/usr/lib/python3.9/site-packages/deluge/core/torrent.py", line 1003, in get_status
status_dict[key] = self.status_funcs[key]()
File "/usr/lib/python3.9/site-packages/deluge/core/torrent.py", line 873, in get_file_progress
self.handle.file_progress(), self.torrent_info.files()
Boost.Python.ArgumentError: Python argument types in
torrent_handle.file_progress(torrent_handle)
did not match C++ signature:
file_progress(libtorrent::torrent_handle {lvalue}, libtorrent::flags::bitfield_flag<unsigned char, libtorrent::file_progress_flags_tag, void> flags=0)
13:39:55 [CRITICAL][deluge.log :90 ] twisted.internet.defer
[Failure instance: Traceback: <class 'Boost.Python.ArgumentError'>: Python argument types in
torrent_handle.file_progress(torrent_handle)
did not match C++ signature:
file_progress(libtorrent::torrent_handle {lvalue}, libtorrent::flags::bitfield_flag<unsigned char, libtorrent::file_progress_flags_tag, void> flags=0)
/usr/lib/python3.9/site-packages/deluge/ui/gtk3/files_tab.py:296:update
/usr/lib/python3.9/site-packages/deluge/ui/sessionproxy.py:147:get_torrent_status
/usr/lib/python3.9/site-packages/deluge/ui/client.py:551:__call__
/usr/lib/python3.9/site-packages/deluge/ui/client.py:500:call
It looks like we removed support for torrent_info()
constructors from bytearray
and memoryview
.
That's fine, I expect no one ever used them. They were only on the list because it appeared this might have been supported at one time.
I removed them from the list
@arvidn Is this in the wrong milestone?
Deluge really needs the file_progress_flags fix #6369 for lt 2.0
There are a few functions in the python bindings that seem like they're intended to work given the source code, but invoking the bindings doesn't work or doesn't produce the expected result.
create_torrent.add_url_seed("str arg")
: produces unexpected results in output ofcreate_torrent.generate()
. maybe the input got mapped incorrectly?create_torrent.add_http_seed("str arg")
: produces unexpected results in output ofcreate_torrent.generate()
. maybe the input got mapped incorrectly?create_torrent.add_collection("str arg")
: produces unexpected results in output ofcreate_torrent.generate()
. maybe the input got mapped incorrectly?create_torrent.set_root_cert("str arg")
: produces unexpected results in output ofcreate_torrent.generate()
. maybe the input got mapped incorrectly? I'm not sure if this function should be deprecated, or if it should be allowed, so as to accept PEM data as astr
.fingerprint.name
: accessing the attribute raisesTypeError
ip_filter.export_filter()
: raisesTypeError
torrent_info(bytearray(bencode({b"info": {...}})))
: raisesRuntimeError("torrent file is not a dictionary")
, though the source code implies this should worktorrent_info(memoryview(bencode({b"info": {...}})))
: raisesRuntimeError("torrent file is not a dictionary")
, though the source code implies this should worktorrent_info(info_hash_t(sha1_hash(b"a" * 20)))
: raisesRuntimeError("torrent file is not a dictionary")
, though this should probably worktorrent_info.add_tracker("str arg")
does not match a type signature, though source code indicates it should allow the other arguments to be defaultstorrent_info.add_url_seed("str arg")
does not match a type signature, though source code indicates it should allow the other arguments to be defaultstorrent_info.add_url_seed("str arg", "str arg", [])
does not match a type signature, though source code indicates it should worktorrent_info.add_http_seed("str arg")
does not match a type signature, though source code indicates it should allow the other arguments to be defaultstorrent_info.add_http_seed("str arg", "str arg", [])
does not match a type signature, though source code indicates it should worktorrent_info.file_at_offset(0)
raisesTypeError
session_status.active_requests
raisesTypeError
add_torrent_params.unfinished_pieces = {}
does not match a type signatureadd_torrent_params.renamed_files = {}
does not match a type signatureadd_torrent_params.merkle_tree = []
does not match a type signaturesession.dht_announce(sha1_hash(b"a"*20), 0, 0)
does not match a type signatureannounce_flags
argument (fordht_announce()
) doesn't seem to be mapped anywhere, unless I missed itsession.find_torrent(info_hash_t(...))
isn't bound, but seems to exist in C++ codesession.add_port_mapping(...)
raisesTypeError
apparently due to being unable to type-map the return valuetorrent_handle.file_progress()
does not match a type signaturepeer_disconnected_alert.reason
does not match a type signaturepicker_log_alert.picker_flags
does not match a type signaturepicker_log_alert.blocks()
does not match a type signaturedht_log_alert.module
does not match a type signaturedht_immutable_item_alert.item
raises "invalid type requested from entry"dht_mutable_item_alert.key
raises "no Python class registered"dht_mutable_item_alert.item
raises "invalid type requested from entry"dht_mutable_item_alert.signature
raises "no Python class registered"dht_put_item_alert.public_key
raises "no Python class registered"dht_put_item_alert.signature
raises "no Python class registered"I'm finished writing new unit tests and won't add any more items to this issue.