arvidn / libtorrent

an efficient feature complete C++ bittorrent implementation
http://libtorrent.org
Other
5.17k stars 995 forks source link

Python bindings: various apparent gaps in type signature mappings #5995

Open AllSeeingEyeTolledEweSew opened 3 years ago

AllSeeingEyeTolledEweSew commented 3 years ago

There are a few functions in the python bindings that seem like they're intended to work given the source code, but invoking the bindings doesn't work or doesn't produce the expected result.

I'm finished writing new unit tests and won't add any more items to this issue.

arvidn commented 3 years ago

there should be unit tests for these in test.py. when you say "unexpected output" what do you mean exactly? Is the string not treated as unicode? in .torrent files, strings are supposed to be encoded as utf-8 if they are meant to be human readable, and raw-bytes otherwise.

So, piece hashes are binary, name, filenames, creator and comment are utf-8.

It's not obvious to me that the character strings functions should accept bytes as well, but perhaps there's something I'm not considering.

AllSeeingEyeTolledEweSew commented 3 years ago

there should be unit tests for these in test.py. when you say "unexpected output" what do you mean exactly?

In master on Linux:

>>> import libtorrent as lt
>>> fs = lt.file_storage()
>>> fs.add_file("test.txt", 1024)
>>> ct = lt.create_torrent(fs)
>>> ct.set_hash(0, b"a" * 1024)
>>> ct.add_url_seed("http://example.com")
>>> ct.generate()
{b'creation date': 1617038247, b'info': {b'length': 1024, b'name': b'test.txt', b'piece length': 16384, b'pieces': b'aaaaaaaaaaaaaaaaaaaa'}, b'url-list': b'h\x00\x00\x00t\x00\x00\x00t\x00\x00\x00p\x00\x00\x00:\x00\x00\x00/\x00\x00\x00/\x00\x00\x00e\x00\x00\x00x\x00\x00\x00a\x00\x00\x00m\x00\x00\x00p\x00\x00\x00l\x00\x00\x00e\x00\x00\x00.\x00\x00\x00c\x00\x00\x00o\x00\x00\x00m\x00\x00\x00'}

in .torrent files, strings are supposed to be encoded as utf-8 if they are meant to be human readable, and raw-bytes otherwise.

So, piece hashes are binary, name, filenames, creator and comment are utf-8.

It's not obvious to me that the character strings functions should accept bytes as well, but perhaps there's something I'm not considering.

utf-8 is ideal, but not enforced. In the wild I find a lot of cp1252 (or subsets or lookalikes, it's impossible to know), including a significant portion of "blessed" torrents on private trackers.

Even a careful, BEP-aware torrent creator may want to use strings of non-utf-8 or unknown encoding:

I think strings with unknown encoding are here to stay, because


Zooming out, I've tried to research prior art to follow patterns for how the python bindings should deal with string encodings here, but I think correct solutions are always going to be awkward because the BEPs have impedence mismatch with python.

Since 3.0 python brightened the line between str and bytes, i.e. character strings of explicit encoding vs opaque binary data. The BEPs' approach to encodings seems to make more sense in the C/C++ world, where string decoding is more rare, more ad-hoc, and is often done just for user presentation, so the user can correct errors visually or specify a different encoding.

Unfortunately I think the BEPs are so loosely written that it's pretty inevitable that implementations will continue to create a lot of weird data, whether for good reasons or ambiguity. So a feature-complete bittorrent implementation has a lot of work to do to support the lowest common denominator.

arvidn commented 3 years ago
>>> import libtorrent as lt
>>> fs = lt.file_storage()
>>> fs.add_file("test.txt", 1024)
>>> ct = lt.create_torrent(fs)
>>> ct.set_hash(0, b"a" * 1024)
>>> ct.add_url_seed("http://example.com")
>>> ct.generate()

The unexpected part is that the strings are bytes? I think that makes sense, especially given all the arguments you made in favor of being encoding agnostic. But also, at the bencoding layer, you don't know whether a string of bytes should be interpreted as bytes are a utf-8 string.

AllSeeingEyeTolledEweSew commented 3 years ago

It's this part: b'url-list': b'h\x00\x00\x00t\x00\x00\x00t\x00\x00\x00p.... It's like each character got cast to an int32 or something.

arvidn commented 3 years ago

ah, yes. That doesn't look right

arvidn commented 3 years ago

I believe this would fix most some of these issues. All the ones where a function takes a string_view at least.

https://github.com/arvidn/libtorrent/pull/6126

arvidn commented 3 years ago

torrent_info.file_at_offset() is deprecated and returns an internal (also deprecated) type. I don't think it would make sense to make this work. Since it doesn't work now, likely, barely anybody is using it. I'd like to keep it that way so I think just removing it from the binding is the best option.

arvidn commented 3 years ago

same thing goes for add_http_seed() actually. It's also deprecated in recent versions of libtorrent, and if it isn't working now, I don't think it should be fixed.

AllSeeingEyeTolledEweSew commented 3 years ago

torrent_info.file_at_offset() is deprecated and returns an internal (also deprecated) type. I don't think it would make sense to make this work. Since it doesn't work now, likely, barely anybody is using it. I'd like to keep it that way so I think just removing it from the binding is the best option.

same thing goes for add_http_seed() actually. It's also deprecated in recent versions of libtorrent, and if it isn't working now, I don't think it should be fixed.

Seems good, especially if these were also deprecated in RC_1_2

arvidn commented 3 years ago

https://github.com/arvidn/libtorrent/pull/6129 fixes the ip_filter issue

arvidn commented 3 years ago

fingerprint.name is deprecated in 1.2.x, so rather than fixing it I will remove it

arvidn commented 3 years ago

https://github.com/arvidn/libtorrent/pull/6130

arvidn commented 3 years ago

I'm not convinced about set_creator() and set_comment() taking bytes. that seems wrong to me.

AllSeeingEyeTolledEweSew commented 3 years ago

I suggested it to be consistent with add_collection() which takes bytes.

I think it's also reasonable to deprecate add_collection() taking bytes for consistency, if the intent of create_torrent is to enforce standards like utf-8 everywhere.

Or we could do neither, but even in this case I thought it's a good idea to have an intentional plan, and some unit tests

arvidn commented 3 years ago

https://github.com/arvidn/libtorrent/pull/6149

arvidn commented 3 years ago

https://github.com/arvidn/libtorrent/pull/6150

AllSeeingEyeTolledEweSew commented 3 years ago

@arvidn I realized that #6126 may break more than you intend. torrent_info(bytes) had previously treated the input as a filename, but now it treats it as bencoded data.

I actually think it's a really good feature (it's good for bdecoding to happen in C++, rather than python).

But this may break some valid usage today. Python has good support of using bytes pathnames. Pathnames always can be converted to str with os.fsdecode but callers would need to do this explicitly. (See #5984 for more info)

Options I can think of:

  1. My preference: keep torrent_info(bytes) as a constructor from pathname, and add a factory function. A common pattern in python is torrent_info.from_bencoded(bytes), not sure if boost.python makes that easy or not. Otherwise a top-level function would be fine. It would also be nice if torrent_info.from_bencoded(memoryview) worked: memoryview is like void* in python, and allows zero-copy data passing. On the cpython side, there's a "buffer protocol" for zero-copy access of a buffer-like python object. It would be nice if torrent_info.from_bencoded(...) took any argument that supported the buffer protocol
  2. Use torrent_info(bytearray) and torrent_info(memoryview) as constructors from bencoded data, and torrent_info(bytes) as a constructor from pathname.
AllSeeingEyeTolledEweSew commented 3 years ago

Also, #6126 (or something else) broke create_torrent.set_root_cert(bytes). Was this intended? I haven't figured out what the right test case is meant to be

AllSeeingEyeTolledEweSew commented 3 years ago

create_torrent.add_collection(bytes) is also broken. Maybe it should be removed per https://github.com/arvidn/libtorrent/issues/5984#issuecomment-820767114 , but let's deprecate it first

AllSeeingEyeTolledEweSew commented 3 years ago

I noticed in #6129, export_filter() doesn't emit the flags. Is this intended?

arvidn commented 3 years ago

there is no session_handle::find_torrent() that takes an info_hash_t. Maybe there should be one, but I don't think it belongs on this list

arvidn commented 3 years ago

This is an attempt to fix the DHT alert fields. It's not so easy to write a test for these, since you can't construct the alerts. Could you test it?

https://github.com/arvidn/libtorrent/pull/6166

arvidn commented 3 years ago

I don't know what's going on with dht_mutable_item_alert.item

AllSeeingEyeTolledEweSew commented 3 years ago

there is no session_handle::find_torrent() that takes an info_hash_t. Maybe there should be one, but I don't think it belongs on this list

ahh, I'd mistaken session_interface for session_handle. Removed from the list

This is an attempt to fix the DHT alert fields. It's not so easy to write a test for these, since you can't construct the alerts. Could you test it?

6166

Responded in the PR; some stuff is still broken. Updated the list.

AllSeeingEyeTolledEweSew commented 3 years ago

FYI I'm done writing new unit tests for now. I'll wrap them up into a PR, or several. I won't add any more items to this issue.

leigh123linux commented 3 years ago

deluge files tab is affected by torrent_handle.file_progress

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/deluge/ui/gtk3/files_tab.py", line 296, in update
    component.get('SessionProxy').get_torrent_status(
  File "/usr/lib/python3.9/site-packages/deluge/ui/sessionproxy.py", line 147, in get_torrent_status
    d = client.core.get_torrent_status(torrent_id, keys_to_get, True)
  File "/usr/lib/python3.9/site-packages/deluge/ui/client.py", line 551, in __call__
    return self.daemon.call(self.base, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/deluge/ui/client.py", line 500, in call
    return defer.maybeDeferred(m, *copy.deepcopy(args), **copy.deepcopy(kwargs))
--- <exception caught here> ---
  File "/usr/lib/python3.9/site-packages/twisted/internet/defer.py", line 167, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/lib/python3.9/site-packages/deluge/core/core.py", line 762, in get_torrent_status
    return self.create_torrent_status(
  File "/usr/lib/python3.9/site-packages/deluge/core/core.py", line 742, in create_torrent_status
    status = self.torrentmanager[torrent_id].get_status(
  File "/usr/lib/python3.9/site-packages/deluge/core/torrent.py", line 1003, in get_status
    status_dict[key] = self.status_funcs[key]()
  File "/usr/lib/python3.9/site-packages/deluge/core/torrent.py", line 873, in get_file_progress
    self.handle.file_progress(), self.torrent_info.files()
Boost.Python.ArgumentError: Python argument types in
    torrent_handle.file_progress(torrent_handle)
did not match C++ signature:
    file_progress(libtorrent::torrent_handle {lvalue}, libtorrent::flags::bitfield_flag<unsigned char, libtorrent::file_progress_flags_tag, void> flags=0)

13:39:55 [CRITICAL][deluge.log                        :90  ] twisted.internet.defer 
[Failure instance: Traceback: <class 'Boost.Python.ArgumentError'>: Python argument types in
    torrent_handle.file_progress(torrent_handle)
did not match C++ signature:
    file_progress(libtorrent::torrent_handle {lvalue}, libtorrent::flags::bitfield_flag<unsigned char, libtorrent::file_progress_flags_tag, void> flags=0)
/usr/lib/python3.9/site-packages/deluge/ui/gtk3/files_tab.py:296:update
/usr/lib/python3.9/site-packages/deluge/ui/sessionproxy.py:147:get_torrent_status
/usr/lib/python3.9/site-packages/deluge/ui/client.py:551:__call__
/usr/lib/python3.9/site-packages/deluge/ui/client.py:500:call
AllSeeingEyeTolledEweSew commented 3 years ago

It looks like we removed support for torrent_info() constructors from bytearray and memoryview.

That's fine, I expect no one ever used them. They were only on the list because it appeared this might have been supported at one time.

I removed them from the list

cas-- commented 2 years ago

@arvidn Is this in the wrong milestone?

Deluge really needs the file_progress_flags fix #6369 for lt 2.0