arvidn / libtorrent

an efficient feature complete C++ bittorrent implementation
http://libtorrent.org
Other
5.23k stars 992 forks source link

Allow to add torrent with parsed (bdecoded) fastresume data #126

Closed glassez closed 4 years ago

glassez commented 9 years ago

Since we store some additional values in fastresume data, we then decode the data to extract these values. If libtorrent accepted to get already decoded data, it would help to avoid repeated decoding.

sledgehammer999 commented 9 years ago

When @glassez says "we" he means qbittorrent :)

arvidn commented 9 years ago

This used to be the case, back when adding torrents always was done synchronously, and a pointer to an entry could be passed in. The problem with that was that synchronous add is very expensive when adding lots of torrents, so it would have to be copied. The problem with that is that entry is very expensive to copy. A plain buffer is much better for that. However, today, with bdecode_node, it would be cheap to copy that, along with the plain buffer.

There is a potential API challenge in making sure that the provided bdecode_node actually points into the resume buffer, and also that it's the root node of it (i.e. that it owns the parse state). Any ideas of how that could be solved? (for an introduction to bdecode_node, see http://blog.libtorrent.org/2015/03/bdecode-parsers/)

sledgehammer999 commented 9 years ago

Is lazy_entry also expensive to copy?

arvidn commented 9 years ago

yes, fairly. It still has the same problem though. It just points into a vector.

actually, it's a bit broader than that. if a bdecode_node or lazy_entry could be added to the add_torrent_params object, that object would not be copyable anymore, because the parsed structure would still point to the old buffer. Maybe there needs to be a type that combines bdecode_node and a std::vector<char>.

glassez commented 9 years ago

IMO, The best solution is to use copy-on-write strategy for classes like libtorrent::entry.

arvidn commented 9 years ago

yeah, that would make a lot of sense. although, even avoiding the copying of libtorrent::entry, just creating it is pretty expensive ( http://blog.libtorrent.org/2015/03/bdecode-parsers/ )

glassez commented 9 years ago

New bdecoding and implementation of copy-on-write will perfectly complement each other. However, I have some thoughts about new (and lazy) bdecoding:

  1. The fact that it uses the source buffer, it is very inconvenient for user side (I hope no need to explain why). If copying data to entry own buffer, it doesn't degrade performance, but it would greatly increase the usability.
  2. The fact that usually the buffer is no longer used after the decoding, gives us the opportunity not even to copy it, but just let entry to take its ownership (e.g. swap into entry own buffer).
arvidn commented 9 years ago

Those point, although true from the user's point of view, are not true internally in libtorrent. lazy_bdecode came about were two reasons:

(1) loading large .torrent files would use more twice the amount of RAM as the .torrent file itself (I count that as degrading performance). (2) copying the buffer before parsing it has a performance hit, especially when parsing potentially large .torrent files. It's also expensive when parsing incoming DHT packets because there are many of them.

In the case of DHT packets, both the packet buffer and the decoded form of it are thrown away at the same time, when the packet has been handled and it's time to receive the next packet into the buffer.

That said, the way torrent_info deals with this is to combine bdecode_node with a boost::shared_array<> to the underlying buffer. It doesn't do copy-on-write unfortunately, but it copies and "rebases" the parsed state onto the new copy of the buffer (there is built-in support for that). Given that .torrent files are immutable, it could probably not even bother copying the buffer, but just increment its refcount.

glassez commented 9 years ago

My first suggestion was only a prerequisite to the second. My main idea is to encapsulate the buffer in entry class, but not to copy the source buffer into it, just move it there. Although I have not studied libtorrent code in detail, theoretically I see no reason why it can't be implemented here.

aldenml commented 9 years ago

Hi @arvidn, do you think it is a good idea to simply add another "parallel" field decoded_resume_data of type decode_node to add_torrent_params? (this is similar to the set ti, url, info_hash).

In terms of priority, session_impl should consider first decoded_resume_data if it exists (since it's already decoded).

arvidn commented 9 years ago

something like that, yes. It probably has to be something a bit more sophisticated though. For synchronous add_torrent it's fine, but for async_add_torrent() the add_torrent_params object has to be copied, and it would be nice to have a bdecode_node variant that could own the underlying buffer, perhaps held in a shared_ptr. I'm not sure what the most elegant approach is though. A new type that wraps bdecode_node + a buffer?

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.