arvidn / libtorrent

an efficient feature complete C++ bittorrent implementation
http://libtorrent.org
Other
5.15k stars 994 forks source link

Bug: Error during torrent generation #6031

Closed master255 closed 2 years ago

master255 commented 3 years ago

libtorrent version (or branch): 2.0

platform/architecture: Android

@arvidn Does anyone use torrent 2.0 at all? It's all crashing and doesn't work for me.

This code: Vectors.byte_vector2bytes((new create_torrent(founded.torrent_file_ptr())).generate().bencode())

Always gives this error: java.lang.RuntimeException: missing or invalid 'hash' entry at org.libtorrent4j.swig.libtorrent_jni.create_torrent_generate(Native Method) at org.libtorrent4j.swig.create_torrent.generate(create_torrent.java:56)

founded.is_valid() = true.

arvidn commented 3 years ago

assuming founded is a torrent_handle, and the torrent_file_ptr() is torrent_handle::torrent_file()

The documentation states:

Note that the torrent_info object returned here may be a different instance than the one added to the session, with different attributes like piece layers, dht nodes and trackers. A torrent_info object does not round-trip cleanly when added to a session.

Specifically, in this case the piece-layers will have been stripped from the torrent_info object, which means the create_torrent constructor cannot create a torrent from it. It doesn't have the piece layers.

You can get the piece layers (here), but there's no constructor where you can pass those in separately. There probably should be.

master255 commented 3 years ago

@arvidn It worked in the 1.2 library. But it doesn't work in 2.0. I need to get the .torrent file from the torrent_handle. How can I do this?

arvidn commented 3 years ago

There should be a utility function that does that.

The short term fix is to add the piece layers manually before calling generate(). An example of how to do this is in the create_torrent constructor. Note how that's in a loop over all files in the torrent.

To get the piece layers you call torrent_handle::piece_layers().

If the reason you need to create a .torrent file is to save resume data, using write_resume_data() on the add_torrent_params you get from the save_resume_data_alert is probably a better solution.

master255 commented 3 years ago

@arvidn Before this code, I add the torrent to the torrent session like this:

AddTorrentParams addTorrentParams = new AddTorrentParams();
                                    addTorrentParams.setFlags(TorrentFlags.APPLY_IP_FILTER);
                                    addTorrentParams.setTrackers(Constants.trackers);
                                    addTorrentParams.setSavePath(playlistsFolder);
                                    info_hash_t infoHashT = new info_hash_t(sha1Hash);
                                    addTorrentParams.swig().setInfo_hashes(infoHashT);
                                    error_code errorCode = new error_code();
                                    founded = torrentSession.add_torrent(addTorrentParams.swig(), errorCode);

It turns out I have a torrent_handle and torrent_info inside, as I check if the torrent is downloaded before like this: founded.status().getIs_finished() And I should be able to save the .torrent file after downloading from torrent_info. You have to restore this functionality. Without it, nothing will work. This worked fine in the 1.2.12 version of the library, but it doesn't work now.

arvidn commented 3 years ago

it's still not clear to me what the use case of creating a .torrent file is. in order to improve this "workflow", it would be really useful to know what you use the .torrent file for.

I ask because when downloading a v2 torrent, there's a lot more important state to keep, other than the info-dictionary (which is the only part you get from the metadata-extension). Specifically, the (possibly partial) merkle trees. These are saved in the resume data with write_resume_data(), and loaded again with read_resume_data() and add_torrent().

arvidn commented 3 years ago

would the ideal interface be to have a function that turns a torrent_handle into a .torrent-file buffer.

master255 commented 3 years ago

@arvidn Doesn't torrent_info store all the information needed to create a .torrent file? Isn't it enough to have a torrent_handle and torrent_info to create a torrent file?

I don't want something new. I want it to work the way it did in the 1.2.12 version of the library.

arvidn commented 3 years ago

@arvidn Doesn't torrent_info store all the information needed to create a .torrent file?

Only before it's been added to a session. Once it's been added to the session, the piece layers are removed from it. This is an unfortunate hack necessary to not store the trees twice in memory. In an ideal world, torrent_info would only include the information from the info-section, and add_torrent_params would contain it as well as everything outside of the info-dictionary. I'm hoping to slowly move things in that direction.

Isn't it enough to have a torrent_handle and torrent_info to create a torrent file?

For v1 torrents, it is, but not for v2 torrents. This is because v2 torrents don't use piece-hashes, but hash-trees that are mutable during download, so it can't be shared with the client via the torrent_infoobject.

master255 commented 3 years ago

@arvidn Arvin. You talk a lot and don't do much. This bug happens on v1 version of torrents. Please fix this bug.

arvidn commented 3 years ago

You talk a lot and don't do much.

Someone keeps me busy by demanding I explain things that could be learned by reading about it on the internet :)

master255 commented 3 years ago

@arvidn Arvin, I have this bug. Is there enough information here to reproduce it and fix it?

master255 commented 3 years ago

@arvidn Will this be fixed?

arvidn commented 3 years ago

probably not. This is not intended to work. Do you think it's important to work? If so, can you please provide a use case.

master255 commented 3 years ago

@arvidn This is the basic scenario. I'm adding a torrent. It downloads. Then I want to share the torrent file with a friend. But I get an error because the library can't do that.

master255 commented 3 years ago

@arvidn I figured out how to reproduce this bug. The bug only happens when I downloaded v2 torrent and try to generate a torrent file from its data after the data is fully downloaded. It looks like v1 hash is missing and torrent generation does not work because of it.

https://mega.nz/file/x98iGDzA#vbDokLWyQjR9w0hoz2Y7lcEQG_tveUkNDt0RPLFSXes

arvidn commented 3 years ago

I would expect the problem being that the torrent_info object you get back from the torrent_handle does not include the merkle trees for v2 torrents. Which means you have to add those separately.

I'm not sure what the best solution would be. Putting the merkle trees/piece layers in the torrent_info object would mean it would all have to be copied, both the torrent_info object as well as the piece layers.

I also don't think you can create a v2 torrent file immediately when the metadata is received (as you can for v1 torrents). This is not your scenario since you download it first, but it's a similar scenario.

There could be a separate, free function, that turns a torrent_handle into a create_torrent object, which figures out the piece layers from the merkle trees. The problem with that is it would have to make blocking calls to the torrent_handle, to get extra information. This makes it a bit of a messy function.

Alternatively, there could be a new constructor to create_torrent that takes the merkle trees as a separate parameter. That way, the blocking call is still made by the caller.

master255 commented 3 years ago

@arvidn Can you tell me why we need a merkle trees and a new hash?

arvidn commented 3 years ago

v2 torrents use merkle trees instead of a flat list of piece hashes. The full trees are not part of the metadata, just the merkle root for each file is.

You can compute the full merkle trees once you have the files, and you'll download parts of the tree as you go, to validate pieces. So once you're a seed you also have the hashes, but until then, you don't necessarily. v2 torrents keep the hashes outside of the info-dictionary, which means you don't download them all up-front.

There are more details in this blog post: https://blog.libtorrent.org/2020/09/bittorrent-v2/

master255 commented 3 years ago

@arvidn

I also don't think you can create a v2 torrent file immediately when the metadata is received (as you can for v1 torrents).

Why? Doesn't the metadata contain merkle trees? And why, if not?

arvidn commented 3 years ago

Why? Doesn't the metadata contain merkle trees? And why, if not?

No. for v2 torrents the metadata (really, the info dictionary) only contains the merkle tree roots. The reason is to save space and make the metadata smaller. It enables larger torrents with less up-front overhead for downloading the metadata.

i.e. magnet links can start up earlier.

master255 commented 3 years ago

@arvidn I understand it right? Now there is no scenario at all with which I can use info_hashV2 to create a .torrent file?

arvidn commented 3 years ago

I understand it right? Now there is no scenario at all with which I can use info_hashV2 to create a .torrent file?

No, that is not a correct understanding.

A torrent_info loaded from a .torrent file always contains the hashes (just not in the info-dictionary), until you add it to a session. You can use that object to make a copy. This could make sense to modify a torrent in some way. Adding or removing trackers for instance.

The thing that's new is that a torrent_info object you get back from a torrent_handle, i.e. after you've added it to ta session, will not contain v2 hashes. However, the v2 hashes still exists internally, and you can ask for them. You may not have all of them unless you've downloaded the whole file, in which case you cannot create a new .torrent file based off it.

Right now, the way you would create a new create_torrent object from a torrent_handle that's seeding (i.e. it has all the pieces and v2 hashes) is you would first create it from the torrent_info object, then ask for the v2 hashes and add those too (to the create_torrent object).

master255 commented 3 years ago

@arvidn

I also don't think you can create a v2 torrent file immediately when the metadata is received (as you can for v1 torrents).

Will saving the torrent_handle state when exiting the program work correctly in this case? Will the state of the merkle trees be saved? It looks like we now have a .torrent file that stores the progress of the data download?

There could be a separate, free function, that turns a torrent_handle into a create_torrent object, which figures out the piece layers from the merkle trees.

I think we need a flag for that. And if the flag is active, Libtorrent should save merkle trees in create_torrent. With blocking calls. This is normal, but it is desirable to exclude blocking calls. Maybe create a separate copy of merkle trees when loading data. Maybe with a flag.

Alternatively, there could be a new constructor to create_torrent that takes the merkle trees as a separate parameter. That way, the blocking call is still made by the caller.

This is not the right way out.

Putting the merkle trees/piece layers in the torrent_info object would mean it would all have to be copied, both the torrent_info object as well as the piece layers.

Maybe this method "torrent_handle.torrent_file_ptr(true)" should pass true, which would mean getting torrent_info with merkle trees?

arvidn commented 3 years ago

Will saving the torrent_handle state when exiting the program work correctly in this case?

Yes, if you do it the way you're supposed to, with the resume data API.

Will the state of the merkle trees be saved?

Yes, if you use resume data.

It looks like we now have a .torrent file that stores the progress of the data download?

No, it's not saved in the .torrent file.

arvidn commented 3 years ago

Maybe this method "torrent_handle.torrent_file_ptr(true)" should pass true, which would mean getting torrent_info with merkle trees?

Yes, this is probably the simplest solution.

Keep in mind that it would still not work until the torrent is a seed. Only then do you have all the piece hashes.

arvidn commented 3 years ago

I think this would work. Note that the existing torrent_file() cannot be made to return a copy, as its signature already returns an immutable reference to the internal object.

https://github.com/arvidn/libtorrent/pull/6083

master255 commented 3 years ago

@arvidn @aldenml Okay. I need to check it out. I need a zip file of the library with this patch or Alden's help.

aldenml commented 3 years ago

@master255 I added the changes to the branch test-patch1 you have been testing, with a git pull you should be able to test.

Screen Shot 2021-03-21 at 12 37 38 PM Screen Shot 2021-03-21 at 12 37 18 PM
master255 commented 3 years ago

@arvidn No. It's not working. https://mega.nz/file/55VB2CSB#gDGRIXhq-XuHxGBY2oQ6RTGdAWIV8XB1LbeGd7v0XX8

arvidn commented 3 years ago

what's failing?

master255 commented 3 years ago

@arvidn Have you seen the video?

arvidn commented 3 years ago

no, I hadn't. I just did though. My question remains, what's failing? To elaborate, please take a look at the unit test I added: https://github.com/arvidn/libtorrent/pull/6083/files#diff-297d163cb097bf7c10d800abb020623c944b67658cacc88f3265bc87e61c1941R1284

That works. Are you doing anything different than that test? Can you describe the difference in prose or as a patch against the unit test that demonstrates the failure?

master255 commented 3 years ago

@arvidn Yes. I do it differently. I create a v2 torrent file on phone 1 and distribute it. On phone 2 I download it completely and try to generate a torrent file from torrent_handle. An error appears.

arvidn commented 3 years ago

I added tests that create a torrent from the session that downloaded it (rather than the session it was added to) and they all still pass.

https://github.com/arvidn/libtorrent/pull/6083/files#diff-ba9b653cb9a2fb274653f998f971ec64d62e57be8ed13a91539ac340546c6218R155

arvidn commented 3 years ago

@master255 instead of digging up information about the torrent_handle it would be much more interesting to see information about the torrent_info object you get back from torrent_file_with_hashes().

For instance, what does piece_layer(0) return?

arvidn commented 3 years ago

does this fix it? https://github.com/arvidn/libtorrent/pull/6153

master255 commented 3 years ago

@arvidn I got a little sick. I need 1-2 weeks to check it out.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

master255 commented 3 years ago

I need more time

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

master255 commented 2 years ago

Temporarily not actual