arvidn / libtorrent

an efficient feature complete C++ bittorrent implementation
http://libtorrent.org
Other
5.15k stars 994 forks source link

Bug: Wrong total size of files #6023

Closed master255 closed 3 years ago

master255 commented 3 years ago

@arvidn libtorrent version (or branch): 2.0

platform/architecture: Android

I get the total size as follows:

handle.torrent_file_ptr().total_size() / 1048576

If I create a torrent with the create_torrent.v1_only flag, the size is the same as on the disk 750Mb, but if I use create_torrent.v2_only, the size becomes 1.5 times larger. 1139Mb

Which flag should I use? create_torrent.v1_only or create_torrent.v2_only

Screenshot_20210303-133126_Media Library

arvidn commented 3 years ago

the total size of the torrent is not meant to reflect the sum of the size of the files that are saved to disk. It include pad-files used to align the files to piece boundaries.

arvidn commented 3 years ago

assuming you're making a torrent from more than 1 file, I would expect the v1_only torrent to also have a size larger than the sum of the files. It also inserts pad files, they are just generally smaller for v1 compared to v2.

master255 commented 3 years ago

@arvidn Okay. How do I get just the size of the torrent files?

master255 commented 3 years ago

@arvidn

assuming you're making a torrent from more than 1 file, I would expect the v1_only torrent to also have a size larger than the sum of the files. It also inserts pad files, they are just generally smaller for v1 compared to v2.

No! It's a bug. If I create a torrent v1, the size is exactly 750Mb. It is exactly the same size as the downloaded one.

The problem is only with v2 torrents.

arvidn commented 3 years ago

No! It's a bug.

You repeating it doesn't make it so. I already told you it's intentional.

If I create a torrent v1, the size is exactly 750Mb. It is exactly the same size as the downloaded one.

Perhaps all your file sizes are divisible by the piece size. If all your files were a power-of-two multiple of the piece size, the same would be true for v2 torrents.

arvidn commented 3 years ago

it seems the current logic does not add pad files for v1_only torrents

arvidn commented 3 years ago

clarified documentation here: https://github.com/arvidn/libtorrent/pull/6025

master255 commented 3 years ago

@arvidn

it seems the current logic does not add pad files for v1_only torrents

Then why is it done for v2 torrents?

master255 commented 3 years ago

@arvidn Can you make the logic of this parameter the same for v1 torrents and for v2? As all developers expect.

arvidn commented 3 years ago

Can you make the logic of this parameter the same for v1 torrents and for v2? As all developers expect.

I'm pretty sure most people would expect these flags to have meanings and affect the resulting torrent. If both flags do the same thing, there would be no point of having them.

arvidn commented 3 years ago

Then why is it done for v2 torrents?

Because v2 torrents require all files to be piece-aligned. https://blog.libtorrent.org/2020/09/bittorrent-v2/

master255 commented 3 years ago

@arvidn Okay. When downloading files using torrents v2 will I need 750MB of free space? Therefore the full size will be 750MB? Not 1139? What does total_size() show? Or will I need 1139MB of free space?

master255 commented 3 years ago

@arvidn Which flag do you recommend using? v1 or v2? What are the advantages and disadvantages of these flags? Perhaps I should let the user choose for himself which version to use? Or not?

master255 commented 3 years ago

@arvidn Perhaps then you should include in the calculation of downloaded data - the size of .pad files? Otherwise we get a discrepancy in the numbers. Downloaded (Total_done) 750, but the full size (total_size()) is 1139 and Progress_ppm = 100%

But then you will have problems calculating the download speed.

master255 commented 3 years ago

@arvidn Arvid, this is new technology. You have to answer all the questions.

arvidn commented 3 years ago

Pad files are not stored on disk. They are just zero-padding. They don't contribute towards how much free space you need. I think defaulting to hybrid (v1 + v2) makes sense. It may also make sense to let users chose, I think mostly as an escape-hatch in case there's some reason hybrid torrents don't work in some scenarios.

Downloaded (Total_done) 750, but the full size (total_size()) is 1139 and Progress_ppm = 100%

This is not a new issue btw. pad-files have been in use for a good 15 years, and supported in major bittorrent clients for 7 years or so.

Pieces (or files) that have priority 0 (i.e. won't be downloaded) cause a similar problem. This, and the pad-file issue, is solved together by reporting total_wanted and total_wanted_done separately from the total and total_wanted (the latter including files with prio 0 and pad-files).

But then you will have problems calculating the download speed.

The download speed is computed based on the number of bytes received over the network. I would not recommend inferring it from the rate at which progress increases.

master255 commented 3 years ago

@arvidn I have the .torrent files on disk. I need to display their size to the user without adding them to the torrent session. How do I do this? I am currently using torrent_info. But according to you it displays the size with .pad files. How can I count size without .pad files using torrent_info or without adding them to torrent session?

master255 commented 3 years ago

@arvidn Will this be fixed?

master255 commented 3 years ago

What will happen to such torrents? Gray means it is not added to the torrent session.

Screenshot_20210307-213944_Media Library

arvidn commented 3 years ago

I need to display their size to the user without adding them to the torrent session. How do I do this?

I think operator+ would be appropriate.

Will this be fixed?

Assuming "this" refers to the current definition of torrent_info::total_size(), no. This is not a bug. I've tweaked the documentation to avoid misunderstandings: https://github.com/arvidn/libtorrent/blob/RC_2_0/include/libtorrent/torrent_info.hpp#L343

arvidn commented 3 years ago

What will happen to such torrents?

What kinds of torrents?

master255 commented 3 years ago

@arvidn

I think operator+ would be appropriate.

What does that mean? Add up the size of the files? I have over 100 .torrent files. Each one could have 100 files (+100 pad files) in it. Right now Libtorrent counts file sizes. Are you suggesting that I count them?

arvidn commented 3 years ago

What does that mean? Add up the size of the files? I have over 100 .torrent files. Each one could have 100 files (+100 pad files) in it. Right now Libtorrent counts file sizes. Are you suggesting that I count them?

Not so much "count" them, but sum the sizes of the files you're interested in.

master255 commented 3 years ago

@arvidn That's too many calls to the library. It won't work.

arvidn commented 3 years ago

how long does a call take? and do you understand why it takes so long?

master255 commented 3 years ago

@arvidn I didn't measure it. But if it is 100 files, I will need to convert it to TorrentInfo first. Then get torrent_storage. Then list the names of the 200 files. Then find ./pad in each name. Then depending on that, get the file size from torent_storage another 100 times... A total of 302 calls per file. * 100 = 3000 calls. I forgot to say. I have a screen update every second. So there will be 3000 calls every second. ++ calls to get data about active torrents. ++ calls for streaming and downloading torrents.

Your library will explode :-D

Okay, I'm caching the file data, but the first time it will cause a significant delay in displaying the files.

arvidn commented 3 years ago

I wouldn't recommend doing it like that. There's a flag indicating whether a file is a pad file or not, don't do string comparisons.

So, if you're calling this every frame, are you also doing: "convert it to TorrentInfo first. Then get torrent_storage" every frame today as well, in order to call total_size()? I wouldn't recommend doing that every frame either. Just cache the information you're showing in the UI.

master255 commented 3 years ago

@arvidn Yes. Now I convert the torrent file to torrentInfo and then get the total_size(). And that's it. 2 calls! You suggest making 302 calls instead of 2. This is very bad.

I don't know how fast it works, but every torrent file is converted to byte_vector. So to convert a torrent file requires a call on each byte of the file. That is thousands of calls. This, too, can be optimized. Create TorrentInfo based on the file path.

Upd: I found a constructor torrent_info(String filename, error_code ec) For some reason no one uses it. This will greatly increase the performance of my torrent screen.

arvidn commented 3 years ago

Yes. Now I convert the torrent file to torrentInfo and then get the total_size(). And that's it. 2 calls! You suggest making 302 calls instead of 2.

No, I suggested not doing any calls per frame. i.e. amortized 0 calls

I don't know how fast it works, but every torrent file is converted to byte_vector. So to convert a torrent file requires a call on each byte of the file. That is thousands of calls. This, too, can be optimized. Create TorrentInfo based on the file path.

I'm not following, but it does not sound right to do anything with byte buffers just to query the size.

master255 commented 3 years ago

@arvidn

I'm not following, but it does not sound right to do anything with byte buffers just to query the size.

Just look at this. I think everyone needs to fix it. @gubatron @proninyaroslav

We have a constructor torrent_info(String filename, error_code ec)

https://github.com/frostwire/frostwire/blob/5ff078a61645b19d70c97782d6345a1e1957b221/common/src/main/java/com/frostwire/bittorrent/BTEngine.java#L285

https://github.com/proninyaroslav/libretorrent/blob/168d62fee812fb11abafa59efc5afbddbf086ecf/app/src/main/java/org/proninyaroslav/libretorrent/core/model/session/TorrentSessionImpl.java#L540

No, I suggested not doing any calls per frame. i.e. amortized 0 calls

The initial loading of the screen will be very long. Arvin, just fix it. There are actually a lot of scenarios.

arvidn commented 3 years ago

The initial loading of the screen will be very long.

I don't see a reason why me calling operator+ would be any faster than you doing so.

arvidn commented 3 years ago

https://github.com/arvidn/libtorrent/pull/6042

master255 commented 3 years ago

@arvidn Thank you. We're getting closer to the release of Libtorrent 2.0

master255 commented 3 years ago

@arvidn You forgot to add logic to the variable. https://mega.nz/file/4xEgGTRB#tFEktgyyTk3f8flLsSIv2DpKd2Ve8aNri0WYezCUZK8

arvidn commented 3 years ago

what does that mean? There are test cases demonstrating it working.

master255 commented 3 years ago

@arvidn https://github.com/arvidn/libtorrent/pull/6042/files#r594326015

Where is the logic?

arvidn commented 3 years ago

Where is the logic?

in the source code.

What are you expecting to see that you don't see? Do you experience a case where you get an incorrect result?

master255 commented 3 years ago

@arvidn I expect the numbers total_size and size_on_disk to be different. Why else would you do it? And size_on_disk should be equal to status.getTotal_done()

arvidn commented 3 years ago

as the unit tests demonstrate, the are different. But only if the torrent has pad files

master255 commented 3 years ago

@arvidn Check this file 096c5154c77cc3343d895efa2a5f1b583388d3ef.zip

master255 commented 3 years ago

@arvidn https://github.com/arvidn/libtorrent/pull/6042/files#r594390243

master255 commented 3 years ago

@arvidn My code:

torrent_handle_vector torrents = torrentSession.get_torrents();
torrent_handle handle = torrents.get(1);
torrent_status status = handle.status();
status.getTotal_done()  =143Mb
handle.torrent_file_ptr().size_on_disk()  = 191Mb

I see this as a difficult problem. There are too many different things going on.

And I execute this code every second to display the information to the user.

arvidn commented 3 years ago

there are more efficient ways of getting that information. Especially if you have many torrents. use post_torrent_updates() instead.

master255 commented 3 years ago

@arvidn There is new information. Apparently it's not about calculating file sizes. The problem is in the torrent file creation. Library 1.2.13 shows the same size: 143/191 as the 2.0 version. For this (096c5154c77cc3343d895efa2a5f1b583388d3ef) torrent file. When I generate a torrent file with version 1.2.13 and builder.padFileLimit(-1) I get 137/137 size. (hash ba3c09fa3aea9c602bf7dd7c997690f26eafbcde) When I generate a torrent file with version 1.2.13 and builder.padFileLimit(0) I get 143/191 size. (hash 2c0ea42e849558788a865b14dabeb921bc9289fb) When I generate a torrent file with version 2.0 I get 143/191 size. (hash 096c5154c77cc3343d895efa2a5f1b583388d3ef) Why are the hashes different? Why does version 2.0 display a different file size than version 1.2.13? And why the second version of the library generates wrong torrent file? I generate it with flags: v1 and cannonical? 1.zip

arvidn commented 3 years ago

Why are the hashes different?

Because the info dictionaries are different

Why does version 2.0 display a different file size than version 1.2.13?

I don't know. How do you get the file size?

And why the second version of the library generates wrong torrent file? I generate it with flags: v1 and cannonical?

What's wrong about it?

master255 commented 3 years ago

@arvidn

Because the info dictionaries are different

Why? The directory of files is completely the same, isn't it? Ok, I understand that the difference is because of pad files, but I generate pad files in both 1.2.13 and 2.0 versions the same way. The hash should match.

I don't know. How do you get the file size?

torrent_handle_vector torrents = torrentSession.get_torrents();
torrent_handle handle = torrents.get(1);
torrent_status status = handle.status();
status.getTotal_done()  =143Mb
handle.torrent_file_ptr().size_on_disk()  = 191Mb

What's wrong about it?

The sizes don't match.

arvidn commented 3 years ago

Why? The directory of files is completely the same, isn't it? Ok, I understand that the difference is because of pad files, but I generate pad files in both 1.2.13 and 2.0 versions the same way. The hash should match.

You can dump the .torrent file (the info dictionary specifically) to find out. I would encourage you to do that, to get a sense of what's being hashed.

The order of files and the rules for pad files, and really a lot of other subtle details, go into the info-dictionary. If it's important to you that the torrents are identical, you should make sure you make the exact same calls in the same order on the create_torrent object.

The sizes don't match.

Which sizes? and the please also explain why you would expect those sizes to be the same.

master255 commented 3 years ago

@arvidn

Which sizes? and the please also explain why you would expect those sizes to be the same.

torrent_handle_vector torrents = torrentSession.get_torrents(); torrent_handle handle = torrents.get(1); torrent_status status = handle.status(); status.getTotal_done() =>>143Mb << handle.torrent_file_ptr().size_on_disk() =>> 191Mb<<

These sizes. I want them to be the same. Why doesn't the downloaded data match the files in the torrent file? How can this be? Didn't we remove the pad files?

And it happens immediately after creation of torrent and status shows that torrent was successfully downloaded and distributed.

master255 commented 3 years ago

@arvidn And why the 1.2.13 version of the library shows a size of 137/137Mb And library 2.0 shows size 143/191 For the same files. I just checked it. The correct size is 137Mb. Library 2.0 does not show the correct file size at all. And it's because of pad files. But it's not right? Isn't it? And your patch didn't help.

arvidn commented 3 years ago

https://github.com/arvidn/libtorrent/pull/6065