atomashpolskiy / bt

BitTorrent library and client with DHT, magnet links, encryption and more
https://atomashpolskiy.github.io/bt/
Apache License 2.0

[QUESTION] Why do process handles keep increasing. #185

Open yjs112233 opened 3 years ago

yjs112233 commented 3 years ago

[Description]
-- Version: demo downloaded from this repository.
-- Started from bt.cli.CliClient.
-- After running multiple instances (new CliClient()) in the same process and shutting all of them down, the handles still exist. Maybe a stream is not being closed somewhere.
-- Other: a large number of connections were found in the CLOSE_WAIT state.

If this is a problem, how can I solve it to keep the program running consistently? Thanks.

pyckle commented 3 years ago

Hey @yjs112233

CliClient is designed to be used from the command line, not programmatically. The runtime shutdown is disabled: https://github.com/atomashpolskiy/bt/blob/7a141ed42b82af30663354db4e8fa8be2364bb4e/bt-cli/src/main/java/bt/cli/CliClient.java#L139

Is there a reason why you're using the CliClient class rather than directly integrating with BtClient/BtRuntime?

yjs112233 commented 3 years ago

@pyckle

Thank you very much for paying attention to this problem, which has bothered me for several months. In fact, I have been using BitTorrent for almost a year. At first, I integrated it into my application to parse and download BT files. However, after integration, I found that the same BT file cannot be loaded twice in the same BtRuntime: it stops running because the torrent has already been registered. Therefore, I changed the integration mode. Parsing a BT file now starts its own BtRuntime with a single BtClient attached (even though a BtRuntime can attach many BtClients by design). After the download completes, the runtime closes itself: I do not call .disableAutomaticShutdown() in the BtRuntime configuration, so the runtime shuts down automatically when the download is complete. Each BtRuntime uses the same pair of port numbers, 6891 and 49001. Here is my BtRuntime configuration:

    public BtRuntime getBtRuntime() {
        BtRuntimeBuilder runtimeBuilder = getBtRuntimeBuilder(6891, 49001);
        BtRuntime runtime = runtimeBuilder.build();
        runtime.startup();
        return runtime;
    }

    private BtRuntimeBuilder getBtRuntimeBuilder(int port, int dhtPort) {
        return BtRuntime.builder(this.getConfig(port))
                .module(this.getDHTMpdule(dhtPort))
                .autoLoadModules();
    }

    private Config getConfig(int port) {
        Config config = new Config() {
            @Override
            public InetAddress getAcceptorAddress() {
                return super.getAcceptorAddress();
            }

            // Port on which incoming peer connections are accepted
            @Override
            public int getAcceptorPort() {
                return port;
            }

            @Override
            public int getNumOfHashingThreads() {
                return Runtime.getRuntime().availableProcessors() * 2;
            }

            @Override
            public EncryptionPolicy getEncryptionPolicy() {
                return EncryptionPolicy.REQUIRE_ENCRYPTED;
            }
        };
        return config;
    }

    private Module getDHTMpdule(int dhtPort) {
        Module dhtModule = new DHTModule(new DHTConfig() {
            // Port on which the DHT service listens
            @Override
            public int getListeningPort() {
                return dhtPort;
            }

            @Override
            public boolean shouldUseRouterBootstrap() {
                return true;
            }
        });
        return dhtModule;
    }

Here is my BtClient configuration:

    public BtClient torrentClient(Torrent torrent, String id, List fileList, BtRuntime btRuntime) {
        if (fileList == null || fileList.isEmpty()) {
            throw new TorrentException(TorrentResultEnum.SELECTOR_IS_NULL);
        }
        String magnet = TorrentUtils.toMagnetLink(torrent);
        magnet = addLatestTrackers(magnet);
        magnet = TorrentUtils.magnetTrim(magnet);
        BtFileSelector selector = new BtFileSelector(fileList);
        BtClientBuilder builder = Bt.client(btRuntime)
                .fileSelector(selector)
                .magnet(magnet)
                .storage(this.getStorage(id))
                .randomizedRarestSelector();
        return builder.build();
    }

At first everything worked normally, and multiple tasks could run at the same time. However, as the number of tasks gradually increased, more and more "too many open files" exceptions appeared in my application. To confirm, I used only bt.cli.CliClient to reproduce the problem and found the same behavior. Back to the question above: is this a configuration error on my side or a bug in BitTorrent?

yjs112233 commented 3 years ago

I use BT version 1.9 from the Maven repository, with the bundled org.eclipse.jetty excluded, and additionally integrate org.eclipse.jetty version 8.2.0.v20160908.

atomashpolskiy commented 3 years ago

Sorry, it's not entirely clear from your message -- is there only one instance of runtime/client active at each given time? I.e. are you saying that the runtime leaves files/connections open after having shut down? Because in the case of multiple runtimes/clients you can easily hit the OS limit on the number of simultaneously open file descriptors and will have to increase this limit or re-think your application's pipeline.
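
To tell a genuine leak from simply hitting the OS limit, it can help to compare the process's open descriptor count against that limit before and after runtimes are shut down. The following is a minimal sketch using only the JDK, assuming a Unix-like platform where the `com.sun.management.UnixOperatingSystemMXBean` interface is available; `FdProbe` is a hypothetical helper and not part of Bt.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Bt): reports this JVM's file descriptor usage.
class FdProbe {

    /** Returns {open, limit}, or {-1, -1} when the Unix-specific bean is unavailable. */
    static long[] snapshot() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return new long[] {-1, -1};
    }

    public static void main(String[] args) throws Exception {
        long[] before = snapshot();

        // Each listening socket holds one descriptor until it is closed.
        List<ServerSocket> sockets = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            sockets.add(new ServerSocket(0));
        }
        long[] during = snapshot();

        for (ServerSocket socket : sockets) {
            socket.close();
        }
        long[] after = snapshot();

        System.out.printf("open before=%d, during=%d, after=%d, limit=%d%n",
                before[0], during[0], after[0], after[1]);
    }
}
```

If the count logged after all runtimes have shut down keeps growing from run to run, that points to descriptors not being released rather than to too many concurrent runtimes.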


yjs112233 commented 3 years ago

is there only one instance of runtime/client active at each given time? -- Yes! But there may be multiple instances at the same time. For example, take a file named A.torrent. My application gets a BtRuntime instance through the method .getBtRuntime(), creates a BtClient with this BtRuntime instance, and starts parsing and downloading the contents of A.torrent. At the same time, B.torrent gets a new BtRuntime and a new BtClient in the same way and starts parsing and downloading the contents of B.torrent. The same goes for C.torrent, D.torrent, and so on. Each BtRuntime is configured to allow automatic shutdown. When a download completes, I confirm through the log that the runtime has been closed.

Result: After running some torrent files like this and waiting until all BtRuntime instances have been closed, a large number of connections are found in the CLOSE_WAIT state. If more torrent files are run, the application throws "too many open files".
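
For context, a TCP socket sits in CLOSE_WAIT when the remote peer has closed its side but the local application has not yet closed its own socket, so each such connection still holds a file descriptor. The sketch below uses plain JDK sockets (not Bt's networking code; `CloseWaitDemo` is a hypothetical name) to show the usual remedy: once a read reports end-of-stream, close the local socket, here via try-with-resources.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo, unrelated to Bt's own connection handling.
class CloseWaitDemo {

    /**
     * Accepts one connection and reads until the peer closes.
     * Returns the final read result (-1 means the peer closed its side).
     */
    static int acceptAndDrain(ServerSocket server) throws IOException {
        try (Socket accepted = server.accept();
             InputStream in = accepted.getInputStream()) {
            int last;
            while ((last = in.read()) != -1) {
                // Discard any payload; only the end-of-stream signal matters here.
            }
            return last;
        } // Leaving this block closes the socket, so it does not linger in CLOSE_WAIT.
    }

    static int run() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            server.setSoTimeout(5000); // avoid blocking forever if nothing connects
            // The "remote peer" connects and closes immediately, which sends a FIN.
            new Socket(loopback, server.getLocalPort()).close();
            return acceptAndDrain(server);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the accepted socket were kept open after the read returned -1, it would remain in CLOSE_WAIT until the process closed it or exited.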

pyckle commented 3 years ago

However, after integration, I found that the same BT file cannot be received in the same BtRuntime, and it will stop running because it has been registered.

If this is the case, using the latest snapshot build should fix your issue. This is the same bug as https://github.com/atomashpolskiy/bt/issues/146, and I believe it was fixed in https://github.com/atomashpolskiy/bt/commit/1ad9d4caf49d89894712aa7f57085b88b2797505

Looking through the code briefly, I noticed the following

Neither of these address your issue, but they probably should be fixed. I would suggest as a next step to better understand this issue

yjs112233 commented 3 years ago

Thanks!

yjs112233 commented 2 years ago

Hi! I am back. I downloaded the latest code and added only the code from #193. Now the application can work persistently. It looks as if the leak has been fixed: the number of CLOSE_WAIT connections is back to normal. As for #146, the fix does solve the problem of duplicate registrations. The one fly in the ointment is that the same TorrentId cannot run more than once simultaneously in the same BtRuntime. If that were supported, BT would be even better.

pyckle commented 2 years ago

Now the application can work persistently. It looks as if the leak has been fixed, The number of close_waits becomes normal.

Excellent (:

The fly in the ointment,The same TorrentId does not support running simultaneously in the same BtRuntime. If it does, BT will be better off.

Can you explain why you have this use case? Why do you want to download a torrent multiple times in the same runtime?

From a protocol perspective, this can't be implemented because if an incoming peer connects to the torrent, the runtime won't know which instance to associate it with.
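
The constraint can be illustrated with a small sketch. In the BitTorrent handshake an incoming peer identifies the torrent only by its info-hash, so a runtime can route that connection to at most one session per info-hash. `TorrentRegistry` below is a hypothetical stand-in written for illustration; it is not Bt's actual registry class.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of one-session-per-info-hash routing (not Bt's API).
class TorrentRegistry {

    // Key: hex-encoded info-hash. Value: a label for the session handling it.
    private final Map<String, String> sessionsByInfoHash = new ConcurrentHashMap<>();

    /** Registers a session; returns false if this info-hash already has one. */
    boolean register(String infoHashHex, String sessionName) {
        return sessionsByInfoHash.putIfAbsent(infoHashHex, sessionName) == null;
    }

    /** An incoming handshake carries only the info-hash, so it maps to at most one session. */
    Optional<String> route(String infoHashHex) {
        return Optional.ofNullable(sessionsByInfoHash.get(infoHashHex));
    }
}
```

With two sessions registered under the same key there would be no information in the handshake to choose between them, which is why the second registration is rejected in this sketch.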

yjs112233 commented 2 years ago

If BT is used as a service, two different people may want to download the same torrent, possibly at the same time. If the protocol cannot support this, it may only be possible by creating a new BtRuntime.

pyckle commented 2 years ago

There are two solutions that I see for this use case.

The second solution is technically superior, but requires more coding and some Bt changes. If there are additional tracker(s) (announce/announce-list) in the torrent that is loaded second, those should somehow be added to the torrent that is already downloading. Some of the work for this has been done (see PeerRegistry.addPeerSource()), but more work would be needed to make this API easily accessible externally.

Note, this second solution won't work properly with private trackers that track download/upload amounts because they often put a unique identifier in the announce url.

yjs112233 commented 2 years ago

Thank you for helping me solve these issues!

pyckle commented 2 years ago

Sure 👍

atomashpolskiy commented 2 years ago

Another way would be to implement a storage that can be re-used by concurrent users without having to create a new BtClient instance. E.g. sharing a single storage among multiple user requests.
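
One way to picture this is a thin layer in front of the single client: the first request for a torrent starts the download, and every later request for the same torrent receives the same pending result. This is a minimal sketch under that assumption; `SharedDownloads` and the `starter` function are hypothetical names, and the `String` result stands in for whatever handle to the shared storage an application would use.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical application-level layer; it does not call any Bt API.
class SharedDownloads {

    private final Map<String, CompletableFuture<String>> inFlight = new ConcurrentHashMap<>();
    private final Function<String, CompletableFuture<String>> starter;

    /** @param starter starts the one real download for a torrent id (assumed to be supplied by the application) */
    SharedDownloads(Function<String, CompletableFuture<String>> starter) {
        this.starter = starter;
    }

    /** Returns the future of the single download for this id, starting it only on the first request. */
    CompletableFuture<String> request(String torrentId) {
        return inFlight.computeIfAbsent(torrentId, starter);
    }
}
```

Because `computeIfAbsent` on a `ConcurrentHashMap` runs the mapping function at most once per key, concurrent requests for the same torrent end up waiting on one download instead of creating additional clients.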


atomashpolskiy commented 2 years ago

Hooking a new client for a torrent that already exists in the shared runtime does not make sense to me at all.


pyckle commented 2 years ago

Another way would be to implement a storage that can be re-used by concurrent users without having to create a new BtClient instance.

Perhaps I misread the code, but I don't think that would work well unless the download has finished. If the download hasn't finished, both runtimes will try to write the same unfinished blocks, because they have no idea which blocks the other has downloaded or when to hash them. If there's content poisoning, or a peer happens to send a corrupted chunk of data, one runtime could ban a good peer because the other runtime wrote a bad piece. Also, one runtime, unaware that the data has been corrupted, could send out bad data.
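
To make the hashing concern concrete: in the original BitTorrent protocol each piece is checked against a SHA-1 hash from the torrent metadata before it is trusted. The sketch below shows that check in isolation with the JDK's `MessageDigest`; `PieceVerifier` is a hypothetical helper, and Bt's own verification code may be organized quite differently.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper showing per-piece SHA-1 verification (not Bt's implementation).
class PieceVerifier {

    static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true only if the piece bytes hash to the expected value from the metadata. */
    static boolean isValid(byte[] pieceData, byte[] expectedHash) {
        return MessageDigest.isEqual(sha1(pieceData), expectedHash);
    }
}
```

If a second writer modifies the same storage after this check has passed, the first runtime has no reason to re-verify the piece, which is how bad data could be served or a good peer blamed.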

atomashpolskiy commented 2 years ago

What you say only applies in the presence of multiple clients, but my point is that there's usually no need to have more than one client for each torrent. The storage by itself is an interface that can be re-used for relaying and/or accessing the data downloaded by the first (and only) client without having to create a dedicated client.

Maybe I don't understand the use case? E.g. if we're talking about a media streaming service that is "backed" exclusively by torrent swarms (i.e. a purely transient proxy that only relays data from peers to end users and does not have its own storage capacity in any significant quantity), then it's going to be an operational model that is very different from the original protocol intentions and design, so an efficient implementation will have to make quite a few changes to an off-the-shelf torrent client. Maybe it's possible to approximate such an implementation to some extent with a couple of crutches in the existing code, but it would still be very inferior to one that is built from the ground up with the concept of efficient transient proxying in mind.

On the other hand, if there is some reasonable storage capacity, then this problem can be pretty much solved without having to change a single line of code in an off-the-shelf torrent client (be it Bt or any other implementation).


atomashpolskiy commented 2 years ago

In other words, there are two main types of peers in Bittorrent:

Also there is a special case of partial seeds but the only difference with full seeds is in the subset of data that they explicitly indicate they have.

Does OP's mode of operation correspond to one of the above, or is it something else, like a quasi-seed that indicates it has all of the data but actually proxies all requests to actual seeds in the swarm?


yjs112233 commented 2 years ago

Another way would be to implement a storage that can be re-used by concurrent users without having to create a new BtClient instance.

Perhaps I misread the code, but I don't think that would work well unless the download has finished. If the download hasn't finished, both runtimes will try to write the same unfinished blocks, because they have no idea which blocks the other has downloaded or when to hash them. If there's content poisoning, or a peer happens to send a corrupted chunk of data, one runtime could ban a good peer because the other runtime wrote a bad piece. Also, one runtime, unaware that the data has been corrupted, could send out bad data.

Perhaps my incomplete description misled you. I actually isolated the different BtRuntime downloads by setting different storage locations, so there is no block collision.

As for "The same TorrentId does not support running simultaneously in the same BtRuntime": I first wanted to know whether sharing buckets between different clients is feasible. If it is implemented as a business function outside of the BT protocol implementation, reference counting is a good way to do it. If it is implemented as an extension of the BT protocol, I like the idea of using shared buckets, although that is really similar to reference counting as well. I later changed my mind because sharing would result in a loss of data independence and privacy, and it is important to stay private. Since the issue arose in a business scenario, I prefer to meet this requirement at the business level rather than modify or extend the BT protocol.
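
For readers curious what the reference-counting variant would look like at the business level, here is a minimal sketch. `DownloadRefCounter` is a hypothetical class: the first `acquire` for a torrent id tells the caller to start the download, and the last `release` tells it that the download can be stopped. It deliberately leaves out the privacy and data-independence concerns raised above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical business-level bookkeeping; it does not touch Bt or the BT protocol.
class DownloadRefCounter {

    private final Map<String, Integer> users = new HashMap<>();

    /** Returns true when the caller is the first user and should start the download. */
    synchronized boolean acquire(String torrentId) {
        return users.merge(torrentId, 1, Integer::sum) == 1;
    }

    /** Returns true when the last user has left and the download can be stopped. */
    synchronized boolean release(String torrentId) {
        Integer count = users.get(torrentId);
        if (count == null) {
            throw new IllegalStateException("release without acquire: " + torrentId);
        }
        if (count == 1) {
            users.remove(torrentId);
            return true;
        }
        users.put(torrentId, count - 1);
        return false;
    }
}
```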