Xtra-Computing / FedTree

A tree-based federated learning system (MLSys 2023)
https://fedtree.readthedocs.io/en/latest/index.html
Apache License 2.0

Python horizontal model training is killed automatically #51

Closed: Amoto1103 closed this issue 1 year ago

Amoto1103 commented 1 year ago

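When I train an FLClassifier in Python, it runs fine with the max_tree parameter set to 10, but with 12 the process is automatically killed during the first training round. What might be the cause? I am running it in an Ubuntu 18.04 virtual machine with 4 GB of memory; the dataset has 200,000 samples and 18 features.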

QinbinLi commented 1 year ago

Hi @Amoto1103 ,

Can you provide the data and code so that I can reproduce the issue for debugging? Thanks!

Amoto1103 commented 1 year ago

Thanks for your reply. I have sent my data and code to your email (gmail address).

QinbinLi commented 1 year ago

Hi @Amoto1103 ,

Thanks for sharing! I tested your code and it runs successfully on my server. The process requires at least 8 GB of memory, so the 4 GB on your machine is not enough to run it. Your task is multi-class classification, which trains multiple trees in each round, so memory consumption is high. We will consider how to reduce memory usage in the future. For now, you may try another machine or reduce the tree depth.
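To make the reasoning above concrete, here is a minimal Python sketch. It does not use the FedTree API. It assumes the common gradient-boosting convention of one tree per class per round for multi-class tasks (the reply only says "multiple trees"), and the helper names `total_trees`, `physical_memory_bytes`, and `has_enough_memory` are hypothetical. The class count in the usage section is made up for illustration; the 8 GB threshold is the figure quoted above.

```python
import os

# "At least 8GB" is the figure quoted in the reply above.
REQUIRED_BYTES = 8 * 1024**3


def total_trees(n_rounds, n_classes):
    """Number of trees built overall.

    Assumption: multi-class tasks (more than 2 classes) build one tree
    per class per round; binary tasks build one tree per round.
    """
    trees_per_round = n_classes if n_classes > 2 else 1
    return n_rounds * trees_per_round


def physical_memory_bytes():
    """Total physical RAM reported by the OS (POSIX sysconf, e.g. Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def has_enough_memory(available_bytes, required_bytes=REQUIRED_BYTES):
    """True if the machine meets the stated memory requirement."""
    return available_bytes >= required_bytes


if __name__ == "__main__":
    # Hypothetical class count; the thread does not state it.
    print("trees to build:", total_trees(n_rounds=12, n_classes=5))
    ram = physical_memory_bytes()
    print("RAM (GB):", round(ram / 1024**3, 1))
    print("meets 8 GB requirement:", has_enough_memory(ram))
```

Running a check like this before training shows early whether the machine falls short, rather than finding out when the kernel kills the process.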

Amoto1103 commented 1 year ago

Got it. Thanks again for your kind assistance!