Training does not converge on S3DIS

yoxu515 commented 3 years ago

Hi Ariou, thanks for sharing the code. I forked your code and trained on S3DIS.

However, the loss did not converge, and the IoU of many classes stayed at 0.

Do you have any idea what is wrong? Thanks.

The training log looked like this: [image: Screenshot from 2020-10-11 22-22-15]

lqzhao commented 3 years ago

Hi, I ran into the same problem. Did you solve it? Thanks, @yoxu515.

JerryIndus commented 3 years ago

Excuse me, I ran into the same problem too. Did you solve it? I look forward to your reply. You can also contact me by e-mail (tongw_indus@126.com).

JerryIndus commented 3 years ago

@lqzhao Excuse me, I ran into the same problem too. Did you solve it? I look forward to your reply. You can also contact me by e-mail (tongw_indus@126.com).

lqzhao commented 3 years ago

This repo would need a lot of work to reproduce the performance reported in the original paper.

JerryIndus commented 3 years ago

Thank you for your reply. Have you already reproduced the performance of the original paper? I noticed that some people have pointed out errors, for example: "In file model.py, class LocalSpatialEncoding, function forward, the line features.expand(B, -1, N, K): the features from each of the K neighbors should be gathered." In addition, the dist used in this project is the squared distance. I have corrected both and am now testing the performance. Is there anything else that should be corrected? For example, the data loader, the loss function, or other errors in the model?
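
For readers following the two fixes described above, here is a minimal PyTorch sketch of the difference between broadcasting a point's own features with expand and gathering the features of its K neighbors, plus the square root that turns a squared distance into a Euclidean one. The tensor layout (B, d, N, 1), the idx tensor of shape (B, N, K), and all function names are assumptions made for illustration; this is not the code from model.py.

```python
import torch


def expand_center_features(features, K):
    """Broadcast each point's own feature to K slots: (B, d, N, 1) -> (B, d, N, K)."""
    B, d, N, _ = features.shape
    return features.expand(B, d, N, K)


def gather_neighbor_features(features, idx):
    """Pick the feature of every neighbor listed in idx.

    features: (B, d, N, 1) float tensor (assumed layout)
    idx:      (B, N, K) long tensor of neighbor indices along the N axis (assumed layout)
    returns:  (B, d, N, K) with out[b, :, n, k] = features[b, :, idx[b, n, k], 0]
    """
    B, d, N, _ = features.shape
    K = idx.shape[-1]
    flat = features.squeeze(-1)                           # (B, d, N)
    index = idx.reshape(B, 1, N * K).expand(B, d, N * K)  # same indices for every channel
    return torch.gather(flat, 2, index).reshape(B, d, N, K)


def euclidean_from_squared(sq_dist):
    """Turn squared distances into Euclidean distances, clamping tiny negatives to 0."""
    return sq_dist.clamp(min=0).sqrt()


# Toy input: one batch, one channel, four points; point n carries feature value n.
features = torch.arange(4, dtype=torch.float32).reshape(1, 1, 4, 1)
idx = torch.tensor([[[0, 1], [1, 2], [2, 3], [3, 0]]])   # (1, 4, 2)

expanded = expand_center_features(features, K=2)
gathered = gather_neighbor_features(features, idx)
```

With this toy input, expanded repeats each point's own value across the K slots, while gathered holds the values of the neighbors listed in idx, which is what the quoted bug report asks for.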

lqzhao commented 3 years ago

Sorry, I have not reproduced the performance yet, and I'm not going to use this repo, since the current performance is too poor to be worth improving.

JerryIndus commented 3 years ago

Thank you for the reply. Have you tried any other RandLA-Net PyTorch projects? If there are other workable ones, please recommend them to me. Thank you very much.

yoxu515 commented 3 years ago

Hi @JerryIndus, @lqzhao, I didn't put much effort into fixing this repo. I tried another PyTorch repo of RandLA-Net, and you can refer to that one instead. But as far as I know, its results may also not be as good as those of the official TensorFlow version.

If you manage to get the same results as in the paper using this repo, please let me know. Thanks.

Thibaud-Ardoin commented 3 years ago

Hello @yoxu515, @lqzhao and @JerryIndus, thanks for your interest in the repo.

First of all, an important fix has been implemented in the gather branch. However, it has not been tested yet for lack of time and GPU resources.

As you might have noticed, we are not really focusing on improving the results of this repo. It was a student project and we have since moved on to something else, so unfortunately we have no more time to run experiments and fix bugs.

Would you be interested in collaborating by adding your experimental results to this repo? It might help others who are looking for answers.

Thanks for your contribution in any case. Best, Thibaud

JerryIndus commented 3 years ago

I have a question: model.py does not seem to use random sampling, but the paper and the TensorFlow code do. Maybe I missed it. Could you point out where random sampling is introduced in your code?
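
To make the question concrete, here is a generic sketch of what uniform random downsampling of a point cloud can look like in PyTorch: draw a random permutation of the point indices and keep the first N // ratio of them. The function name random_downsample, the tensor layouts, and the ratio argument are assumptions for illustration only; the sketch is not taken from model.py or from the TensorFlow code, and it does not show where either of them performs the sampling.

```python
import torch


def random_downsample(coords, features, ratio, generator=None):
    """Keep N // ratio points chosen uniformly at random (hypothetical helper).

    coords:   (B, N, 3) point coordinates (assumed layout)
    features: (B, d, N) per-point features (assumed layout)
    returns:  (sub_coords, sub_features, keep), where keep holds the kept indices
    """
    N = coords.shape[1]
    perm = torch.randperm(N, generator=generator)  # random order of all point indices
    keep = perm[: N // ratio]                      # the first N // ratio form the sample
    return coords[:, keep], features[:, :, keep], keep


# Toy usage: 16 points per cloud, keep one quarter of them.
gen = torch.Generator().manual_seed(0)
coords = torch.rand(2, 16, 3)
features = torch.rand(2, 8, 16)
sub_coords, sub_features, keep = random_downsample(coords, features, ratio=4, generator=gen)
```

The same index set is applied to coordinates and features so that each kept point stays aligned with its feature vector.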

wangzihao77 commented 1 year ago

@JerryIndus Hello, can you tell me how you fixed the code? I want to update the model.

aRI0U / RandLA-Net-pytorch, issue #15