Code reproduction - GitHub Issues

ultramarine-indigo commented 8 months ago

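Hello, I am trying to reproduce the ALMT code, but my results on the MOSI, MOSEI, and SIMS datasets are all lower than those reported in the paper, especially on SIMS. I set the hyperparameters according to the appendix of the paper and left everything else at the default values. Is there any trick used during training?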

Haoyu-ha commented 8 months ago

Try different random seeds.

ultramarine-indigo commented 8 months ago

I have tried different random seeds, and none of them reached the results in the paper.

Haoyu-ha commented 8 months ago

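Using the released code, I reran the SIMS dataset, where you said the gap was large, and it reaches the results reported in the paper. (Due to time constraints I only reproduced Acc-5, the hardest metric, with seed 38.) I have added test code and the training parameter file used for testing; you can pull the latest code, load them, and run the test. I want to stress again that the random seed that gives the best performance may differ from metric to metric. Also, because environments differ, the optimal parameters will differ somewhat, so some fluctuation in the results is to be expected.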
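The advice about seeds can be sketched as a small sweep that runs one experiment per seed and keeps the seed that is best for a chosen metric. This is only an illustration, not code from the ALMT repository: `sweep_seeds` and `fake_run` are hypothetical names, and `fake_run` is a stand-in that returns a pseudo-random score instead of training a model. The metric key `Mult_acc_5` follows the name printed in the logs later in this thread.

```python
import random
from typing import Callable, Dict, Iterable, Tuple


def sweep_seeds(
    run: Callable[[int], Dict[str, float]],
    seeds: Iterable[int],
    metric: str,
    higher_is_better: bool = True,
) -> Tuple[int, Dict[int, float]]:
    """Run one experiment per seed and return the best seed for `metric`."""
    results = {seed: run(seed)[metric] for seed in seeds}
    pick = max if higher_is_better else min
    best_seed = pick(results, key=results.get)
    return best_seed, results


def fake_run(seed: int) -> Dict[str, float]:
    # Hypothetical stand-in for a full training run. A real run would seed
    # every random number generator it uses, train, and return its metrics.
    rng = random.Random(seed)
    return {"Mult_acc_5": round(rng.uniform(0.38, 0.46), 4)}


best_seed, results = sweep_seeds(fake_run, range(30, 40), "Mult_acc_5")
print(best_seed, results[best_seed])
```

Because the best seed can differ per metric, the sweep would be repeated with a different `metric` argument (and `higher_is_better=False` for MAE).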

Zhan9YC commented 8 months ago

What accuracy did you get on the MOSI and MOSEI datasets in your reproduction?

ultramarine-indigo commented 8 months ago

According to the published code, I selected MAE as the metric and chose the minimum MAE achieved within 200 epochs as the best_result. The experimental results I replicated are as follows for MOSI, MOSEI, and SIMS datasets, all using the unaligned dataset, with the seed set to 38.

Haoyu-ha commented 8 months ago

I think I see the issue. In our experiments, we ran multiple experiments for each metric separately to observe the model's best performance. For MAE and Corr, we took the smallest MAE and the Corr from the same epoch as the optimal MAE and Corr. For Acc-2 and F1, we took the highest Acc-2 and the F1 from the same epoch as the model's optimal binary classification accuracy and F1, rather than taking the Acc-2/F1 at the epoch with the smallest MAE. For Acc-3/Acc-5/Acc-7, we likewise took the maximum values directly as the model's best performance instead of the values at the epoch with the smallest MAE.
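The selection rule described above can be sketched as follows. This is a minimal illustration, not the repository's code: `GROUPS` and `select_best` are hypothetical names, the metric keys follow the names printed in the logs in this thread, and the per-epoch numbers are made up for the example.

```python
from typing import Dict, List

# Each group is selected by its first metric; the other metrics in the group
# are reported from that same epoch. "min" means lower is better.
GROUPS = [
    (["MAE", "Corr"], "min"),
    (["Mult_acc_2", "F1_score"], "max"),
    (["Mult_acc_3"], "max"),
    (["Mult_acc_5"], "max"),
]


def select_best(history: List[Dict[str, float]]) -> Dict[str, float]:
    """Pick the best epoch separately for every metric group."""
    best: Dict[str, float] = {}
    for names, mode in GROUPS:
        key = names[0]
        pick = min if mode == "min" else max
        epoch = pick(range(len(history)), key=lambda i: history[i][key])
        for name in names:
            best[name] = history[epoch][name]
    return best


# Made-up metrics for three epochs, for illustration only.
history = [
    {"MAE": 0.45, "Corr": 0.55, "Mult_acc_2": 0.78, "F1_score": 0.77,
     "Mult_acc_3": 0.62, "Mult_acc_5": 0.40},
    {"MAE": 0.42, "Corr": 0.58, "Mult_acc_2": 0.76, "F1_score": 0.76,
     "Mult_acc_3": 0.64, "Mult_acc_5": 0.39},
    {"MAE": 0.43, "Corr": 0.60, "Mult_acc_2": 0.77, "F1_score": 0.79,
     "Mult_acc_3": 0.63, "Mult_acc_5": 0.43},
]
best = select_best(history)
print(best)
```

In this toy history the reported values come from three different epochs, which is why selecting everything at the epoch of minimum MAE gives lower accuracy numbers.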

Haoyu-ha commented 8 months ago

As mentioned above, if you want the optimal Acc-2 and F1, for example, you should record Acc-2 at the epoch where it reaches its maximum, together with the F1 from that same epoch.

Haoyu-ha commented 8 months ago

You can try to record the best performance based on Acc-2/F1, Acc-7, Acc-5, Acc-3, MAE/Corr metrics, respectively, rather than based on minimised MAE.

ultramarine-indigo commented 8 months ago

I understand. So the Acc-2/F1, Acc-7, Acc-5, and Acc-3 results in the paper were obtained at different epochs and with different seeds, rather than at a single epoch?

Haoyu-ha commented 8 months ago

Yes. You should also have observed that the optimal value of each metric in 200 epochs is not always in the same epoch.

ultramarine-indigo commented 8 months ago

Alright, thank you very much for your patient reply.

1941611146qq commented 8 months ago

Doesn't "seq_lens": [50, 50, 50] in the parameters need to be changed for the SIMS dataset?

Haoyu-ha commented 8 months ago

It's up to you. You can modify this hyperparameter and adjust the model input and output dimensions to compress the complete sequence.

Zhan9YC commented 7 months ago

I followed your instructions and used the original code. Why did the following situation occur? Did it come up during your experiments?

Zhan9YC commented 7 months ago

just 57% acc

Haoyu-ha commented 7 months ago

I have not come across this problem. If you haven't changed the code, I suggest checking the environment.

Zhan9YC commented 7 months ago

I have not changed the code. I also have another question: is it normal that predicting with only the text modality can reach an F1 score as high as 86%?

Haoyu-ha commented 7 months ago

Because of class imbalance in the dataset, it is possible for Acc-2 to be low while F1 is relatively high. I suggest looking at Acc-2 and F1 together.
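A toy example of how accuracy and F1 can diverge on imbalanced labels is sketched below. It assumes a plain positive-class F1 computed by hand; the repository may compute F1 differently (for instance with a weighted average), so the numbers only illustrate the effect and are not taken from any dataset in this thread. The function name `accuracy_and_f1` is hypothetical.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, positive-class F1) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return acc, f1


# Made-up labels: 60 positives, 40 negatives, and a model that always
# predicts the positive class.
y_true = [1] * 60 + [0] * 40
y_pred = [1] * 100
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

Here the degenerate predictor is right only on the positive samples, yet its recall of 1.0 lifts F1 well above its accuracy, which is why the two metrics are best read together.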

DavidSmith-FHTT commented 6 months ago

Hello, I set the seed to 38 as described and ran the reproduction on the SIMS dataset. The Acc-5 result is still far from the one in the paper. What could be the reason?

DavidSmith-FHTT commented 6 months ago

Best result:{'Mult_acc_2': 0.7768052516411379, 'Mult_acc_3': 0.6389496717724289, 'Mult_acc_5': 0.4135667396061269, 'F1_score': 0.7743720621851709, 'MAE': 0.41242543, 'Corr': 0.5924711699992573}

Korvas-hb commented 6 months ago

Hi, I have a question about the metrics. Which of them belong to the same group? I only know that MAE and Corr form one group.

Haoyu-ha commented 6 months ago

Thank you for your question. You are correct that MAE and Corr are in one group. In addition, Acc-2 and F1 are also in one group.

Korvas-hb commented 6 months ago

When choosing a model, should I pick the one that is best on a set of metrics (e.g., this model works best on Corr and Acc, so I choose it), or should I use the same method as the authors?

Haoyu-ha commented 6 months ago

Hi. I think it is up to you, as long as it reasonably shows the best performance of your model.

Korvas-hb commented 6 months ago

Okay, thanks for your patient reply.

Katyawa commented 2 months ago

Hello, could you explain how exactly to try different random seeds? Do you iterate over them manually?

Haoyu-ha / ALMT

Code reproduction #6