Closed Kohhh24 closed 2 months ago
Thank you for bringing this to my attention, and sorry for any inconvenience this discrepancy may have caused. I've received the log file for the 'ACE/none/10class/5shot/shuffle_r1/perm0' experiment and will review it carefully; my guess is that the hyperparameters are the cause. Given my current workload, I will check this before next Monday. In the meantime, could you provide detailed information on your GPU and CUDA version? I have found that the GPU and the CUDA version can greatly influence the experimental results. Thank you for your understanding. Best regards!
Thanks for your prompt reply! Below is the information regarding my PyTorch, GPU, and CUDA versions.
pytorch 2.2.1 py3.8_cuda12.1_cudnn8.9.2_0
NVIDIA GeForce RTX 4090 24GB
CUDA Version: 12.2
I've also noticed another issue in the dataloader.py file within the utils directory: the --my_test argument is never defined.
if args.my_test:
    return MAVEN_Dataset(data_tokens[:100], data_labels[:100], data_masks[:100], data_spans[:100])  # TODO: test use
else:
    return MAVEN_Dataset(data_tokens, data_labels, data_masks, data_spans)
The comment in the code suggests # TODO:test use. I used
return MAVEN_Dataset(data_tokens, data_labels, data_masks, data_spans)
for the three functions in the dataloader.py file, and I'm not sure whether this is why the F1 score decreased. How should I use this parameter?
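For context, here is a minimal sketch of how such a flag could be registered with argparse so that args.my_test exists; the parser name and help text are assumptions for illustration, not the repository's actual code:

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical registration of the debug-only flag: store_true makes it
# default to False, so the full dataset is used unless the flag is passed.
parser.add_argument('--my_test', action='store_true',
                    help='debug only: truncate each split to 100 instances')

args = parser.parse_args([])             # flag omitted -> args.my_test is False
args = parser.parse_args(['--my_test'])  # flag present -> args.my_test is True
```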
The comment in the code suggests # TODO:test use.
It is not used for the test process; I actually use it for debugging.
Never use --my_test if you are not debugging, otherwise only 100 instances are used!
You can turn it off and evaluate again to see the results.
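To illustrate why the flag matters, here is a hypothetical sketch of the debug-subset pattern (toy data, not the repository's MAVEN_Dataset): with the flag on, every split is silently capped at 100 instances, which is far too little data for a meaningful evaluation.

```python
# Hypothetical illustration of the pattern behind --my_test.
def build_dataset(tokens, labels, masks, spans, my_test=False):
    if my_test:  # debug mode only: cap every field at 100 instances
        tokens = tokens[:100]
        labels = labels[:100]
        masks = masks[:100]
        spans = spans[:100]
    return list(zip(tokens, labels, masks, spans))

n = 500
full = build_dataset(range(n), range(n), range(n), range(n))
debug = build_dataset(range(n), range(n), range(n), range(n), my_test=True)
print(len(full), len(debug))  # 500 100
```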
Also, I will upload the full settings I used; you can check the updates from the last few days and compare them with your own settings.
Best regards!
Thank you! I'm looking forward to learning more about your full setting!
Hi @Kohhh24, I've uploaded the necessary settings. If you have any other questions, feel free to contact us!
I'm extremely grateful!
Besides, if you find our work helpful, consider giving it a star!
I apologize for any inconvenience. I ran the code with the hyperparameters specified in the paper and the accompanying code, but the accuracy I obtain consistently falls short of the benchmarks reported in the paper. For instance, the micro-F1 score in the second stage of the 2-way 5-shot evaluation on the ACE dataset is 10-20 percentage points lower than the values reported in the paper. Could there be a reason for this discrepancy? Is it possible that some code or hyperparameters have not been updated or are out of sync with the latest findings? I have uploaded the log file for the 'ACE/none/10class/5shot/shuffle_r1/perm0' experiment in the attachment; could you please review it? I would appreciate your insights on any potential issues or discrepancies that might be affecting the accuracy of the results. 2024-06-27-22-17-20.log