Closed Kohhh24 closed 2 months ago
Thank you for bringing this to my attention, and sorry for any inconvenience this discrepancy may have caused. I've received the log file for the 'ACE/none/10class/5shot/shuffle_r1/perm0' experiment and will review it carefully; my guess is that the hyperparameters are the cause. Given my current workload, I will check this before next Monday. In the meantime, could you provide detailed information on your GPU and CUDA version? I have found that the GPU and the CUDA version can greatly influence the experimental results. Thank you for your understanding. Best regards!
Thanks for your prompt reply! Below is the information regarding my PyTorch, GPU, and CUDA versions.
pytorch 2.2.1 py3.8_cuda12.1_cudnn8.9.2_0
NVIDIA GeForce RTX 4090 24GB
CUDA Version: 12.2
I've also noticed another issue in the dataloader.py file within the utils directory: the --my_test argument is never defined.
if args.my_test:
    return MAVEN_Dataset(data_tokens[:100], data_labels[:100], data_masks[:100], data_spans[:100])  # TODO: test use
else:
    return MAVEN_Dataset(data_tokens, data_labels, data_masks, data_spans)
The comment in the code suggests # TODO:test use. I used
return MAVEN_Dataset(data_tokens, data_labels, data_masks, data_spans)
for the three functions in the dataloader.py file, and I'm not sure whether this is why the F1 score decreased. How should I use this parameter?
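For context, here is a minimal sketch of how such a flag could be registered with argparse so that args.my_test exists; the parser name and help text are assumptions for illustration, not the repository's actual code:

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical registration of the debug-only flag: store_true makes it
# default to False, so the full dataset is used unless the flag is passed.
parser.add_argument('--my_test', action='store_true',
                    help='debug only: truncate each split to 100 instances')

args = parser.parse_args([])             # flag omitted -> args.my_test is False
args = parser.parse_args(['--my_test'])  # flag present -> args.my_test is True
```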
The comment in the code suggests # TODO:test use.
It is not used for the test process; I actually use it for debugging.
Never use --my_test if you are not debugging, otherwise only 100 instances are used!
You can turn it off and evaluate again to see the results.
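To illustrate why the flag matters, here is a hypothetical sketch of the debug-subset pattern (toy data, not the repository's MAVEN_Dataset): with the flag on, every split is silently capped at 100 instances, which is far too little data for a meaningful evaluation.

```python
# Hypothetical illustration of the pattern behind --my_test.
def build_dataset(tokens, labels, masks, spans, my_test=False):
    if my_test:  # debug mode only: cap every field at 100 instances
        tokens = tokens[:100]
        labels = labels[:100]
        masks = masks[:100]
        spans = spans[:100]
    return list(zip(tokens, labels, masks, spans))

n = 500
full = build_dataset(range(n), range(n), range(n), range(n))
debug = build_dataset(range(n), range(n), range(n), range(n), my_test=True)
print(len(full), len(debug))  # 500 100
```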
Also, I will upload the full settings I used; you can check the updates from the last few days and compare them with your own settings.
Best regards!
Thank you! I'm looking forward to learning more about your full setting!
Hi @Kohhh24, I've uploaded the necessary settings. If you have any other questions, feel free to contact us!
I'm extremely grateful!
Besides, if you find our work helpful, consider giving it a star!
I apologize for any inconvenience. I ran the code with the hyperparameters specified in the paper and the accompanying code, but the accuracy I obtain consistently falls short of the benchmarks reported in the paper. For instance, the micro-F1 score in the second stage of the 2-way 5-shot evaluation on the ACE dataset is 10-20 percentage points lower than the values reported in the paper. Could there be a reason for this discrepancy? Is it possible that some code or hyperparameters have not been updated or are out of sync with the latest findings? I have uploaded the log file for the 'ACE/none/10class/5shot/shuffle_r1/perm0' experiment in the attachment; could you please review it? I would appreciate your insights on any potential issues or discrepancies that might be affecting the accuracy of the results. 2024-06-27-22-17-20.log