facebookresearch / MetaICL

An original implementation of "MetaICL: Learning to Learn In Context" by Sewon Min, Mike Lewis, Luke Zettlemoyer and Hannaneh Hajishirzi

Inference time #24

Closed xipq closed 12 months ago

xipq commented 1 year ago

Hi, I would like to know how long inference on a single task (e.g. metaicl or channel-metaicl on non_qa_to_qa) should take. I followed the provided command, but the experiments for a single config ran for hours on a V100. Is this abnormal? Thanks.

shmsw25 commented 1 year ago

Hi @xipq, thank you for your interest. It depends on hyperparameters such as batch size. Could you share your full hyperparameters and a screenshot of the output when you run the command (which would display the time spent as well)? I'll look into it and see whether it is expected.

xipq commented 1 year ago

Hi, thanks for your reply. The command I used was:

method="channel-metaicl"
task="hr_to_lr"
out_dir="checkpoints/${method}/${task}"
checkpoint="checkpoints/${method}/${task}/model-30000.pt"
seed=100,13,21,42,87
bs=16
CUDA_VISIBLE_DEVICES=0 python test.py \
  --task $task --k 16 --split test --seed $seed \
  --use_demonstrations \
  --test_batch_size $bs \
  --method channel \
  --checkpoint $checkpoint \
  --out_dir $out_dir

The test ended as:

08/04/2023 15:21:42 - INFO - __main__ - checkpoints/channel-metaicl/hr_to_lr/tweet_eval-stance_feminist-test-channel-k=16-s=87.pkl
08/04/2023 15:21:42 - INFO - __main__ - torch.Size([201, 1024])
08/04/2023 15:22:22 - INFO - __main__ - Accuracy=0.4399466933200067
08/04/2023 15:22:22 - INFO - __main__ - Macro-F1 of hr_to_lr over 26 target tasks: 44.7

Sorry, I didn't keep the full console logs during evaluation, so here is the ls -lrt output for the generated .pkl and .txt files, with their timestamps:

-rw-r--r-- 1       5012 Aug  4 03:37 'quarel-test-channel-k=16-s=100.pkl'
-rw-r--r-- 1       3729 Aug  4 03:37 'quarel-test-channel-k=16-s=100.txt'
-rw-r--r-- 1      12241 Aug  4 03:41 'financial_phrasebank-test-channel-k=16-s=100.pkl'
-rw-r--r-- 1       3770 Aug  4 03:41 'financial_phrasebank-test-channel-k=16-s=100.txt'
-rw-r--r-- 1      18010 Aug  4 03:52 'openbookqa-test-channel-k=16-s=100.pkl'
-rw-r--r-- 1       9936 Aug  4 03:52 'openbookqa-test-channel-k=16-s=100.txt'
-rw-r--r-- 1      20028 Aug  4 04:00 'codah-test-channel-k=16-s=100.pkl'
-rw-r--r-- 1      17444 Aug  4 04:00 'codah-test-channel-k=16-s=100.txt'
-rw-r--r-- 1      66694 Aug  4 04:25 'qasc-test-channel-k=16-s=100.pkl'
-rw-r--r-- 1      10961 Aug  4 04:25 'qasc-test-channel-k=16-s=100.txt'
-rw-r--r-- 1       7352 Aug  4 04:28 'glue-mrpc-test-channel-k=16-s=100.pkl'
-rw-r--r-- 1       5228 Aug  4 04:28 'glue-mrpc-test-channel-k=16-s=100.txt'
-rw-r--r-- 1      55100 Aug  4 04:49 'dream-test-channel-k=16-s=100.pkl'
-rw-r--r-- 1      47166 Aug  4 04:49 'dream-test-channel-k=16-s=100.txt'
-rw-r--r-- 1      13375 Aug  4 04:54 'sick-test-channel-k=16-s=100.pkl'
-rw-r--r-- 1       6222 Aug  4 04:54 'sick-test-channel-k=16-s=100.txt'
-rw-r--r-- 1      54965 Aug  4 05:14 'commonsense_qa-test-channel-k=16-s=100.pkl'
-rw-r--r-- 1      13127 Aug  4 05:14 'commonsense_qa-test-channel-k=16-s=100.txt'
-rw-r--r-- 1      10990 Aug  4 05:18 'medical_questions_pairs-test-channel-k=16-s=100.pkl'
-rw-r--r-- 1       5645 Aug  4 05:18 'medical_questions_pairs-test-channel-k=16-s=100.txt'
-rw-r--r-- 1       6920 Aug  4 05:21 'quartz-with_knowledge-test-channel-k=16-s=100.pkl'
-rw-r--r-- 1       3894 Aug  4 05:21 'quartz-with_knowledge-test-channel-k=16-s=100.txt'
-rw-r--r-- 1       2843 Aug  4 05:22 'poem_sentiment-test-channel-k=16-s=100.pkl'
-rw-r--r-- 1       1015 Aug  4 05:22 'poem_sentiment-test-channel-k=16-s=100.txt'
-rw-r--r-- 1       6920 Aug  4 05:25 'quartz-no_knowledge-test-channel-k=16-s=100.pkl'
-rw-r--r-- 1       3896 Aug  4 05:25 'quartz-no_knowledge-test-channel-k=16-s=100.txt'
...
-rw-r--r-- 1       6920 Aug  4 14:46 'quartz-no_knowledge-test-channel-k=16-s=87.pkl'
-rw-r--r-- 1       3927 Aug  4 14:46 'quartz-no_knowledge-test-channel-k=16-s=87.txt'
-rw-r--r-- 1       1286 Aug  4 14:47 'glue-wnli-test-channel-k=16-s=87.pkl'
-rw-r--r-- 1        869 Aug  4 14:47 'glue-wnli-test-channel-k=16-s=87.txt'
-rw-r--r-- 1      11062 Aug  4 14:51 'climate_fever-test-channel-k=16-s=87.pkl'
-rw-r--r-- 1       3438 Aug  4 14:51 'climate_fever-test-channel-k=16-s=87.txt'
-rw-r--r-- 1       1574 Aug  4 14:51 'ethos-national_origin-test-channel-k=16-s=87.pkl'
-rw-r--r-- 1        503 Aug  4 14:51 'ethos-national_origin-test-channel-k=16-s=87.txt'
-rw-r--r-- 1       1574 Aug  4 14:52 'ethos-race-test-channel-k=16-s=87.pkl'
-rw-r--r-- 1        501 Aug  4 14:52 'ethos-race-test-channel-k=16-s=87.txt'
-rw-r--r-- 1       1574 Aug  4 14:52 'ethos-religion-test-channel-k=16-s=87.pkl'
-rw-r--r-- 1        502 Aug  4 14:52 'ethos-religion-test-channel-k=16-s=87.txt'
-rw-r--r-- 1      10756 Aug  4 14:57 'ai2_arc-test-channel-k=16-s=87.pkl'
-rw-r--r-- 1       9153 Aug  4 14:57 'ai2_arc-test-channel-k=16-s=87.txt'
-rw-r--r-- 1      38554 Aug  4 15:11 'hate_speech18-test-channel-k=16-s=87.pkl'
-rw-r--r-- 1      13505 Aug  4 15:11 'hate_speech18-test-channel-k=16-s=87.txt'
-rw-r--r-- 1       4994 Aug  4 15:13 'glue-rte-test-channel-k=16-s=87.pkl'
-rw-r--r-- 1       3623 Aug  4 15:13 'glue-rte-test-channel-k=16-s=87.txt'
-rw-r--r-- 1       1520 Aug  4 15:13 'superglue-cb-test-channel-k=16-s=87.pkl'
-rw-r--r-- 1        598 Aug  4 15:13 'superglue-cb-test-channel-k=16-s=87.txt'
-rw-r--r-- 1       3716 Aug  4 15:14 'superglue-copa-test-channel-k=16-s=87.txt'
-rw-r--r-- 1       1808 Aug  4 15:14 'superglue-copa-test-channel-k=16-s=87.pkl'
-rw-r--r-- 1      17992 Aug  4 15:21 'tweet_eval-hate-test-channel-k=16-s=87.pkl'
-rw-r--r-- 1       7119 Aug  4 15:21 'tweet_eval-hate-test-channel-k=16-s=87.txt'
-rw-r--r-- 1        340 Aug  4 15:21 'tweet_eval-stance_atheism-test-channel-k=16-s=87.txt'
-rw-r--r-- 1       1412 Aug  4 15:21 'tweet_eval-stance_atheism-test-channel-k=16-s=87.pkl'
-rw-r--r-- 1       1817 Aug  4 15:22 'tweet_eval-stance_feminist-test-channel-k=16-s=87.pkl'
-rw-r--r-- 1        435 Aug  4 15:22 'tweet_eval-stance_feminist-test-channel-k=16-s=87.txt'

Thanks in advance!
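
The per-dataset cost can be read off a listing like the one above by taking the gap between consecutive file timestamps. The following is a minimal sketch, not part of the MetaICL repo: the helper names `finish_times` and `gaps_in_minutes` are hypothetical, the year is assumed to be 2023 (taken from the log lines), and `SAMPLE` copies only the first few rows of the listing. Because `ls` reports times to the minute, the gaps are approximate.

```python
import re
from datetime import datetime

# Month abbreviations as printed by `ls -l` in an English locale.
MONTHS = {m: i for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches "<Mon> <day> <HH>:<MM> '<filename>'" at the end of an `ls -l` row.
LINE_RE = re.compile(
    r"([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}):(\d{2})\s+'?(.+?)'?\s*$")


def finish_times(listing, year=2023):
    """Map (dataset, seed) to the latest timestamp among its output files."""
    latest = {}
    for line in listing.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue  # skips "..." and any other non-file rows
        mon, day, hh, mm, name = m.groups()
        seed = re.search(r"-s=(\d+)\.", name)
        if mon not in MONTHS or "-test-" not in name or not seed:
            continue
        key = (name.split("-test-")[0], int(seed.group(1)))
        ts = datetime(year, MONTHS[mon], int(day), int(hh), int(mm))
        if key not in latest or ts > latest[key]:
            latest[key] = ts
    return latest


def gaps_in_minutes(listing, year=2023):
    """Return (dataset, seed, minutes since the previous dataset finished)."""
    ordered = sorted(finish_times(listing, year).items(), key=lambda kv: kv[1])
    return [
        (key[0], key[1], int((cur - prev).total_seconds() // 60))
        for (_, prev), (key, cur) in zip(ordered, ordered[1:])
    ]


# First rows of the listing from this thread.
SAMPLE = """\
-rw-r--r-- 1       5012 Aug  4 03:37 'quarel-test-channel-k=16-s=100.pkl'
-rw-r--r-- 1       3729 Aug  4 03:37 'quarel-test-channel-k=16-s=100.txt'
-rw-r--r-- 1      12241 Aug  4 03:41 'financial_phrasebank-test-channel-k=16-s=100.pkl'
-rw-r--r-- 1       3770 Aug  4 03:41 'financial_phrasebank-test-channel-k=16-s=100.txt'
-rw-r--r-- 1      18010 Aug  4 03:52 'openbookqa-test-channel-k=16-s=100.pkl'
-rw-r--r-- 1      20028 Aug  4 04:00 'codah-test-channel-k=16-s=100.pkl'
-rw-r--r-- 1      66694 Aug  4 04:25 'qasc-test-channel-k=16-s=100.pkl'
"""

if __name__ == "__main__":
    for dataset, seed, minutes in gaps_in_minutes(SAMPLE):
        print(f"{dataset} (seed {seed}): ~{minutes} min")
```

On these rows the sketch reports roughly 4, 11, 8 and 25 minutes for financial_phrasebank, openbookqa, codah and qasc. The first dataset (quarel) has no preceding timestamp, so its duration cannot be recovered from the listing alone. Over the whole listing, the first and last files are stamped 03:37 and 15:22, i.e. about 11 h 45 min for the five seeds.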

shmsw25 commented 12 months ago

Hi, I'm sorry for the late reply; for some reason I only saw this now.

Hmm, based on the timestamps of the files created, the time being spent looks reasonable to me. The only thing that stands out is the big gap between quartz-no_knowledge-test-channel-k=16-s=100 and quartz-no_knowledge-test-channel-k=16-s=87, which might be due to external reasons? And some datasets are definitely larger and take more time, e.g., the dream dataset. In that case, it might be OK to just exclude those datasets (for the preliminary experiments at least). Also, if you have multiple GPUs and would like to parallelize experiments for speed-up, you can specify the dataset names as arguments and run them in parallel.
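
The parallel setup mentioned above could look roughly like the sketch below. It is an illustration under stated assumptions, not the repo's own tooling: it assumes `test.py` accepts a `--dataset` argument for evaluating a single dataset (please confirm with `python test.py --help`), the helper names `build_queues` and `run_all` are hypothetical, and the remaining flags and paths are copied from the command earlier in this thread. Each GPU gets its own queue of datasets; queues run concurrently, and the jobs within a queue run one after another.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_queues(datasets, gpus,
                 checkpoint="checkpoints/channel-metaicl/hr_to_lr/model-30000.pt",
                 out_dir="checkpoints/channel-metaicl/hr_to_lr",
                 seed="100,13,21,42,87", batch_size=16):
    """Assign datasets to GPUs round-robin; return {gpu: [argv, ...]}."""
    queues = {gpu: [] for gpu in gpus}
    for i, dataset in enumerate(datasets):
        gpu = gpus[i % len(gpus)]
        queues[gpu].append([
            "python", "test.py",
            "--dataset", dataset,  # ASSUMPTION: single-dataset flag
            "--k", "16", "--split", "test", "--seed", seed,
            "--use_demonstrations",
            "--test_batch_size", str(batch_size),
            "--method", "channel",
            "--checkpoint", checkpoint,
            "--out_dir", out_dir,
        ])
    return queues


def run_queue(gpu, commands):
    """Run one GPU's commands sequentially, pinned via CUDA_VISIBLE_DEVICES."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    for argv in commands:
        subprocess.run(argv, env=env, check=True)


def run_all(queues):
    """Run every GPU queue concurrently and wait for all of them to finish."""
    with ThreadPoolExecutor(max_workers=len(queues)) as pool:
        futures = [pool.submit(run_queue, gpu, cmds)
                   for gpu, cmds in queues.items()]
        for future in futures:
            future.result()  # re-raises if any job failed


if __name__ == "__main__":
    # Dry run: print the planned commands instead of launching them.
    planned = build_queues(["dream", "qasc", "commonsense_qa", "codah"], [0, 1])
    for gpu, commands in planned.items():
        for argv in commands:
            print(f"CUDA_VISIBLE_DEVICES={gpu} " + " ".join(argv))
```

Calling `run_all(planned)` from the repository root would launch the jobs. Putting the slowest datasets from the listing (e.g. dream, qasc, commonsense_qa) on different GPUs should give the most benefit, since round-robin assignment does not balance by dataset size.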