Closed: bilgeacun closed this issue 11 months ago
@lzcemma We're trying to get a comparison with Dejavu for a paper submission and have limited time. We would greatly appreciate it if you could take a look at this issue soon.
I just updated the repo. Can you check if it works for you?
@lzcemma thanks for the fix. I am now getting a different error, related to a shape mismatch:
Compute generate token step < 0 >.
Compute generate token step < 1 >.
IndexError: The shape of the mask [2039] at index 0 does not match the shape of the indexed tensor [1, 2048] at index 0
Traceback (most recent call last):
  File "/private/home/acun/DejaVu/Decentralized_FM_alpha/dist_inference_runner.py", line 111, in <module>
    main()
  File "/private/home/acun/DejaVu/Decentralized_FM_alpha/dist_inference_runner.py", line 84, in main
    distributed_inference_mask_iter(args, pipe, device, request_processor)
  File "/private/home/acun/DejaVu/Decentralized_FM_alpha/utils/dist_inference_utils.py", line 73, in distributed_inference_mask_iter
    current_iter_time = pipeline.inference_batch(input_ids, output_ids_list, attention_mask=attention_mask)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/home/acun/DejaVu/Decentralized_FM_alpha/pipeline_parallel/dist_pipeline_inference_mask_greedy_token_pipe_sync.py", line 830, in inference_batch
    self.forward_new_token_pipeline_stage(attention_mask=attention_mask)
  File "/private/home/acun/DejaVu/Decentralized_FM_alpha/pipeline_parallel/dist_pipeline_inference_mask_greedy_token_pipe_sync.py", line 711, in forward_new_token_pipeline_stage
    self.forward_new_token_pipeline_step(step, attention_mask=attention_mask)
  File "/private/home/acun/DejaVu/Decentralized_FM_alpha/pipeline_parallel/dist_pipeline_inference_mask_greedy_token_pipe_sync.py", line 756, in forward_new_token_pipeline_step
    self._forward_compute_generate_token(i, mask=attention_masks[i])
  File "/private/home/acun/DejaVu/Decentralized_FM_alpha/pipeline_parallel/dist_pipeline_inference_mask_greedy_token_pipe_sync.py", line 567, in _forward_compute_generate_token
    current_emb, cache = self.layers["block" + str(layer_index)](
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/home/acun/.conda/envs/dejavu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/home/acun/DejaVu/Decentralized_FM_alpha/modules/hf_opt_module_save.py", line 477, in forward
    _hidden_states = hidden_states.view(-1, hidden_states.size(-1))[
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: The shape of the mask [2039] at index 0 does not match the shape of the indexed tensor [1, 2048] at index 0
I guess the generation length for the C4 data was not set to 0. I have updated getdata.py to reflect that.
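For anyone hitting the same IndexError, here is a minimal pure-Python sketch of the shape rule that the message appears to describe: each dimension of a boolean mask has to equal the dimension of the indexed tensor at the same position. The helpers `check_bool_mask` and `mismatch_message` are hypothetical names made up for illustration. They are not part of PyTorch or of this repo, and the message text only mirrors the wording in the traceback above.

```python
def check_bool_mask(mask_shape, tensor_shape):
    """Raise IndexError if a boolean mask of mask_shape cannot index tensor_shape.

    Assumed rule (illustration only): the mask covers the leading dimensions
    of the tensor, and each mask dimension must equal the tensor dimension
    at the same position.
    """
    if len(mask_shape) > len(tensor_shape):
        raise IndexError("mask has more dimensions than the indexed tensor")
    for i, (m, t) in enumerate(zip(mask_shape, tensor_shape)):
        if m != t:
            raise IndexError(
                f"The shape of the mask {list(mask_shape)} at index {i} "
                f"does not match the shape of the indexed tensor "
                f"{list(tensor_shape)} at index {i}"
            )


def mismatch_message(mask_shape, tensor_shape):
    """Return the error text for a mismatch, or None if the shapes are compatible."""
    try:
        check_bool_mask(mask_shape, tensor_shape)
    except IndexError as err:
        return str(err)
    return None


# Shapes taken from the traceback: mask [2039] against tensor [1, 2048].
print(mismatch_message((2039,), (1, 2048)))
# A mask whose shape matches the tensor passes the check.
print(mismatch_message((1, 2048), (1, 2048)))
```

Running a check like this on the mask and hidden-state shapes just before the indexing step can help tell whether the mask was built for a different sequence length than the tensor it is applied to.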
Thanks, it seems to work now: run_infer_opt_175b_collect_sp_data.sh completed and I get the mmap outputs.
I ran into the error below when collecting training data with the run_infer_opt_175b_collect_sp_data.sh script.
The last thing that's printed out is:
Here is the error:
Any idea what could be causing this?