yl4579 opened 11 months ago
hi there,
All models are 7B. The error might come from inconsistency across GPUs.
*GPU issue for LTU-AS: We find that OpenAI Whisper features differ across GPU generations, which impacts the performance of LTU-AS since it takes the Whisper feature as input. In the paper, we always use features generated by older GPUs (Titan X). But we also release a checkpoint that uses features generated by newer GPUs (A5000/A6000); please manually switch the checkpoint depending on whether you are running on an old or new GPU (by default this code uses the new-GPU feature). A mismatch between the training and inference GPUs does not completely break the model, but it does cause a performance drop.
A good way to test is to check whether your output is consistent with our online API.
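If the API is unavailable, another local check is to compare your Whisper encoder features against a reference tensor computed on the training GPU family. A minimal sketch with the openai-whisper package; the model size ("large-v1") and file names here are assumptions for illustration, not necessarily what LTU-AS uses internally:

```python
import torch
import whisper  # the openai-whisper package

# Load a Whisper model and extract encoder features for a test clip.
model = whisper.load_model("large-v1")
audio = whisper.pad_or_trim(whisper.load_audio("sample.wav"))  # 16 kHz mono
mel = whisper.log_mel_spectrogram(audio).to(model.device)
with torch.no_grad():
    feat = model.encoder(mel.unsqueeze(0))  # (1, 1500, 1280) for large models

# Compare against a reference feature tensor saved from another GPU, e.g.:
# ref = torch.load("feat_titan_x.pt")
# print((feat.cpu() - ref).abs().max())
```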
-Yuan
The online API is not working right now. If the output does turn out to be different, though, how do I get it working the same way as the API? I'm running inference on an A40.
You can manually download the models to your path from https://github.com/YuanGongND/ltu#pretrained-models; we provide 4 checkpoints.
And then change https://github.com/YuanGongND/ltu/blob/1963db6943bc409e42287bf5b4e6977982999fe2/src/ltu_as/inference_gradio.py#L52 to point to the downloaded checkpoint.
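That line just sets the checkpoint path that gets loaded below it. A minimal sketch of the switch (the filename is illustrative; use whichever checkpoint you downloaded; the loading pattern matches the traceback later in this thread):

```python
# Point this at the checkpoint you downloaded (illustrative path).
eval_mdl_path = '../../pretrained_mdls/your_downloaded_checkpoint.bin'

# The script then loads it non-strictly on top of the base model.
state_dict = torch.load(eval_mdl_path, map_location='cpu')
miss, unexpect = model.load_state_dict(state_dict, strict=False)
```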
-Yuan
There might be other reasons too, e.g., the sampling rate needs to be 16 kHz.
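If your audio is not already at 16 kHz, resampling it first is straightforward; a sketch using torchaudio (assuming it is available in your environment):

```python
import torchaudio

# Load the clip and resample to the 16 kHz the model expects.
wav, sr = torchaudio.load('sample.wav')
if sr != 16000:
    wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=16000)
torchaudio.save('sample_16k.wav', wav, 16000)
```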
I just checked the output, and I'm pretty sure the default model produces output very similar to the 13B (Beta) model in the Hugging Face Space (though it is down now). How do I get the 7B (Default) results?
Please upload a sample wav and a question; I will check later.
Our MIT GPUs are currently down; I will check with our IT.
There are three LoRA checkpoints; have you tried them all? https://github.com/YuanGongND/ltu#pretrained-models
Also, I restarted the HF space. Can you check if it is consistent with your local model? I am using the same checkpoint ("Long_sequence_exclude_noqa_new_gpu (Default)") as the default checkpoint online.
Now I have confirmed they give similar responses, but the responses are different from those I got a month ago (around early November). Did you change the model for your Hugging Face Space?
I do not remember clearly, but we did switch the checkpoint. You can try the "Original in Paper" checkpoint under LTU-AS, https://github.com/YuanGongND/ltu#pretrained-models.
It is an easy switch: just download the checkpoint and change https://github.com/YuanGongND/ltu/blob/1963db6943bc409e42287bf5b4e6977982999fe2/src/ltu_as/inference_gradio.py#L52 to point to the new checkpoint.
In your experience, which one is better? I changed to `eval_mdl_path = '../../pretrained_mdls/ltu_ori_paper.bin'` but got the following error:
```
RuntimeError                              Traceback (most recent call last)
Cell In[3], line 50
     47 temp, top_p, top_k = 0.1, 0.95, 500
     49 state_dict = torch.load(eval_mdl_path, map_location='cpu')
---> 50 miss, unexpect = model.load_state_dict(state_dict, strict=False)
     52 model.is_parallelizable = True
     53 model.model_parallel = True

File ~/.conda/envs/venv_ltu_as/lib/python3.10/site-packages/torch/nn/modules/module.py:1671, in Module.load_state_dict(self, state_dict, strict)
   1666     error_msgs.insert(
   1667         0, 'Missing key(s) in state_dict: {}. '.format(
   1668             ', '.join('"{}"'.format(k) for k in missing_keys)))
   1670 if len(error_msgs) > 0:
-> 1671     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
   1672         self.__class__.__name__, "\n\t".join(error_msgs)))
   1673 return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
    size mismatch for base_model.model.model.audio_proj.1.weight: copying a param with shape torch.Size([4096, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1280]).
```
Did you download the one under LTU or under LTU-AS? The size mismatch (768 vs. 1280) suggests an LTU checkpoint (768-dim audio features) is being loaded into the LTU-AS model, which expects 1280-dim Whisper features.
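If you want to check which family a checkpoint belongs to before loading it, you can inspect the audio projection shape directly. A small sketch (the 768-vs-1280 distinction comes from the error above; the path is the one from your message):

```python
import torch

# Peek at the audio projection weights without building the model.
sd = torch.load('../../pretrained_mdls/ltu_ori_paper.bin', map_location='cpu')
for k, v in sd.items():
    if 'audio_proj' in k and getattr(v, 'ndim', 0) == 2:
        # A 1280-dim input suggests LTU-AS (Whisper features); 768-dim suggests LTU.
        print(k, tuple(v.shape))
```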
It is hard to say which is better; it depends on the task.
btw, you can ask the model multiple questions at one time, but I guess performance will be better if you ask them one by one. You can also tune the prompt for each task; e.g., you can say "give an answer anyway" to force the model to give an answer rather than say "I don't know".
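For instance, a hypothetical loop that asks one question per call, with a per-task prompt nudge (`predict` here stands in for whatever inference entry point you use, e.g., the function in inference_gradio.py; it is not a confirmed API name):

```python
questions = [
    "What sound events are in this clip?",
    "What is the mood of the speaker? Give an answer anyway, even if unsure.",
]
for q in questions:
    # One question per call tends to work better than stacking them.
    print(predict('sample_16k.wav', q))
```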
Are the models downloaded from `inference.sh` 7B (Default) or 13B (Beta)? I found the latter quite error-prone and unstable, which is similar to what I'm observing locally now. I think the model is 13B (Beta)? If so, how do I get the 7B (Default) model instead?