Closed: ScottishFold007 closed this issue 1 year ago
I think you don't want to use the --asym argument, because it creates a separate model for queries and documents, which performs much worse than the default of tying the weights of the query and document encoders. I haven't tested the --asym kwarg with GradCache - I think the problem is that the model for --asym is not a single torch model but two models (i.e. a dict) for the query & doc encoders. Anyway, since --asym performs badly, I just wouldn't use it.
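Roughly speaking, the --asym setup amounts to something like the sketch below (a hypothetical illustration, not the actual SGPT/sentence-transformers code), which is why a method that expects a single torch module can break on it:

```python
import torch.nn as nn

# Hypothetical illustration only: an asymmetric bi-encoder holds two separate
# encoders (one for queries, one for documents) in a dict-like container,
# rather than one shared model with tied weights.
class AsymBiEncoder(nn.Module):
    def __init__(self, query_encoder: nn.Module, doc_encoder: nn.Module):
        super().__init__()
        self.encoders = nn.ModuleDict({"query": query_encoder, "doc": doc_encoder})

    def forward(self, inputs, input_type: str):
        # Route the batch through the encoder that matches its type.
        return self.encoders[input_type](inputs)
```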
But I'm wondering: if the query is not long but the passage is very long, can this still guarantee good search quality without using the asym setup? Will there be inaccuracies? Also, could the worse results with asym be because the passages in the training set are generally not very long?
The way SGPT distinguishes them is via --specb, which adds different brackets depending on whether it's a query or a document. I had the same intuition as you, but it seems that --asym just does not work well. You can still try it with the code as is, but you will need to remove --gradcache --chunksize 4 (which will need more memory) or make it compatible.
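Roughly, the bracket idea looks like this (a minimal sketch; the concrete tokens - [ ] for queries, { } for documents - are taken from the SGPT README and should be treated as an assumption here):

```python
# Minimal sketch of the special-brackets idea behind --specb: the same shared
# encoder sees queries and documents wrapped in different bracket tokens.
def add_special_brackets(text: str, is_query: bool) -> str:
    return "[" + text + "]" if is_query else "{" + text + "}"

print(add_special_brackets("how to tie weights in a bi-encoder", is_query=True))
print(add_special_brackets("Weight tying shares one encoder for queries and documents.", is_query=False))
```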
Well, there is one more question: for a model with an architecture like BLOOM, should I also use the branch guarded by if "gpt" in model_name, which just does accelerator = Accelerator(), instead of the other branch ("Need to run e.g. bert-large-uncased (also works for GPT, but uses unnecessary memory)") that sets ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True) and accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])?
If it doesn't error out then I think it's fine - you can probably use less memory if you change it to if ("gpt" in model_name) or ("bloom" in model_name):
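i.e. something like this (a sketch of the suggested change, assuming model_name is already defined as in the training script):

```python
from accelerate import Accelerator, DistributedDataParallelKwargs

# Decoder-only models (GPT-style, BLOOM) don't leave parameters unused in the
# forward pass, so plain DDP works and saves memory.
if ("gpt" in model_name) or ("bloom" in model_name):
    accelerator = Accelerator()
else:
    # Needed to run e.g. bert-large-uncased (also works for GPT, but uses unnecessary memory)
    ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
    accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])
```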
Okay, I understand. Thank you very much for your attentive answer! I wish you a happy life! All the best!
Happy to be of help! 👍
When I removed --gradcache --chunksize 4, the original code ran out of memory (OOM), but when I added it back, it worked even with chunksize set to 8.
Yeah you need gradcache for running with high batch sizes
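Roughly, gradient caching works like the sketch below (a conceptual illustration, not the actual GradCache implementation; full_batch_contrastive_loss is a hypothetical placeholder for whatever loss the script uses):

```python
import torch

def grad_cache_step(encoder, batch, chunk_size, optimizer, full_batch_contrastive_loss):
    # 1) Embed the batch in small chunks without building the autograd graph,
    #    so peak activation memory depends on chunk_size, not the batch size.
    chunks = batch.split(chunk_size)
    with torch.no_grad():
        reps = torch.cat([encoder(c) for c in chunks])
    reps = reps.detach().requires_grad_(True)

    # 2) Compute the contrastive loss once over the full batch of cached
    #    representations and grab the gradient w.r.t. each representation.
    loss = full_batch_contrastive_loss(reps)
    loss.backward()
    rep_grads = reps.grad.split(chunk_size)

    # 3) Re-encode each chunk with gradients enabled and inject the cached
    #    representation gradients, accumulating parameter gradients chunk by chunk.
    for chunk, grad in zip(chunks, rep_grads):
        encoder(chunk).backward(gradient=grad)

    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```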
Here is the command line I used:
Then the following error was reported:
But of course, when I set asym to False, it works perfectly. I don't know what the problem is - can you help me out? Thank you!