Training is really fast, but inference is very slow. I read the documentation and enabled batching and multi-core processing, but it is still very slow. Is there any other way to optimize inference speed? #205
I am hitting the same problem: with 100 GB of memory in use and 40 cores enabled, inference on texts shorter than 5000 words runs at only about 2 entries/s.
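Since the thread doesn't name the library or its API, here is a minimal, generic sketch of the batching + multi-process pattern being described. `predict_batch` is a hypothetical stand-in for the model's real batch-inference call; the point is that each worker process should handle whole batches (to amortize per-call overhead), and the worker count should be kept low enough that per-process model copies don't exhaust memory — 40 workers each loading the model is one plausible cause of the 100 GB footprint above.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice

def chunked(items, size):
    """Yield successive batches of `size` items from `items`."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

def predict_batch(texts):
    # Hypothetical stand-in for the model's batch inference call;
    # replace with the library's actual predict API.
    return [len(t) for t in texts]

def parallel_infer(texts, batch_size=32, workers=4):
    """Run batched inference across worker processes.

    Larger batches amortize per-call overhead; too many workers can
    blow up memory if each process loads its own copy of the model.
    """
    results = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for out in pool.map(predict_batch, chunked(texts, batch_size)):
            results.extend(out)
    return results

if __name__ == "__main__":
    print(parallel_infer(["short", "a longer sentence"] * 8,
                         batch_size=4, workers=2))
```

Beyond this pattern, throughput usually improves more by making each batch call cheaper (quantization, ONNX/TensorRT export, GPU inference, truncating very long inputs) than by adding more CPU workers.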