Closed: C00reNUT closed this issue 2 years ago
There is no parameter called "nsamples". I guess you want to output 1000 different generated texts; you can do that by setting num_return_sequences:
out = nlp(input, num_return_sequences=1000)
This approach actually duplicates the input text into a single batch of 1000, so you need to set max_batch_size to be greater than or equal to 1000. A batch size of 1000 is rarely practical, though: it is time-consuming and uses an excessive amount of memory.
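For example, a minimal sketch of that configuration, just to illustrate the relationship between the two settings (as noted, a batch this large is likely to be slow and memory-hungry):

import torch
from eet import pipeline

# num_return_sequences duplicates the prompt into one batch of that size,
# so max_batch_size has to be at least as large.
max_batch_size = 1000
nlp = pipeline("text-generation", model = 'gpt2-medium', data_type = torch.float16, max_batch_size = max_batch_size)
out = nlp("My name is Sarah and I live in London", num_return_sequences = 1000)
print(len(out))   # expected: 1000 generated texts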
While testing num_return_sequences I found a bug; updating EET will fix it. Thank you very much for using the library. If you have other needs, you can always talk to us.
Thank you, num_return_sequences now works and I can generate multiple samples.
However, I still don't understand this:
This approach actually duplicates the input text into a single batch of 1000, so you need to set max_batch_size to be greater than or equal to 1000. A batch size of 1000 is rarely practical, though: it is time-consuming and uses an excessive amount of memory.
I can set max_batch_size = 10 and num_return_sequences=50, and it returns 50 results:
import torch
from eet import pipeline
max_batch_size = 10
data_type = torch.float16
input = "My name is Sarah and I live in London"
nlp = pipeline("text-generation", model = 'gpt2-medium', data_type = data_type, max_batch_size = max_batch_size)
out = nlp(input, num_return_sequences=50)
print(len(out))
print(out)
But when I use max_batch_size = 10 and num_return_sequences=500, which should return 500 samples, the script crashes with this error:
import torch
from eet import pipeline
max_batch_size = 10
data_type = torch.float16
input = "My name is Sarah and I live in London"
nlp = pipeline("text-generation", model = 'gpt2-medium', data_type = data_type, max_batch_size = max_batch_size)
out = nlp(input, num_return_sequences=500)
print(len(out))
print(out)
RuntimeError Traceback (most recent call last)
Input In [2], in <cell line: 8>()
6 input = "My name is Sarah and I live in London"
7 nlp = pipeline("text-generation", model = 'gpt2-medium', data_type = data_type, max_batch_size = max_batch_size)
----> 8 out = nlp(input, num_return_sequences=500)
9 print(len(out))
10 print(out)
File /mnt/2287294e-32c7-437b-84bd-452a29548b1a/conda_env/EET/lib/python3.8/site-packages/eet/pipelines/text_generation.py:113, in TextGenerationPipeline.__call__(self, text_inputs, **kwargs)
112 def __call__(self, text_inputs, **kwargs):
--> 113 return super().__call__(text_inputs, **kwargs)
File /mnt/2287294e-32c7-437b-84bd-452a29548b1a/conda_env/EET/lib/python3.8/site-packages/eet/pipelines/base.py:427, in Pipeline.__call__(self, inputs, num_workers, batch_size, *args, **kwargs)
425 return self.iterate(inputs, preprocess_params, forward_params, postprocess_params)
426 else:
--> 427 return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
File /mnt/2287294e-32c7-437b-84bd-452a29548b1a/conda_env/EET/lib/python3.8/site-packages/eet/pipelines/base.py:434, in Pipeline.run_single(self, inputs, preprocess_params, forward_params, postprocess_params)
432 def run_single(self, inputs, preprocess_params, forward_params, postprocess_params):
433 model_inputs = self.preprocess(inputs, **preprocess_params)
--> 434 model_outputs = self.forward(model_inputs, **forward_params)
435 outputs = self.postprocess(model_outputs, **postprocess_params)
436 return outputs
File /mnt/2287294e-32c7-437b-84bd-452a29548b1a/conda_env/EET/lib/python3.8/site-packages/eet/pipelines/base.py:357, in Pipeline.forward(self, model_inputs, **forward_params)
352 def forward(self, model_inputs, **forward_params):
353 # with self.device_placement():
354 # inference_context = self.get_inference_context()
355 # with inference_context():
356 model_inputs = self._ensure_tensor_on_device(model_inputs, device=self.device)
--> 357 model_outputs = self._forward(model_inputs, **forward_params)
358 model_outputs = self._ensure_tensor_on_device(model_outputs, device=torch.device("cpu"))
360 return model_outputs
File /mnt/2287294e-32c7-437b-84bd-452a29548b1a/conda_env/EET/lib/python3.8/site-packages/eet/pipelines/text_generation.py:151, in TextGenerationPipeline._forward(self, model_inputs, **generate_kwargs)
149 in_b = input_ids.shape[0]
150 prompt_text = model_inputs.pop("prompt_text")
--> 151 generated_sequence = self.model.generate(input_ids=input_ids, **generate_kwargs) # BS x SL
152 out_b = generated_sequence.shape[0]
153 generated_sequence = generated_sequence.reshape(in_b, out_b // in_b, *generated_sequence.shape[1:])
File /mnt/2287294e-32c7-437b-84bd-452a29548b1a/conda_env/EET/lib/python3.8/site-packages/torch/autograd/grad_mode.py:27, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
24 @functools.wraps(func)
25 def decorate_context(*args, **kwargs):
26 with self.clone():
---> 27 return func(*args, **kwargs)
File /mnt/2287294e-32c7-437b-84bd-452a29548b1a/conda_env/EET/lib/python3.8/site-packages/eet/pipelines/generation.py:346, in GenerationMixin_EET.generate(self, inputs, max_length, min_length, do_sample, early_stopping, num_beams, temperature, top_k, top_p, typical_p, repetition_penalty, bad_words_ids, force_words_ids, bos_token_id, pad_token_id, eos_token_id, length_penalty, no_repeat_ngram_size, encoder_no_repeat_ngram_size, num_return_sequences, max_time, max_new_tokens, decoder_start_token_id, use_cache, num_beam_groups, diversity_penalty, prefix_allowed_tokens_fn, logits_processor, stopping_criteria, constraints, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, forced_bos_token_id, forced_eos_token_id, remove_invalid_values, synced_gpus, exponential_decay_length_penalty, **model_kwargs)
338 input_ids, model_kwargs = self._expand_inputs_for_generation(
339 input_ids,
340 expand_size=num_return_sequences,
341 is_encoder_decoder=self.config.is_encoder_decoder,
342 **model_kwargs,
343 )
345 # 12. run sample
--> 346 return self.sample(
347 input_ids,
348 logits_processor=logits_processor,
349 logits_warper=logits_warper,
350 stopping_criteria=stopping_criteria,
351 pad_token_id=pad_token_id,
352 eos_token_id=eos_token_id,
353 output_scores=output_scores,
354 return_dict_in_generate=return_dict_in_generate,
355 synced_gpus=synced_gpus,
356 **model_kwargs,
357 )
359 elif is_beam_gen_mode:
360 if num_return_sequences > num_beams:
File /mnt/2287294e-32c7-437b-84bd-452a29548b1a/conda_env/EET/lib/python3.8/site-packages/eet/pipelines/generation.py:789, in GenerationMixin_EET.sample(self, input_ids, logits_processor, stopping_criteria, logits_warper, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, **model_kwargs)
787 # sample
788 probs = nn.functional.softmax(next_token_scores, dim=-1)
--> 789 next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
791 # finished sentences should have their next token be a padding token
792 if eos_token_id is not None:
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Usually I can generate any number of samples (here controlled by num_return_sequences) as long as it is divisible by max_batch_size. That means I should be able to produce 50, 500, or 5000 samples with max_batch_size = 10; it will just take longer to generate.
Am I missing something? Is there some way to generate that many samples without running out of memory?
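The only workaround I can think of is to request the samples in chunks small enough to work (say, max_batch_size per call) and concatenate the results myself. A rough sketch, assuming the pipeline call behaves as in the examples above:

import torch
from eet import pipeline

max_batch_size = 10
total_samples = 500   # any multiple of max_batch_size
input = "My name is Sarah and I live in London"

nlp = pipeline("text-generation", model = 'gpt2-medium', data_type = torch.float16, max_batch_size = max_batch_size)

out = []
# Ask for max_batch_size samples per call so the expanded batch never
# exceeds what the pipeline was configured for.
for _ in range(total_samples // max_batch_size):
    out.extend(nlp(input, num_return_sequences = max_batch_size))
print(len(out))   # 500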
OK, I looked at the code and now I get it: num_return_sequences expands the prompt into a single batch and generates all the sequences at once, rather than splitting them into chunks of max_batch_size, so it is implemented differently than I thought. Thank you for this nice library.
Hello, I am trying to use the text-generation pipeline from the Docker image with these parameters:
After the execution I get the following output:
That means that instead of 1024 samples (passing the 'nsamples': '1024' parameter) I am still getting just one output. Is there something I am missing here?