Closed shafqatvirk closed 7 months ago
The WordTransformer supports batch processing (12 sentences at a time). We currently do not use this because we pass one sentence at a time to the WordTransformer. We have to change this part of xl-lexeme:
```python
import numpy as np
from WordTransformer import InputExample


def compute_embeddings_lexeme(sentence_and_token_index: list[tuple], model) -> np.ndarray:
    """
    Compute embeddings for the given sentences and token indices.

    :param sentence_and_token_index: A list of tuples, each containing a sentence
        and the corresponding token index as a "start:end" string.
    :type sentence_and_token_index: list[tuple]
    :param model: The model used to encode the given sentences.
    :return: Embeddings for the given sentences and token indices.
    :rtype: np.ndarray
    """
    token_embeddings_output = []
    for sen, idx in sentence_and_token_index:
        # Parse the "start:end" character offsets of the target token.
        start, end = (int(part) for part in idx.split(':'))
        # Note: the sentence is wrapped in literal quotes before encoding.
        example = InputExample(texts='"' + sen + '"', positions=[start, end])
        outputs = model.encode(example)
        token_embeddings_output.append(outputs)
    return np.array(token_embeddings_output)
```
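The core of the batched variant is just chunking the input before encoding. Below is a minimal sketch of that chunking step, independent of the WordTransformer API; the `encode` callable, the function name, and the dummy encoder are all illustrative stand-ins, and it is assumed (not confirmed here) that `model.encode` can accept a list of `InputExample` objects:

```python
import numpy as np


def chunked(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def compute_embeddings_batched(pairs, encode, batch_size=12):
    """Encode (sentence, index) pairs in batches.

    `encode` stands in for a model's encode method; it is assumed to
    accept a list of inputs and return one embedding per input.
    """
    out = []
    for batch in chunked(pairs, batch_size):
        out.extend(encode(batch))
    return np.array(out)


# Dummy encoder for illustration: 3-dimensional embedding whose
# entries are the sentence length.
def dummy_encode(batch):
    return [np.full(3, len(sen)) for sen, _ in batch]


pairs = [("a cat", "2:5"), ("a dog sat", "6:9"), ("hi", "0:2")]
emb = compute_embeddings_batched(pairs, dummy_encode, batch_size=2)
print(emb.shape)  # (3, 3)
```

With `batch_size=2`, the three pairs are encoded in two calls (sizes 2 and 1) instead of three single-sentence calls.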
I have pushed changes that use the batch-processing functionality in the WordTransformer. A `batch_size` parameter can now be set in the settings file. e2745023baa0b5a350490053da99e2201a7014b8
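For context, the new setting might look like this; the file layout and default value shown are assumptions, and only the `batch_size` parameter itself comes from the commit above:

```python
# settings file (hypothetical layout; only the batch_size key is real)
batch_size = 12  # number of (sentence, index) pairs passed to the model per encode call
```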
We still have to implement some form of batch processing in the computational annotator so that instances can be annotated at large scale.