amazon-science / supervised-intent-clustering

This is a package to fine-tune language models in order to create clustering-friendly embeddings.

Test setup #2

Open sindhura97 opened 7 months ago

sindhura97 commented 7 months ago

Hi, for inference, do you compute embeddings for all texts of all intents in the test set and cluster them together? Or do you construct batches as in the training phase?

Anton87 commented 6 months ago

Hi @sindhura97, this depends on the size of your problem, i.e., how many utterances you need to cluster. If their corresponding embeddings fit in memory, you can cluster all the utterances together during inference; otherwise you need to split them into batches.

However, for inference the choice between the two approaches does not make a big difference. :-)
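
The two inference options described above can be sketched roughly as follows. This is a minimal illustration and not the repository's code: `threshold_cluster` is a hypothetical toy clusterer standing in for whatever clustering algorithm is actually used, the function names `cluster_all` and `cluster_in_batches` are made up for this example, and the embeddings are assumed to be already computed as plain lists of floats.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def threshold_cluster(embeddings: Sequence[Vector], threshold: float = 0.8) -> List[int]:
    """Hypothetical toy clusterer (an assumption, not the package's algorithm).

    Each vector joins the first cluster whose representative (its first
    member) has cosine similarity >= threshold; otherwise it starts a new one.
    """
    representatives: List[Vector] = []
    labels: List[int] = []
    for emb in embeddings:
        for cluster_id, rep in enumerate(representatives):
            if cosine(emb, rep) >= threshold:
                labels.append(cluster_id)
                break
        else:
            representatives.append(emb)
            labels.append(len(representatives) - 1)
    return labels


def cluster_all(embeddings: Sequence[Vector], threshold: float = 0.8) -> List[int]:
    """Option 1: all embeddings fit in memory, so cluster them in one pass."""
    return threshold_cluster(embeddings, threshold)


def cluster_in_batches(
    embeddings: Sequence[Vector], batch_size: int, threshold: float = 0.8
) -> List[int]:
    """Option 2: cluster each batch separately.

    Cluster ids are shifted per batch so they stay unique across batches;
    this sketch does not try to merge clusters from different batches.
    """
    labels: List[int] = []
    offset = 0
    for start in range(0, len(embeddings), batch_size):
        batch = embeddings[start:start + batch_size]
        batch_labels = threshold_cluster(batch, threshold)
        labels.extend(offset + label for label in batch_labels)
        offset += max(batch_labels) + 1
    return labels


# Toy 2-D "embeddings": two utterances per intent.
toy_embeddings = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
all_labels = cluster_all(toy_embeddings)
batched_labels = cluster_in_batches(toy_embeddings, batch_size=2)
```

In this sketch, batching only changes which utterances the clusterer sees together, so the result depends on how the batches are composed.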

sindhura97 commented 6 months ago

Hi, my intention in asking this question was to learn about the evaluation setup. Were the evaluation metrics obtained from one final clustering per dataset, or were the metrics computed for each batch and then averaged?