Closed by saanikat 4 weeks ago
Solution: Check the number of samples in the file. If the file has 10 or fewer values, average all the embeddings and use the averaged embedding as the value embedding for the column. If it has more than 10 samples, use clustering.
Solved with #15.
Added function get_averaged in utils.py.
For metadata with 10 or fewer samples, it averages all the value embeddings. For metadata with more than 10 samples, it clusters the value embeddings, takes the largest cluster, and averages the embeddings in that cluster.
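The behaviour described above can be sketched roughly as follows. This is not the code from utils.py: the function name summarize_value_embeddings, the helper functions, the parameter names, the default of 3 clusters, and the small hand-written k-means are all assumptions made for illustration, and plain Python lists are used only to keep the sketch free of dependencies.

```python
import random


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(vectors, k, iterations=20, seed=0):
    """Very small k-means; returns one cluster label per vector.

    Stand-in for whatever clustering the real code uses (assumption).
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # pick k initial centroids
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


def summarize_value_embeddings(embeddings, threshold=10, k=3):
    """Hypothetical sketch of the averaging / clustering rule.

    - threshold or fewer samples: average every embedding.
    - more than threshold samples: cluster, then average the largest cluster.
    """
    if not embeddings:
        raise ValueError("need at least one embedding")
    if len(embeddings) <= threshold:
        # Small inputs (including a single sample) never reach clustering.
        return mean_vector(embeddings)
    k = min(k, len(embeddings))
    labels = kmeans_labels(embeddings, k)
    largest = max(range(k), key=labels.count)
    return mean_vector([v for v, label in zip(embeddings, labels) if label == largest])
```

With this structure, a PEP with a single sample takes the averaging branch and simply gets that sample's embedding back, so the clustering step is never called on it.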
Because the model uses clustering to generate the embedding for the values, it throws an error when there is only one sample in the PEP.