databio / bedms

tool for standardization of genomics/epigenomics metadata
BSD 2-Clause "Simplified" License

Error when the user-provided input has only one sample #18

Closed: saanikat closed this issue 4 weeks ago

saanikat commented 1 month ago

Because the model uses clustering to generate the embedding for the values, it throws an error when the PEP contains only one sample.

saanikat commented 1 month ago

Solution: check the number of samples in the file. If it has 10 or fewer samples, average all the embeddings and use that averaged embedding as the value embedding for the column. If it has more than 10 samples, cluster the embeddings instead.

saanikat commented 4 weeks ago

Solved with #15, which adds a get_averaged function in utils.py. For metadata with fewer than 10 samples, it averages all the embeddings. For metadata with more than 10 samples, it clusters all the value embeddings, takes the largest cluster, and averages the embeddings in that cluster.
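A minimal sketch of the averaging logic described above, for readers who want to see it concretely. This is not the actual get_averaged from utils.py, which is not shown here. The names get_averaged_sketch, kmeans_labels and SMALL_SAMPLE_THRESHOLD are hypothetical. The thread does not say which clustering algorithm is used or how many clusters there are, so this sketch assumes plain k-means with k=3. Embeddings are assumed to be equal-length lists of floats. Exactly 10 samples is treated as "small", following the earlier comment.

```python
import random
from collections import Counter
from typing import List, Sequence

Vector = List[float]

# Hypothetical constant: the thread uses 10 samples as the cutoff.
SMALL_SAMPLE_THRESHOLD = 10


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(vectors: Sequence[Sequence[float]], k: int,
                  iterations: int = 20, seed: int = 0) -> List[int]:
    """Minimal Lloyd's k-means (an assumed stand-in); returns one label per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(list(vectors), k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return labels


def get_averaged_sketch(embeddings: Sequence[Sequence[float]], k: int = 3) -> Vector:
    """Hypothetical value embedding for one column: plain mean for small inputs,
    mean of the largest k-means cluster otherwise."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    if len(embeddings) <= SMALL_SAMPLE_THRESHOLD:
        # Small input (including a single sample): no clustering, so no error.
        return mean_vector(embeddings)
    labels = kmeans_labels(embeddings, k)
    largest, _ = Counter(labels).most_common(1)[0]
    return mean_vector([v for v, lab in zip(embeddings, labels) if lab == largest])
```

With a single sample, the function returns that sample's embedding unchanged, so the one-sample PEP from this issue no longer reaches the clustering step. Averaging only the largest cluster, instead of every embedding, keeps a minority of outlier values from pulling the column embedding away from the typical value.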