Hello (again)!

After experimenting with the project some more, I've found a few potential improvements. I'll aggregate them here if that's okay:
[ ] Set `device=None` as the default in `distill`, rather than `"cpu"`. This should use Sentence Transformers' functionality to pick the strongest available device by default (e.g. `"cuda"` first). P.s.: distilling `mxbai-embed-large-v1` takes 00:05 on my GPU vs. 01:58 on CPU. (Edit: Might be moot since #35)
[x] Don't be afraid to link to yourself in the README.md, e.g. at the bottom of the model card template you could put a hyperlink over your names to your GitHub or something. I've included this in #37.
[ ] In my experience, the `private` argument in `push_to_hub` is quite well liked; it allows people to look the model (card) over before making the model public to everyone.
[ ] The `token` argument in `push_to_hub` should be optional. If it's not specified, the Hugging Face tools will automatically load it from your local filesystem if you ever ran `huggingface-cli login`.
[ ] "Model Authors" in the model card is a bit of a misnomer: if someone else trained the model then they're the authors, you're the Model2Vec authors.
[ ] Support a `trust_remote_code` argument. This is required for loading models with custom code, such as https://huggingface.co/jinaai/jina-embeddings-v2-base-en or https://huggingface.co/nomic-ai/nomic-embed-text-v1.5. (Edit: Might be moot since #35)