elazarg / nakdimon

Hebrew Diacritizer
https://nakdimon.org
MIT License

Remove model from git #27

Open thewh1teagle opened 2 weeks ago

thewh1teagle commented 2 weeks ago

Running git clone on this repository takes a long time because the model file is committed here as well. I suggest removing it from the repository and uploading it to the release instead: https://github.com/elazarg/nakdimon/releases/tag/v0.1.2

Then you can simply instruct the users to execute wget with the model path before running.

elazarg commented 2 weeks ago

I understand the issue, but I think it should be simpler than wget (not all users have wget installed). Also, how can the model be fetched automatically upon pip install?

thewh1teagle commented 2 weeks ago

I understand the issue, but I think it should be simpler than wget

wget is installed by default on most systems: Windows (PowerShell), Linux, and macOS. The thing is that git does not fetch large files efficiently, while wget does.

Also, how can the model be fetched automatically upon pip install

I don't recommend doing that either, since it's error-prone. Most ML libraries today simply instruct users to fetch the model before executing the script.

elazarg commented 2 weeks ago

The important part is to make sure the user will find it easy to fetch the model to the correct path, even if they are not very well-versed in Python. If you know an idiomatic way to do that, one that won't be a hurdle to the user (e.g. lazily fetching upon first execution), then I would really appreciate a PR. Otherwise I will try to implement a better solution when I get to it.
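
For reference, lazy fetching upon first execution could look roughly like the sketch below. It is only an illustration, not code from this repository: the function name ensure_model, the cache directory, and the release asset URL are all assumptions (no asset is known to exist at that URL). It uses only the Python standard library, so it does not depend on wget being installed.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path

# Assumed values for illustration only: the asset URL presumes the model
# would be uploaded to the v0.1.2 release under this file name.
MODEL_URL = "https://github.com/elazarg/nakdimon/releases/download/v0.1.2/Nakdimon.h5"
CACHE_DIR = Path.home() / ".cache" / "nakdimon"


def ensure_model(url=MODEL_URL, cache_dir=CACHE_DIR, fetch=urllib.request.urlopen):
    """Return the local model path, downloading the file only if it is missing."""
    target = Path(cache_dir) / "Nakdimon.h5"
    if target.exists():
        return target  # already fetched on an earlier run

    target.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = None
    try:
        # Write to a temporary file first so an interrupted download
        # never leaves a truncated model at the final path.
        with fetch(url) as response, tempfile.NamedTemporaryFile(
            dir=target.parent, delete=False
        ) as tmp:
            tmp_path = Path(tmp.name)
            shutil.copyfileobj(response, tmp)
        tmp_path.replace(target)
    finally:
        # Clean up the temporary file if the download failed midway.
        if tmp_path is not None and tmp_path.exists():
            tmp_path.unlink()
    return target
```

The library would call ensure_model() right before loading the model, so pip install stays free of network access and the user never has to know the correct path.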

elazarg commented 2 weeks ago

Simple wget from PowerShell does not create the file.

thewh1teagle commented 1 week ago

Simple wget from PowerShell does not create the file.

It turns out that on Windows the -O flag is required; with it, the same command works on macOS, Linux, and Windows.

wget https://github.com/elazarg/nakdimon/raw/master/nakdimon/Nakdimon.h5 -O Nakdimon.h5

In addition, wget on Windows can be installed with

winget install -e --id JernejSimoncic.Wget

It's faster and more reliable than PowerShell's wget handler. To use it from PowerShell, invoke it as wget.exe.
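
Whichever download command is used, it may be worth confirming that the saved file is really an HDF5 model and not, say, an HTML page saved under the same name. A minimal sketch in Python follows; the helper name looks_like_hdf5 is hypothetical, and the check assumes the HDF5 signature sits at offset 0, which is the common case but not the only one the format allows.

```python
# The 8-byte signature that begins an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature."""
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

Calling looks_like_hdf5("Nakdimon.h5") after the download gives a quick yes/no answer before the model is loaded.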