Open akram opened 2 years ago
Hey @akram, thanks for the great question. File size is always important to me!
Allow me to elaborate a bit on this, so people can better understand the challenges and provide proposals.
Note: Everything that follows is only valid for the prebuilt packages of NormCap! (The Python package itself is really tiny, and its (large) dependencies can be shared with other applications if it is installed as a Python package.)
Be careful not to compare apples with oranges. If you want to compare NormCap, please compare it with tools which share the following features:
Every feature above makes the package larger. 1-3 are my choices as the developer and not up for discussion. 4 exists mostly for historical reasons and can easily be changed.
Here's an excerpt of the unpacked .dmg package (for Linux and Windows it's similar):
Path | Size | Comment |
---|---|---|
/Contents/Resources/app/normcap/ressources/tessdata | 73 MB | 5 frequently spoken languages (+ German ;-)) |
/Contents/Resources/app_packages/PySide2 | 48 MB | Qt5 Crossplatform GUI Framework |
/Contents/MacOS | 12 MB | NormCap Binary (Python interpreter?) |
/Contents/Resources/app_packages/lib* | 5 MB | Tesseract OCR + dependencies |
/\<everything else > | 11 MB | Various other dependencies, program code, etc.. |
Total | 150 MB |
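
To make the proportions in the table easier to see, here is a minimal Python sketch that turns the listed sizes into percentage shares. The numbers are copied from the table; the dictionary keys and the helper name `size_shares` are made up for illustration. Note that the listed rows add up to 149 MB, so the 150 MB total is presumably rounded.

```python
# Sizes in MB, copied from the table above (rounded values).
components = {
    "tessdata (language files)": 73,
    "PySide2 (Qt5 GUI framework)": 48,
    "MacOS (binary / Python interpreter)": 12,
    "lib* (Tesseract OCR + dependencies)": 5,
    "everything else": 11,
}


def size_shares(sizes):
    """Return each component's share of the summed size, in percent."""
    total = sum(sizes.values())
    return {name: round(100 * mb / total, 1) for name, mb in sizes.items()}


total_mb = sum(components.values())
shares = size_shares(components)

# Print the largest contributors first.
for name, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:38s} {components[name]:3d} MB  {pct:5.1f} %")
print(f"{'sum of listed rows':38s} {total_mb:3d} MB")
```

With these numbers, the language files and the GUI framework together account for roughly four fifths of the package.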
Let's see where we have options to tweak:
There is not much room for improvement here, I guess. The OCR framework is crucial and simply needs some dependencies. The binary is the Python interpreter and is necessary to run NormCap.
I already spent quite some effort on stripping away unnecessary dependencies here (see #114 and below). There still might be some room for improvement, but it's probably minor. Suggestions are welcome! :-)
This is obviously the easiest place to have a large impact on the package file size. The languages were included in an earlier version, long before users had the possibility to add additional languages themselves.
Today, it is possible to add languages on demand, but IMHO it is still not super trivial, and it has to be done manually (if needed).
This means we need to make a trade-off: better out-of-the-box user experience vs. file size.
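
To get a rough feeling for this trade-off, here is a small Python sketch that estimates the package size for a given number of bundled languages. It uses the 150 MB total and the 73 MB of language data from the table, and it assumes that all 6 bundled language files are about the same size. That assumption is a simplification (the real files differ in size), and the function name `estimated_package_mb` is just made up for this example.

```python
TOTAL_MB = 150         # total package size from the table
TESSDATA_MB = 73       # size of the bundled language files from the table
BUNDLED_LANGUAGES = 6  # 5 frequently spoken languages + German


def estimated_package_mb(n_languages):
    """Rough package size in MB when bundling n_languages language files.

    Assumption: every language file has the same (average) size.
    """
    if n_languages < 0:
        raise ValueError("n_languages must not be negative")
    base_mb = TOTAL_MB - TESSDATA_MB  # everything except language data
    language_mb = n_languages * TESSDATA_MB / BUNDLED_LANGUAGES
    return round(base_mb + language_mb)


for n in (0, 1, 2, 6):
    print(f"{n} bundled language(s): ~{estimated_package_mb(n)} MB")
```

Under this assumption, each language that is dropped saves about 12 MB, while the part that does not depend on languages stays at roughly 77 MB.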
Once upon a time, I decided to lean toward the user experience by adding 6 language files, which should cover what most people want. That is also more than most people need, but I considered it worth the size. But that's up for discussion, and I would love to hear your feedback on alternatives:
n most frequent languages (status quo)

What do you think? Do you have other ideas?
Thanks @dynobo for the very detailed and justified explanations.
To make a reasonable and effective decision, I would say it is necessary to know how users use the application. On my side, it is really sporadic use, specifically when people send me screenshots of a bank account number instead of a clean PDF. As this was happening quite often, I needed a solution like normcap to capture that, one that spares me tedious reading and pasting and is also quite reliable. So, for users like me, multi-language support is not a must-have; even no language support is actually sufficient.
There could certainly be other usages, like journalists/writers who need to capture text from screenshots, especially press releases, to report it as readable text.
I would tend to say that single-language support + on-demand would be a good trade-off, but again, it would be necessary to understand the usage.
I am a bit concerned with file size these days because in some situations I lack an unlimited-bandwidth data plan and have to fall back on a paid data plan where every download counts. That forces fair use. And in some countries where this is the default, it is always convenient to have a lightweight solution.
I am sure we will find the best solution... and maybe that will contribute to the almost old-fashioned "Green IT" concept.
Linking #238 here, which led to a significant reduction, especially for macOS and Linux packages.
Hi team, and thank you for your work.
It would be great if you could reduce the size of the dmg. 105 MB is actually quite big compared to similar solutions that are between 1 MB and 20 MB. Maybe the difference is that OCR happens locally with normcap, while some other apps use web queries to do the job? It would also be good to explain why normcap is 100 MB+.
thanks