JSv4 / Python-Redlines

Docx tracked change redlines for the Python ecosystem.
MIT License

Fix binary extraction flow #7

Closed rishabh-sagar-20 closed 2 months ago

rishabh-sagar-20 commented 2 months ago

Refactor:

Update:

JSv4 commented 2 months ago

Hey @rishabh-sagar-20, I'm running into some issues getting this packaged on PyPI because of the large size of the built redlining engine binaries. I have some thoughts on how to fix this but am still working on it, so any suggestions are definitely welcome. The max file size we can put on PyPI is 100 MB. I'm thinking we don't distribute the binaries and instead either build them on the client or host them on AWS or something similar, with an install step that pulls the binaries after install, like how spaCy and other NLP libraries often download large binaries.

rishabh-sagar-20 commented 2 months ago

That's feasible. Additionally, we can create a distinct build for each platform. That way each upload stays under the size constraint and can still go on PyPI.
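
To make the per-platform idea concrete, here is a minimal sketch of how a client could pick the right build at runtime. The build names in the table and the `select_build` helper are hypothetical and only illustrate the approach; they are not the project's actual artifact names.

```python
import platform

# Hypothetical mapping from (OS, CPU architecture) to a build name.
# The build names are illustrative assumptions, not real project artifacts.
_BUILDS = {
    ("linux", "x86_64"): "linux-x64",
    ("linux", "aarch64"): "linux-arm64",
    ("darwin", "x86_64"): "osx-x64",
    ("darwin", "arm64"): "osx-arm64",
    ("windows", "amd64"): "win-x64",
    ("windows", "arm64"): "win-arm64",
}


def select_build(system=None, machine=None):
    """Return the build name for the given (or current) platform."""
    # Fall back to the running interpreter's platform when not overridden.
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    try:
        return _BUILDS[(system, machine)]
    except KeyError:
        raise RuntimeError(
            f"No prebuilt binary for {system}/{machine}"
        ) from None
```

Calling `select_build()` with no arguments uses the current machine; passing explicit values makes the lookup easy to test.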

rishabh-sagar-20 commented 2 months ago

What's the overall size? I came across a page stating that we can ask PyPI to waive the size limit. Ref:

  1. https://pypi.org/help/#file-size-limit
  2. StackOverflow

JSv4 commented 2 months ago

Yes, I did see that, but it seems the limit is 60 MB (compressed) and the binaries (at least on my machine) are about 65 - 80 MB each.

rishabh-sagar-20 commented 2 months ago

I think the method you mentioned earlier, which is similar to how spaCy models are downloaded, should be effective. For hosting, GitHub Releases should work fine. We can change it later if we run into any issues.
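
A rough sketch of that download-after-install step, assuming the binaries are attached as assets to a GitHub Release. The tag and asset names, the cache layout, and the `ensure_binary` helper are all hypothetical; only the repository name comes from this thread.

```python
import os
import urllib.request
from pathlib import Path

# GitHub serves release assets at this URL pattern. The tag and asset
# names passed in are assumptions; no such release is confirmed here.
RELEASE_URL = (
    "https://github.com/JSv4/Python-Redlines/releases/download/{tag}/{asset}"
)


def asset_url(tag, asset):
    """Build the download URL for one release asset."""
    return RELEASE_URL.format(tag=tag, asset=asset)


def ensure_binary(tag, asset, cache_dir, fetch=urllib.request.urlretrieve):
    """Return the local path of the asset, downloading it only if missing.

    `fetch(url, filename)` is injectable so the download can be stubbed.
    """
    target = Path(cache_dir) / tag / asset
    if target.exists():
        return target  # already downloaded on an earlier call
    target.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so an interrupted transfer
    # never leaves a half-written file at the final path.
    tmp = target.with_name(target.name + ".part")
    fetch(asset_url(tag, asset), str(tmp))
    os.replace(tmp, target)
    return target
```

The package itself would then stay small enough for PyPI, and the first call to `ensure_binary` would pay the one-time download cost.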