Niger-Volta-LTI / iranlowo

Ìrànlọ́wọ́ is a utility library for analysis & (pre)processing of Yorùbá text → https://pypi.org/project/iranlowo
MIT License
17 stars 8 forks source link

Documentation #7

Closed Olamyy closed 5 years ago

Olamyy commented 5 years ago

ReadTheDocs has issues with very large requirements. So, I created a new branch documentation and removed torch from the requirements.txt for that branch. So, that branch will be used to build the documentation. We should try not to merge with master since master has torch as a compulsory requirement.

Olamyy commented 5 years ago

Oh. Travis CI seems to be breaking cause of torch.

Olamyy commented 5 years ago

A possible fix to this is to create a new branch, documentation and then make travis only run CI for the master branch.

ruohoruotsi commented 5 years ago

Thank you Olamilekan for the Pull Request! I don't know much about ReadTheDocs and I have a few questions:

  1. I'm curious about what ReadTheDocs brings to have the documentation hosted away from the code versus having the documentation co-located w/ the documentation (w/ READMEs)
  2. Is https://iranlowo.readthedocs.io/en/latest/ just a single page? Is there more work to do to flesh out the ideal documentation. I also noticed it has "Sponsored Ads" on the page, which is a definite negative. I rather not people reading docs be targeted w/ Adverts. Reading the README on github doesn't have adverts.
  3. Github has wiki pages for documentation. (https://help.github.com/en/articles/about-wikis) I don't know much about them since I haven't written documentation except in READMEs, but how does Github Wiki compare to ReadTheDocs
  4. Ìrànlọ́wọ́ uses a Continuous Integration build system (git push with a tag triggers Travis CI which builds the sdist (pypi package), runs tests and then if all passes, uploads it to pypi) This is my first pypi project and it took me a long time to get the Travis stuff working cleanly. Even the size of the model I had to ask for extra space from pypi. Just today, I made some changes as uploading to pypi from a private repo during dev (using travis.com) is different from uploading to pypi from a public repo (using travis.org) https://travis-ci.org/Niger-Volta-LTI/iranlowo/builds. Because of these, I'm hesitant to merge any PR that breaks the build. How do we resolve the build issues? Is the only way to have documentation as a separate branch? If so, is that good to have the documentation on one branch and the code on another? and is that better than keeping code & documentation together?

Okay, I've asked many questions. I'm mostly concerned about not breaking the build, which is tied to the Continuous Integration builds. I want correct, better documentation and I appreciate your efforts, so can we enumerate how ReadTheDocs improves on things and figure out a good pattern to support it or another better documentation system harmoniously with the code?

I'm also attached Timi & David as reviewers, so we can get their opinion/comments please. Adupe!

Olamyy commented 5 years ago

1 and 3. ReadTheDocs uses sphinx which really is easier to use and has way more features than markdown. Part of what it does better than the other options is :

  1. The current documentation at https://iranlowo.readthedocs.io/en/latest/ is the default theme of ReadTheDocs. The theme can be edited to meet any requirements we have. The ads bit though, I can't say much about that.

  2. The CI bit is a tricky part. Given how important it is, it's safe to say we can let go of readthedocs for now since I haven't found a way to go around torch and readthedocs.

Would go ahead to close this PR for now.

ruohoruotsi commented 5 years ago

Thank you for the explanation and healthy engineering debate. I have learned a lot.

For 1&3:

Regarding 2: I wonder if there is a small money, paid plan for the docs that removes the ads. 🤔

Finally for 3: the size of the model and the large torch dependency is definitely a limiting factor. When @Timilehin built the UI for the ADR webserver, he used a CPU only (small version) of torch since Heroku does not like big files, either. So this is a recurring problem/limitation.

So maybe if we can resolve the requirements.txt to continue to be consistent with demands of Travis Continuous integration while not using the BIG model-training GPU version of torch (the default) but rather the smaller CPU-only one, maybe we can make the master branch code more usable for everyone. 🤔

On a related note, I'm also not happy with a 88MB model living in the model folder. I have all the models living on this artifactory: https://bintray.com/ruohoruotsi/prebuilt-models/adr-models maybe there is some clever way (or post install script) that can download them locally ...this is another issue to create/investigate. The upside is that the download is fast/small and then if you can separately pull down the models to do inference/prediction. I'll create an issue and we can discuss options there. Cheers.