Niger-Volta-LTI / iranlowo

Ìrànlọ́wọ́ is a utility library for analysis & (pre)processing of Yorùbá text → https://pypi.org/project/iranlowo
MIT License

Corpus module #13

Closed Olamyy closed 5 years ago

Olamyy commented 5 years ago

This PR introduces a number of changes:

  1. Restructures the file layout by:

    1. Moving some functions from `adr.py` to `utils.py`.
    These functions seem more like generally reusable utilities than actual adr utilities.

    2. Moving another set of functions from `adr.py` to `preprocessing.py`.
    These functions seem more like generally reusable preprocessing functions than actual adr functions.
  2. Converts the existing tests to unittest. The major win here is that the tests are now in a reusable OOP format.

  3. Introduces some of the enhancements in #12:

    1. `corpus.py`: Introduces the `Corpus` and `DirectoryCorpus` classes to help users load a corpus into iranlowo.
    2. `loaders.py`: Introduces a set of classes for loading the text already available in https://github.com/Niger-Volta-LTI/yoruba-text into the `Corpus`/`DirectoryCorpus` classes. The currently available loaders are: BBC, Bibeli, Owe, and Yoruba Blog. [Note: I'm not exactly proud of my implementation of the `OweLoader`, so I wouldn't mind someone taking another look at it before merging.]
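
For readers unfamiliar with the idea, here is a minimal sketch of what a single-file corpus and a directory-backed corpus could look like. It is not the code from this PR: the class names are borrowed from the description above, but the constructor arguments, the `*.txt` pattern, and the line-by-line iteration are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List


class Corpus:
    """Hypothetical stand-in: iterates over the lines of one UTF-8 text file."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def __iter__(self) -> Iterator[str]:
        # Yield non-empty lines, stripped of surrounding whitespace.
        with self.path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:
                    yield line


class DirectoryCorpus:
    """Hypothetical stand-in: chains one Corpus per matching file in a directory."""

    def __init__(self, directory: Path, pattern: str = "*.txt") -> None:
        # Sort so that iteration order is stable across platforms.
        self.files: List[Path] = sorted(Path(directory).glob(pattern))

    def __iter__(self) -> Iterator[str]:
        for path in self.files:
            yield from Corpus(path)


def demo() -> List[str]:
    """Build a throwaway two-file corpus and return all of its lines."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.txt").write_text("Ẹ káàárọ̀\n\n", encoding="utf-8")
        (root / "b.txt").write_text("Ẹ kú iṣẹ́\n", encoding="utf-8")
        return list(DirectoryCorpus(root))
```

Reading lazily, line by line, keeps memory use flat for large text collections; a loader for a specific source would then only need to know where its files live and how to clean them.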

Olamyy commented 5 years ago

The tests are failing. I will look into this and fix them as soon as I can.

ruohoruotsi commented 5 years ago

Hi @Olamyy! Thank you very much for your PR and overall contributions!! These look like useful additions to the library! Overall, I think it might be more collaborative to discuss, design & review proposed changes before we get to the PR phase (where we're just battling Travis issues), especially to ensure that we are synchronized on goals & to get opinions from David, Timi, and others. What do you think of this approach?

Regarding the build failure, it is perhaps because you need to add an `__init__.py` at the top level of the tests directory. Without it, `tests.utils` is not a package, which is how you are using it. See: https://docs.pytest.org/en/latest/goodpractices.html .
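
To make the suggestion concrete, the sketch below approximates how pytest's default "prepend" import mode picks the directory it puts on `sys.path`: it walks upward from the test file for as long as it finds an `__init__.py`. This is a simplified model of the documented behavior, not pytest's own code, and the file names (`test_adr.py`, `utils.py`) and helper functions are hypothetical.

```python
import tempfile
from pathlib import Path


def prepend_mode_basedir(test_file: Path) -> Path:
    """Approximate the directory that ends up on sys.path for a test file.

    Walk upward from the test file's directory while the current directory
    contains an __init__.py; the first directory without one is returned.
    """
    directory = test_file.parent
    while (directory / "__init__.py").exists():
        directory = directory.parent
    return directory


def make_layout(root: Path, with_init: bool) -> Path:
    """Create root/tests/{utils.py,test_adr.py}, optionally with tests/__init__.py."""
    tests = root / "tests"
    tests.mkdir(parents=True)
    (tests / "utils.py").write_text("HELPER = 1\n")
    test_file = tests / "test_adr.py"
    test_file.write_text("from tests.utils import HELPER\n")
    if with_init:
        (tests / "__init__.py").write_text("")
    return test_file


def demo() -> dict:
    """Map with_init -> whether the project root would be the import base."""
    results = {}
    for with_init in (False, True):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp).resolve()
            test_file = make_layout(root, with_init)
            # `tests.utils` resolves only if the directory *containing*
            # tests/ (the project root) is the one placed on sys.path.
            results[with_init] = prepend_mode_basedir(test_file) == root
    return results
```

Under this model, without `tests/__init__.py` the `tests/` directory itself becomes the import base, so only `utils` (not `tests.utils`) is importable; with it, the project root becomes the base and `from tests.utils import ...` can resolve.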

Olamyy commented 5 years ago

Yeah. That definitely seems like a better and more organized approach to working on features. Which of GitHub or Slack do you think would be a good medium for this?

About the failing tests, I came across this same link while trying to figure out what the issue was. I just haven't looked at it in detail yet. I'll get to it as soon as I can.

Thanks.

ruohoruotsi commented 5 years ago

Great! Let me make a Project board and some milestones, and you can add some cards, so basically GitHub. This way the work stays localized to the project, even if it means a higher reliance on GitHub.

Maybe I'll take a fork of your fork and see if the `__init__.py` solution will work.

ruohoruotsi commented 5 years ago

The new project is here: https://github.com/Niger-Volta-LTI/iranlowo/projects