BethanyG / NANA


Pattern Library has no Python3 Support & Needs to be Replaced #15

Open BethanyG opened 7 years ago

BethanyG commented 7 years ago

See notes in the following repo: https://github.com/clips/pattern

Because pattern has no working Python 3 implementation (the pattern3 package on PyPI installs but doesn't run), IngredientAnalyzer.py will have to be rewritten to use another parsing method (maybe NLTK or TextBlob) if we are going to port the code to Python 3.

BethanyG commented 7 years ago

I was able to correct the naming issues and successfully import (and try to use!) the Pattern3 port. HOWEVER, there are numerous issues in the library itself, and I have been unsuccessful in fixing them all. Let me know if you'd like copies of the "fixed" libs - a few fixes were trivial, a few are above my head, and I'd love some help with them if you're willing.

BethanyG commented 7 years ago

It also appears that there has been activity on the GitHub repo, so there might be some advantage to trying to build manually from the HEAD of that project.

BethanyG commented 7 years ago

Putting this here for reference as to the next strategy on replacing Pattern:

recipe ingredients tagging with CRF

These two are also really interesting: Training an NLTK Chunker, Named Entity Extraction

Final 3:
SpaCy Tutorial - Really interesting! SpaCy uses word vectors, which is the 'new sexy' for NLP.
Gensim - Gensim also uses word vectors, and its author repaired Pattern3 for his project.
DLA

dangillet commented 7 years ago

I'm exploring the possibility of using sklearn-crfsuite. I've pushed a branch named crfsuite (712f019c).

At the moment there are 2 useful functions. One is train_model() and is used to re-train the model. The second one is parse_ingredient(ingredient) and returns a Tree with some chunking to better find the different components of the ingredient.
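As a rough illustration of the kind of input sklearn-crfsuite works with, here is a minimal sketch of per-token feature extraction for an ingredient line. The names token_features and line_features, the feature keys, and the example line are hypothetical and are not taken from the crfsuite branch; the library takes each sequence as a list of feature dicts shaped like these.

```python
def token_features(tokens, i):
    """Build a feature dict for the token at position i (hypothetical feature set)."""
    word = tokens[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.isdigit": word.isdigit(),
        "word.istitle": word.istitle(),
    }
    # Neighbouring tokens give the CRF some context for each label decision.
    if i > 0:
        features["prev.lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of the ingredient line
    if i < len(tokens) - 1:
        features["next.lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True  # end of the ingredient line
    return features


def line_features(line):
    """Turn one ingredient line into a list of feature dicts, one per token."""
    tokens = line.split()
    return [token_features(tokens, i) for i in range(len(tokens))]


example = line_features("2 cups chopped onions")
```

A model could then be fitted with something like sklearn_crfsuite.CRF().fit(X_train, y_train), where X_train is a list of such feature lists and y_train holds the matching label sequences.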

I think it's a good thing to return the Tree because it's not the responsibility of this function to know what will be useful to find the ingredient in the DB. This should be the job of other functions which will receive the Tree as an argument.
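To make that separation concrete, here is a small sketch of a downstream function that receives the parsed structure and pulls out only the part it needs. It assumes the Tree is something like an nltk.Tree; a plain nested tuple stands in for it so the sketch has no dependencies, and the chunk labels (QUANTITY, UNIT, NAME) and function names are hypothetical.

```python
# Stand-in for the chunked Tree: (label, children) pairs, where children are
# either (token, tag) leaves or nested chunks. Labels are hypothetical.
parsed = ("S", [
    ("QUANTITY", [("2", "CD")]),
    ("UNIT", [("cups", "NNS")]),
    ("NAME", [("chopped", "VBN"), ("onions", "NNS")]),
])


def is_chunk(node):
    """A chunk has a list of children; a leaf is a (token, tag) pair of strings."""
    return isinstance(node[1], list)


def leaves(node):
    """Collect the tokens under a node, in order."""
    if is_chunk(node):
        return [tok for child in node[1] for tok in leaves(child)]
    return [node[0]]


def chunk_text(tree, label):
    """Return the text of the first chunk with the given label, or None."""
    for child in tree[1]:
        if not is_chunk(child):
            continue
        if child[0] == label:
            return " ".join(leaves(child))
        found = chunk_text(child, label)
        if found is not None:
            return found
    return None


def lookup_name(tree):
    """Hypothetical consumer: extract the string a DB lookup would search for."""
    return chunk_text(tree, "NAME")
```

The parser stays unaware of the database, and each consumer decides which chunks matter to it.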

I could see this broken down into these steps:

dangillet commented 7 years ago

I've worked the whole day on replacing Pattern3. Commit 69711 is my first pass at making something usable. The only problem is that the output cannot be properly displayed, as described in issue #6 or issue #18. Not sure what the problem is exactly.

Maybe this should be the next course of action in order to have something to see. :)

BethanyG commented 7 years ago

Found the issue with #6, and it has been fixed and closed. See commit 9536424

The JSON returned had the "ingredients" list nested under the "analysis_summary" section, but the template and the display parse code were referring to recipeDetails.ingredients, not recipeDetails.analysis_summary.ingredients. This was causing the Jinja2 template and the JS code to blow up and not render any of the JSON sent to the front end. Obviously, we'll eventually need to rewrite and correct the JSON from the back end, but for the sake of expediency, I've hacked the Jinja2 template.
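For reference, the mismatch looks roughly like this when reproduced in Python. The payload values and the helper name get_ingredients are made up for illustration; only the "analysis_summary" and "ingredients" keys come from the description above.

```python
import json

# Illustrative payload: "ingredients" sits under "analysis_summary",
# not at the top level where the template was looking for it.
payload = json.loads('{"analysis_summary": {"ingredients": ["flour", "sugar"]}}')


def get_ingredients(recipe_details):
    """Read ingredients from the nested location, falling back to the top level."""
    summary = recipe_details.get("analysis_summary", {})
    return summary.get("ingredients", recipe_details.get("ingredients", []))


ingredients = get_ingredients(payload)
```

A tolerant accessor like this is only a stopgap; agreeing on one location for the list in the back end would remove the need for it.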

RecipeMaker.py was also returning invalid JSON: there was a final "}" missing from the doc, which was causing the whole thing to fail at the front end. I've corrected the make_json() method (see commit 3b69035), but here again we should examine a more robust approach (and unit tests!!) to the whole thing, since the document is getting too large to troubleshoot effectively.
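One possible more robust approach, sketched under assumptions: build the document as a dict and let the standard json module serialize it, so braces can never go missing, then check it with a round-trip test. The function make_json_sketch, its parameters, and the field names other than "analysis_summary" and "ingredients" are hypothetical and do not reflect the real make_json() signature.

```python
import json


def make_json_sketch(recipe_name, ingredients):
    """Hypothetical stand-in for make_json(): serialize a dict instead of
    concatenating strings by hand."""
    doc = {
        "recipe": recipe_name,
        "analysis_summary": {"ingredients": ingredients},
    }
    return json.dumps(doc)


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def test_make_json_round_trips():
    text = make_json_sketch("pancakes", ["flour", "milk"])
    parsed = json.loads(text)
    assert parsed["analysis_summary"]["ingredients"] == ["flour", "milk"]


def test_missing_brace_is_rejected():
    # The kind of truncated document described above fails to parse.
    assert not is_valid_json('{"recipe": "pancakes"')
```

Tests like these could run under pytest or be called directly, and would have caught the missing brace before it reached the front end.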

Pulling a fresh copy of the branch should get you up and running (at least for viewing the display - pathetic as it is).

Found JSON Linter very helpful in troubleshooting.....

dangillet commented 7 years ago

Great work! It works for me. :)

As you've opened new issues for the JSON problems, shall we close this issue? Also should we then merge this back to master?