NaturalNode / natural

general natural language facilities for node
MIT License
10.57k stars 862 forks source link

Interested in CYK and Earley chart parsers? #183

Open Hugo-ter-Doest opened 10 years ago

Hugo-ter-Doest commented 10 years ago

I implemented CYK and Earley chart parsers in Node. Interested in including it in the natural package? Please let me know.

Regards, Hugo

kkoch986 commented 10 years ago

Hugo,

I've been really interested in parsers lately (working on something like an LR(1) parser myself right now), I'd love to check them out and we can talk about the best way to integrate them into natural.

-Ken

Hugo-ter-Doest commented 10 years ago

Hi Ken,

I created a repository with the chart parsers: https://github.com/Hugo-ter-Doest/chart_parsers The parsers are in /routes. I built a web app around it for testing and demonstration purposes. I think we/I can strip the parsers down to the algorithms and integrate it into natural. Also test code and documentation needs to be added.

Regards, Hugo

kkoch986 commented 10 years ago

Hugo,

Sorry for the delay on this, it looks pretty solid. I think if you could isolate the parsers and stand some tests up around them we could definitely merge them in.

-Ken

Hugo-ter-Doest commented 9 years ago

Here is a sign of life... I did a lot of work on the parsers:

Right now I'm working on tests using the Jasmin framework. At first I started writing tests using assert, but I saw that natural uses Jasmin.

Maybe we should look at the interfacing between existing modules in the natural module and the parsers. At the moment the parsers accept an a tagged sentence of the form: [['I', 'NP'], ['saw', 'V'], ['the', 'DET'], ['man', 'N'], ['with', 'P'], ['the', 'DET'], ['telescope', 'N']]

Best regards, Hugo

kkoch986 commented 9 years ago

Hugo,

Looks really exciting! I'll take a closer look tonight if i can. To be honest i'll have to quickly look through how our parsers work but it sounds like the tagged sentence makes sense.

-Ken

kkoch986 commented 9 years ago

Hugo,

Sorry for the delay on this, its still on my list just been traveling for work the past week so i've been a little distracted. Going to pull everything down now, look at our parsers and see what makes sense.

-Ken

Hugo-ter-Doest commented 9 years ago

Tests are in the spec folder. ChartParser_spec will fail because I'm using it to test the Head-Corner parser which is still under development. If you switch these lines: //[LeftCornerParser, EarleyParser].forEach(function(ChartParser) { [HeadCornerParser].forEach(function(ChartParser) {

the Left-Corner and Earley parsers will be tested.

Regards, Hugo

kkoch986 commented 9 years ago

@Hugo-ter-Doest, looking over some of the stuff today. I'm trying to think of what the best fit with our existing code will be.

It seems to me if the parser requires the sentence to be tagged, for now one would have to use the wordnet module since we don't have a working POS tagger currently, it would be cool to have an example of that in the docs.

Something seems wrong in the unit tests, i tried changing a few values in an effort to break them and they didn't break. I'll work on them a bit and see what I can come up with.

Hugo-ter-Doest commented 9 years ago

I will take a look at the Wordnet module to see how I can connect it to the parsers.

Regarding the unit tests I cannot see what is going on. In my environment they fail if I change expected values in the spec files. For instance, if you exchange the indices of parse_trees in lines 71/72 of ChartParser_spec.js, the test will/should fail. If you let me know what happens I will look into this.

Regards, Hugo

kkoch986 commented 9 years ago

Ok i'll take a look, i was just running through it quickly maybe i missed something. Also, re: wordnet, I dont think it needs to be integrated with the parsers but maybe an example of taking a string tokenizing it, looking it up in wordnet and producing a list of tagged tokens to pass to the parser. I think it'll be a useful example for people who want to use the parsers in the future.

Thanks!

On Thu, Oct 23, 2014 at 4:07 PM, Hugo ter Doest notifications@github.com wrote:

I will take a look at the Wordnet module to see how I can connect it to the parsers.

Regarding the unit tests I cannot see what is going on. In my environment they fail if I change expected values in the spec files. For instance, if you exchange the indices of parse_trees in lines 71/72 of ChartParser_spec.js, the test will/should fail. If you let me know what happens I will look into this.

Regards, Hugo

— Reply to this email directly or view it on GitHub https://github.com/NaturalNode/natural/issues/183#issuecomment-60300602.

Hugo-ter-Doest commented 9 years ago

Regarding Wordnet, I will create an example.

Unit tests: it's good idea to pull the latest code because I'm working on it on a daily basis.

Hugo

2014-10-23 22:09 GMT+02:00 Ken Koch notifications@github.com:

Ok i'll take a look, i was just running through it quickly maybe i missed something. Also, re: wordnet, I dont think it needs to be integrated with the parsers but maybe an example of taking a string tokenizing it, looking it up in wordnet and producing a list of tagged tokens to pass to the parser. I think it'll be a useful example for people who want to use the parsers in the future.

Thanks!

On Thu, Oct 23, 2014 at 4:07 PM, Hugo ter Doest notifications@github.com

wrote:

I will take a look at the Wordnet module to see how I can connect it to the parsers.

Regarding the unit tests I cannot see what is going on. In my environment they fail if I change expected values in the spec files. For instance, if you exchange the indices of parse_trees in lines 71/72 of ChartParser_spec.js, the test will/should fail. If you let me know what happens I will look into this.

Regards, Hugo

— Reply to this email directly or view it on GitHub https://github.com/NaturalNode/natural/issues/183#issuecomment-60300602.

— Reply to this email directly or view it on GitHub https://github.com/NaturalNode/natural/issues/183#issuecomment-60300925.

Met vriendelijke groeten, Hugo ter Doest 06-23 43 60 33

kkoch986 commented 9 years ago

Yea i think it was my mistake on the unit tests sorry about that was just playing with them and everything seems fine.

Would you be able to outline for me just the files we would need to include the parsers in natural as well as what new external dependencies we would have to add? That way i know exactly what I need to do to integrate them.

Thanks!

Hugo-ter-Doest commented 9 years ago

External dependencies are: "lodash" : "", "fs" : "", "log4js": "", "typeof": ""

plus jasmine-node for testing.

Files you need are: from lib: Agenda.js ChartParser.js CYKParser.js EarleyItem.js GrammarParser.js LeftCornerParser.js Chart.js CYK_Item.js DoubleDottedItem.js EarleyParser.js GoalItem.js HeadCornerParser.js PEG-grammar-for-unification-grammar.txt

And from spec: ChartParser_spec.js Chart_spec.js CYKParser_spec.js EarleyItem_spec.js GrammarParser_spec.js HeadCornerParser_spec.js

Hugo

Hugo-ter-Doest commented 9 years ago

I forgot to mention the data files for the unit tests. From data you need: math_expressions.txt test_grammar_for_CYK.txt minimal_grammar.txt test_grammar_for_CFG.txt

Hugo-ter-Doest commented 9 years ago

Did some work on the example with Wordnet. I found out that Wordnet supports the following POS tags: n NOUN v VERB a ADJECTIVE s ADJECTIVE SATELLITE r ADVERB

which is quite limited for full parsing. I will think of an example that makes some sense. In the mean time you can check the example. It is in the example folder (where else :-) of chart_parsers.

Also, I made the parsers work with a tagged sentence that may have multiple tags per token, like this: [["I","s","n"],["saw","v","n"],["the","unknown"],["man","v","n"],["with","unknown"],["the","unknown"],["telescope","v","n"]]

Hugo

kkoch986 commented 9 years ago

Ok cool, was just looking for some end to end kind of example but i agree wordnet may not be the best. I think a better method for POS tagging is definitely a high priority for me as soon as i get some time to work on it.

Hugo-ter-Doest commented 9 years ago

I did something else to complement the Wordnet tags: I wrote a module FunctionWordTagger that reads a bunch of files with function words and tags the rest of the sentence. The module is in the lib folder, the dictionary files are in the data folder.

Hugo

kkoch986 commented 9 years ago

Hugo,

Sorry again for the slowness on this been a crazy few weeks at work, can you just let me know if there are any lodash features you use that arent included in underscore? I don't really want to have both dependencies since theyre pretty interchangable.

I think eventually we could port natural over to lodash but for now i just want to find the quickest path to getting the parsers integrated.

I just created a branch for this so its at the top of my list hopefully by the end of the weekend i'll have made some serious progress.

-Ken

Hugo-ter-Doest commented 9 years ago

I will take a look at the underscore features. If I remember right, I use lodash for deep comparison of objects only. I checked and this is supported by underscore as well. So it's probably no problem to exchange them.

Hugo

Hugo-ter-Doest commented 9 years ago

Oh and there's no hurry. I'm writing this stuff just for fun and to learn a new language and libraries.

Hugo-ter-Doest commented 9 years ago

I replaced lodash with underscore!

Hugo

kkoch986 commented 9 years ago

Perfect! i was playing around with it and replaced it in a few places as well so i had a feeling it would have worked.

silentrob commented 9 years ago

This looks amazing. We are still missing a CCG or minimalist grammar. But that is a great list.

Hugo-ter-Doest commented 9 years ago

Thanks! Plan is to integrate this into the natural module.

Regards, Hugo

Hugo-ter-Doest commented 9 years ago

I published the chart parsers on npm: npm install chart-parsers

Regards, Hugo