ProjetPP / PPP-QuestionParsing-Grammatical

Question Parsing module for the PPP using a grammatical approch
GNU Affero General Public License v3.0
33 stars 11 forks source link

Cleaning 1 #130

Closed Ezibenroc closed 9 years ago

Ezibenroc commented 9 years ago

Review dependencyTree.py.

Main modifications:

→ The preprocessing is a true preprocessing (which depends on the original sentence).

Ezibenroc commented 9 years ago

This is all for this pull request. I only moved code and renamed variables and functions.

The main impacted files are preprocessingMerge.py (which is now preprocessing.py) and dependencyTree.py.

There might be further modifications of dependencyTree.py in a future pull request (e.g. removing these ugly codes 1000 in the Word indices).

Cleaning of the other files will be done in other pull requests.

yhamoudi commented 9 years ago

A class TreeGenerator to generate the tree from the Stanford output.

Nice!

NER merging moved into the class DependencyTree. Preposition merging moved into the class DependencyTree.

I am less convinced by this. The idea of the preprocessing merging is to have a lot of (very different) heuristics to merge some part of the tree. It's not very convenient/natural to code them into the class DependenciesTree:

I think that we should keep the preprocessing merged operations outside dependencyTree.py and put only the "basic" operations into this file. (we can rename preprocessingMerge.py into something else if you want to keep preprocessing for operations that are not on a tree. For instance: initialMerge.py).

...

No opinion on the programming style (but it seems to be much better :)

Ezibenroc commented 9 years ago

I am new to OOP, but it seems that the computations we are doing should be made in methods: these computations modify instances of DependencyTree and are dependant of the local data.

See http://stackoverflow.com/questions/8108688/in-python-when-should-i-use-a-function-instead-of-a-method

I don't think it is an issue to have a big file.

Ezibenroc commented 9 years ago

But I agree that a lot of our functions should be DependencyTree methods. If mergePreposition is a method, then normalize should also be a method.

yhamoudi commented 9 years ago

I don't think it is an issue to have a big file. normalize should also be a method.

If you follow these rules, the files dependencyAnalysis.py, normalization.py and half of questionWordProcessing.py will become methods of DependenciesTree (ie: almost all the code in 1 file). I don't know if it is how people do, but i really don't want to code in this way :cry: Please, keep 1 file = 1 big part of the algo.

Tpt commented 9 years ago

One of the best practices of OOP is the "single responsibility principle" [1] that states that each class should have only one responsibility.

So, applied to your use case, the DependenciesTree responsibility is, I think, to represent the tree and each algorithm that works on it (dependancy analyzis, normalization) should be in its own class that all manipulates DependenciesTree instances.

If you want to read more on OOP best practices, this Wikipedia article is a good entry point: https://en.wikipedia.org/wiki/SOLID_%28object-oriented_design%29

[1] https://en.wikipedia.org/wiki/Single_responsibility_principle

Ezibenroc commented 9 years ago

@Tpt: thank you for the advice. I will try to do this.

Ezibenroc commented 9 years ago

We would have piece of code like this (where tree is an instance of DependencyTree, tree.wordList is a list of instances of Word, tree.wordList[0].word and preposition are strings):

tree.wordList[0].word += ' ' + preposition

I find this ugly... Don't you?

I don't like the fact that some external function will modify deeply our objects. A minor modifications of the data structure could result in modification in all the codes (this has already happened).

Ezibenroc commented 9 years ago

Now we have a file initialMerge.py with two classes NamedEntityMerging and PrepositionMerging.

yhamoudi commented 9 years ago

it's ok for me

Ezibenroc commented 9 years ago

@ProgVal: could you please tell me if the code style seems ok in initialMerge.py, dependencyTree.py and preprocessing.py?

You don't need to go into details.

progval commented 9 years ago

mergeSisterBrother -> mergeSibling?

progval commented 9 years ago

"\t\"{0}\" -> \"{1}\"[label=\"{2}\"];\n" -> '\t"{0}" -> "{1}"[label="{2}"];\n' (same for the other strings in dependencyTree.py)

progval commented 9 years ago

besides that, it looks good. (with a very local point of view)

Ezibenroc commented 9 years ago

Ok, thank you.