Ezibenroc closed this pull request 9 years ago
This is all for this pull request. I only moved code and renamed variables and functions.
The main impacted files are preprocessingMerge.py (which is now preprocessing.py) and dependencyTree.py.
There might be further modifications of dependencyTree.py in a future pull request (e.g. removing these ugly 1000 codes in the Word indices).
Cleaning of the other files will be done in other pull requests.
A class TreeGenerator to generate the tree from the Stanford output.
Nice!
NER merging moved into the class DependencyTree. Preposition merging moved into the class DependencyTree.
I am less convinced by this. The idea of the preprocessing merge is to have a lot of (very different) heuristics that merge some parts of the tree. It's not very convenient/natural to code them into the class DependenciesTree:
[...] dependencyTree.py is still huge)
[...] DependenciesTree that it should be a method (otherwise all of our functions would be methods of DependenciesTree)
[...] dependencyTree.py that contains mainly basic stuff about dependency trees and suddenly see a prepositionSet, for instance
I think that we should keep the preprocessing merge operations outside dependencyTree.py and put only the "basic" operations into this file (we can rename preprocessingMerge.py into something else if you want to keep preprocessing for operations that are not on a tree, for instance initialMerge.py).
...
No opinion on the programming style (but it seems to be much better :)
I am new to OOP, but it seems that the computations we are doing should be done in methods: these computations modify instances of DependencyTree and depend on the local data.
I don't think it is an issue to have a big file.
But I agree that a lot of our functions should be DependencyTree methods. If mergePreposition is a method, then normalize should also be a method.
I don't think it is an issue to have a big file. normalize should also be a method.
If you follow these rules, the files dependencyAnalysis.py, normalization.py and half of questionWordProcessing.py will become methods of DependenciesTree (i.e. almost all the code in 1 file). I don't know if this is how people do it, but I really don't want to code this way :cry: Please, keep 1 file = 1 big part of the algo.
One of the best practices of OOP is the "single responsibility principle" [1] that states that each class should have only one responsibility.
So, applied to your use case, the DependenciesTree responsibility is, I think, to represent the tree, and each algorithm that works on it (dependency analysis, normalization) should be in its own class that manipulates DependenciesTree instances.
If you want to read more on OOP best practices, this Wikipedia article is a good entry point: https://en.wikipedia.org/wiki/SOLID_%28object-oriented_design%29
[1] https://en.wikipedia.org/wiki/Single_responsibility_principle
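To make the suggestion concrete, here is a minimal sketch of that split. It is not the project's actual code: Word and DependenciesTree are reduced to a few assumed attributes, and ToyNormalization is a hypothetical stand-in algorithm (it only lowercases words), chosen just to show where such code could live.

```python
class Word:
    """One token of the sentence (assumed minimal attributes)."""

    def __init__(self, word, index):
        self.word = word
        self.index = index


class DependenciesTree:
    """Responsibility: represent the tree, nothing else."""

    def __init__(self, wordList, children=None):
        self.wordList = wordList
        self.children = children if children is not None else []

    def text(self):
        # Text of this node only, words joined by spaces.
        return ' '.join(w.word for w in self.wordList)


class ToyNormalization:
    """Hypothetical algorithm class: one algorithm, one class, outside the tree."""

    def __init__(self, tree):
        self.tree = tree

    def run(self):
        self._visit(self.tree)
        return self.tree

    def _visit(self, node):
        # Placeholder transformation: lowercase every word, then recurse.
        for w in node.wordList:
            w.word = w.word.lower()
        for child in node.children:
            self._visit(child)


child = DependenciesTree([Word('President', 2)])
root = DependenciesTree([Word('Who', 1)], [child])
ToyNormalization(root).run()
```

With this layout the tree class stays small, and each algorithm can sit in its own file, which matches the "1 file = 1 big part of the algo" wish.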
@Tpt: thank you for the advice. I will try to do this.
We would have pieces of code like this (where tree is an instance of DependencyTree, tree.wordList is a list of instances of Word, and tree.wordList[0].word and preposition are strings):
tree.wordList[0].word += ' ' + preposition
I find this ugly... Don't you?
I don't like the fact that some external function will deeply modify our objects. A minor modification of the data structure could force changes throughout the code (this has already happened).
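One possible way to limit that coupling, sketched under assumptions: the classes below are simplified, and the methods append and appendToHead are hypothetical names that do not come from the project. The idea is that external code asks the object to change itself instead of reaching into wordList[0].word.

```python
class Word:
    """Simplified Word: only the attributes needed for the example."""

    def __init__(self, word, index):
        self.word = word
        self.index = index

    def append(self, suffix):
        # Hypothetical helper: the only place that knows how the text is stored.
        self.word += ' ' + suffix


class DependencyTree:
    """Simplified DependencyTree holding a list of Word instances."""

    def __init__(self, wordList):
        self.wordList = wordList

    def appendToHead(self, suffix):
        # Hypothetical helper: callers no longer index into wordList themselves.
        self.wordList[0].append(suffix)


tree = DependencyTree([Word('born', 3)])
preposition = 'in'
# Replaces: tree.wordList[0].word += ' ' + preposition
tree.appendToHead(preposition)
```

If the internal layout of Word or wordList changes later, only these two methods need to follow, not every merging heuristic.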
Now we have a file initialMerge.py with two classes NamedEntityMerging and PrepositionMerging.
[...] foo.bar.doStuff() if bar is an attribute of foo with a different class than foo → better hierarchization.
It's ok for me.
@ProgVal: could you please tell me if the code style seems ok in initialMerge.py, dependencyTree.py and preprocessing.py?
You don't need to go into details.
mergeSisterBrother -> mergeSibling?
"\t\"{0}\" -> \"{1}\"[label=\"{2}\"];\n"
-> '\t"{0}" -> "{1}"[label="{2}"];\n'
(same for the other strings in dependencyTree.py)
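A quick check that the two spellings are the same string, so the change is purely cosmetic. The node names and label passed to format are made-up illustration values, not taken from the project.

```python
# Double-quoted literal with escaped inner quotes (current style).
escaped = "\t\"{0}\" -> \"{1}\"[label=\"{2}\"];\n"

# Single-quoted literal, no escaping needed (suggested style).
plain = '\t"{0}" -> "{1}"[label="{2}"];\n'

# Both literals denote exactly the same string.
assert escaped == plain

# Illustrative values only: one edge line, as it would appear in the output.
edge = plain.format('born', 'Obama', 'nsubj')
print(edge, end='')
```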
Besides that, it looks good (from a very local point of view).
Ok, thank you.
Review dependencyTree.py.
Main modifications:
A class TreeGenerator to generate the tree from the Stanford output.
NER merging moved into the class DependencyTree.
Preposition merging moved into the class DependencyTree.
[...] NamedEntityMerging (file initialMerge.py)
[...] PrepositionMerging (file initialMerge.py)
→ The preprocessing is a true preprocessing (which depends on the original sentence).