albertogoffi / toradocu

Toradocu - automated generation of test oracles from Javadoc documentation

Placeholders make Stanford Parser produce a wrong semantic graph #97

Closed albertogoffi closed 7 years ago

albertogoffi commented 7 years ago

In Toradocu, the input sentence "shape is <= 0 or scale is <= 0." is preprocessed into "shape is INEQUALITY_0 or scale is INEQUALITY_1." The semantic graph that the Stanford Parser produces for the preprocessed sentence is the following:

-> INEQUALITY_0/NN (root)
  -> shape/NN (nsubj)
  -> is/VBZ (cop)
  -> or/CC (cc)
  -> scale/NN (conj:or)
  -> INEQUALITY_1/JJ (acl:relcl)
    -> is/VBZ (cop)
  -> ./. (punct)

Instead, we expect a graph like the following:

-> INEQUALITY_0/JJ (root)
  -> shape/NN (nsubj)
  -> is/VBZ (cop)
  -> or/CC (cc)
  -> INEQUALITY_1/JJ (conj:or)
    -> scale/NN (nsubj)
    -> is/VBZ (cop)
  -> ./. (punct)

The graphs differ because of how the inequality placeholders are POS-tagged: the parser tags INEQUALITY_0 as NN (noun), whereas in the expected graph both placeholders are tagged as JJ. I suggest modifying the current proposition extractor to:

  1. Preprocess the text as we currently do.
  2. Explicitly POS-tag the inequality placeholder words as JJ (adjective), or as another tag that makes sense and causes the Stanford Parser to behave correctly.
  3. Build propositions as we currently do.

More information about how to add custom POS tags can be found here: https://nlp.stanford.edu/software/parser-faq.shtml#f
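
A minimal sketch of steps 1 and 2, for illustration only. It is not Toradocu's actual code: the class name `PlaceholderTagging`, the `TaggedToken` type, the inequality regex, and the naive whitespace tokenization are all assumptions. The sketch only computes word/tag pairs, with JJ forced on placeholders and no tag on every other word. Handing those pairs to the Stanford Parser (as described in the FAQ linked above) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderTagging {

    // Assumed shape of an inequality: a comparison operator followed by a number, e.g. "<= 0".
    private static final Pattern INEQUALITY =
        Pattern.compile("(<=|>=|==|!=|<|>)\\s*-?\\d+(\\.\\d+)?");

    // Placeholder words produced by preprocess(), e.g. "INEQUALITY_0".
    private static final Pattern PLACEHOLDER = Pattern.compile("INEQUALITY_\\d+");

    /** Hypothetical word/tag pair; tag is null when the tagger should decide. */
    public static final class TaggedToken {
        public final String word;
        public final String tag;

        TaggedToken(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    /** Step 1: replace each inequality with a numbered placeholder. */
    public static String preprocess(String sentence) {
        Matcher m = INEQUALITY.matcher(sentence);
        StringBuffer out = new StringBuffer();
        int index = 0;
        while (m.find()) {
            m.appendReplacement(out, "INEQUALITY_" + index++);
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Step 2: force the JJ tag on placeholders, leave all other words untagged. */
    public static List<TaggedToken> preTag(String preprocessed) {
        List<TaggedToken> tokens = new ArrayList<>();
        // Naive tokenization: split on whitespace and detach a trailing period.
        for (String raw : preprocessed.trim().split("\\s+")) {
            if (raw.length() > 1 && raw.endsWith(".")) {
                tokens.add(tag(raw.substring(0, raw.length() - 1)));
                tokens.add(tag("."));
            } else {
                tokens.add(tag(raw));
            }
        }
        return tokens;
    }

    private static TaggedToken tag(String word) {
        String forced = PLACEHOLDER.matcher(word).matches() ? "JJ" : null;
        return new TaggedToken(word, forced);
    }

    public static void main(String[] args) {
        String preprocessed = preprocess("shape is <= 0 or scale is <= 0.");
        System.out.println(preprocessed);
        for (TaggedToken t : preTag(preprocessed)) {
            // "?" marks words whose tag is left to the regular tagger.
            System.out.println(t.word + "/" + (t.tag == null ? "?" : t.tag));
        }
    }
}
```

Only the placeholders get a fixed tag, so the tagging of ordinary words such as "shape" and "scale" stays with the regular tagger.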

ariannab commented 7 years ago

I think that this should be closed now.

albertogoffi commented 7 years ago

We added a custom POS tagging phase in Toradocu that takes care of properly tagging placeholders.