Keyword extraction using dependency parser instead of POS

icyanide9 commented 4 years ago

The current TextRank implementation selects words by POS tag (NOUN, PROPN, ADJ, etc.). I am looking for a way to use dependency relations such as nsubj, amod, and advmod to generate keyphrases instead. The dependencies can be generated with the Stanford NLP dependency parser.

Is there any tweak I can make to implement this?

ygorg commented 4 years ago

Hi! pke was not meant for that, so this is really hacky, but it is possible. From what I saw in this file:

<sentence>
  <tokens>
    <token id="2">
      <word>University</word>
    </token>
  </tokens>
  <dependencies>
    <dep type="compound">
      <governor idx="2">University</governor>
      <dependent idx="1">Stanford</dependent>
    </dep>
  </dependencies>
</sentence>

I think you can hack pke by:

1. creating a new MinimalCoreNLPParser that fills in the .pos member with the dependency type of either the governor or the dependent, matched to tokens through the id and idx attributes. .pos is a list of tags; the i-th element corresponds to the i-th token in the original sentence.
2. modifying LoadFile.load_document to use your newly created CoreNLPParser.
3. using pke as usual, but passing dependency tags to the pos argument of its functions.
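The first step above can be sketched with the standard library alone. This is only a rough illustration, not pke code: the function name dependency_tags, the UNKNOWN placeholder, and the choice to label each token by the relation in which it is the dependent are all assumptions made for the example. It also assumes the XML has the shape shown in the excerpt above; if real CoreNLP output contains several dependencies blocks, this sketch reads all of them and later relations overwrite earlier ones.

```python
import xml.etree.ElementTree as ET

# Sample sentence modelled on the excerpt above, with token 1 added so that
# both ends of the "compound" relation exist.
SAMPLE = """<sentence>
  <tokens>
    <token id="1"><word>Stanford</word></token>
    <token id="2"><word>University</word></token>
  </tokens>
  <dependencies>
    <dep type="compound">
      <governor idx="2">University</governor>
      <dependent idx="1">Stanford</dependent>
    </dep>
  </dependencies>
</sentence>"""


def dependency_tags(sentence_xml, default="UNKNOWN"):
    """Return (words, tags) where tags[i] is the dependency type of token i.

    Hypothetical helper: each token is labelled with the type of the relation
    in which it appears as the dependent. Tokens that are never a dependent
    keep the `default` placeholder.
    """
    sentence = ET.fromstring(sentence_xml)
    tokens = sentence.findall("./tokens/token")
    words = [token.findtext("word") for token in tokens]

    # Map each token id (a string attribute) to its position in the sentence.
    position = {token.get("id"): i for i, token in enumerate(tokens)}

    tags = [default] * len(tokens)
    for dep in sentence.findall("./dependencies/dep"):
        dependent = dep.find("dependent")
        if dependent is None:
            continue
        i = position.get(dependent.get("idx"))
        if i is not None:
            tags[i] = dep.get("type")
    return words, tags


if __name__ == "__main__":
    words, tags = dependency_tags(SAMPLE)
    for word, tag in zip(words, tags):
        print(word, tag)
```

A list built this way has one tag per token, so it could stand in for .pos in a custom parser. A set of dependency labels such as {"compound", "amod", "nsubj"} would then be passed wherever pke expects a set of POS tags.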

ygorg commented 4 years ago

Closing this issue because of inactivity.

icyanide9 commented 4 years ago

Thank you, and sorry for the delay. I will try out this option.

boudinfl / pke

Keyword extraction using dependency parser instead of POS #130