boudinfl / pke

Python Keyphrase Extraction module
GNU General Public License v3.0

Keyword extraction using dependency parser instead of POS #130

Closed icyanide9 closed 4 years ago

icyanide9 commented 4 years ago

The current TextRank algorithm uses POS tags such as NOUN, PROPN, ADJ, etc. I am looking for a way to use dependency relations such as nsubj, amod, and advmod to generate keyphrases instead. The dependencies can be generated using the Stanford NLP dependency parser.

Is there any tweak that I can do to implement this?

ygorg commented 4 years ago

Hi! pke was not meant for that, so it is really hacky, but it is possible. From what I saw in this file, the parser output looks like this:

<sentence>
  <tokens>
    <token id="2">
      <word>University</word>
    </token>
  </tokens>
  <dependencies>
    <dep type="compound">
      <governor idx="2">University</governor>
      <dependent idx="1">Stanford</dependent>
    </dep>
  </dependencies>
</sentence>

I think you can hack pke by:

  1. creating a new MinimalCoreNLPParser that fills in the .pos member with the dependency type of either the governor or the dependent, matched to tokens using the id and idx attributes. .pos is a list of tags; the i-th element corresponds to the i-th token in the original sentence.

  2. modifying LoadFile.load_document to use your newly created parser.

  3. using pke as usual, but passing dependency tags instead of POS tags to the pos argument of the functions that accept it.
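
To illustrate step 1, here is a minimal sketch of how the per-token tag list could be built from a sentence in the XML format shown above. It only uses the Python standard library and does not touch pke itself. The function name `dependency_tags`, the `default` placeholder tag, and the choice to label each token with the relation in which it is the dependent are all assumptions made for illustration; plugging the result into a custom parser's .pos member is left out.

```python
import xml.etree.ElementTree as ET

# Sample sentence in the same layout as the excerpt above (token 1 added
# so that both ends of the dependency are present).
SAMPLE = """
<sentence>
  <tokens>
    <token id="1"><word>Stanford</word></token>
    <token id="2"><word>University</word></token>
  </tokens>
  <dependencies>
    <dep type="compound">
      <governor idx="2">University</governor>
      <dependent idx="1">Stanford</dependent>
    </dep>
  </dependencies>
</sentence>
"""


def dependency_tags(sentence_xml, default="dep_none"):
    """Return one dependency tag per token, in token order.

    Hypothetical helper: each token is labelled with the type of the
    dependency in which it is the *dependent*. Tokens that are never a
    dependent in the parsed block get the `default` placeholder.
    """
    sentence = ET.fromstring(sentence_xml.strip())

    # Token ids are 1-based in this format and match the idx attributes.
    token_ids = [int(tok.get("id")) for tok in sentence.iter("token")]

    tags = {}
    # Assumption: only the first <dependencies> block is used.
    deps = sentence.find("dependencies")
    if deps is not None:
        for dep in deps.iter("dep"):
            dependent = dep.find("dependent")
            if dependent is not None:
                tags[int(dependent.get("idx"))] = dep.get("type")

    # The i-th tag corresponds to the i-th token of the sentence.
    return [tags.get(i, default) for i in token_ids]


if __name__ == "__main__":
    print(dependency_tags(SAMPLE))
```

The returned list has the shape described for .pos: one tag per token, aligned with the sentence. Tagging by governor instead would only require reading the `governor` element in place of `dependent`, although a governor can head several relations, so a rule for picking one tag would be needed.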

ygorg commented 4 years ago

Closing this issue because of inactivity.

icyanide9 commented 4 years ago

Thank you, and sorry for the delay. I will try out this option.