PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, for graph-based natural language work -- and related knowledge graph practices. This includes the family of textgraph algorithms:
Popular use cases for this library include:
See our full documentation at: https://derwen.ai/docs/ptr/
See the "Getting Started" section of the online documentation.
To install from PyPI:
python3 -m pip install pytextrank
python3 -m spacy download en_core_web_sm
If you work directly from this Git repo, be sure to install the dependencies as well:
python3 -m pip install -r requirements.txt
Alternatively, to install dependencies using conda:
conda env create -f environment.yml
conda activate pytextrank
Then, to use the library in a simple use case:
import spacy
import pytextrank
# example text
text = "Compatibility of systems of linear constraints over the set of natural numbers. Criteria of compatibility of a system of linear Diophantine equations, strict inequations, and nonstrict inequations are considered. Upper bounds for components of a minimal set of solutions and algorithms of construction of minimal generating sets of solutions for all types of systems are given. These criteria and the corresponding algorithms for constructing a minimal supporting set of solutions can be used in solving all the considered types systems and systems of mixed types."
# load a spaCy model, depending on language, scale, etc.
nlp = spacy.load("en_core_web_sm")
# add PyTextRank to the spaCy pipeline
nlp.add_pipe("textrank")
doc = nlp(text)
# examine the top-ranked phrases in the document
for phrase in doc._.phrases:
    print(phrase.text)
    print(phrase.rank, phrase.count)
    print(phrase.chunks)
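To give some intuition for what `phrase.rank` reflects, here is a minimal, self-contained sketch of the core TextRank idea from Mihalcea and Tarau (2004): build a word co-occurrence graph, then iterate a PageRank-style score until it converges. This is an illustration under stated assumptions, not PyTextRank's implementation. The function names `build_cooccurrence_graph` and `textrank_scores` are hypothetical, the candidate words are hand-picked rather than selected by spaCy (the original paper keeps only nouns and adjectives as candidates), and the co-occurrence window is applied to the already-filtered word list as a simplification.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Link every pair of distinct tokens that occur within `window`
    positions of each other (window=2 links adjacent tokens only).
    Sketch only: the window runs over the filtered token list."""
    graph = defaultdict(set)
    for i, token in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != token:
                graph[token].add(other)
                graph[other].add(token)
    return dict(graph)


def textrank_scores(graph, damping=0.85, max_iter=100, tol=1e-6):
    """Iterate S(v) = (1 - d) + d * sum(S(u) / deg(u) for u in neighbors(v))
    until the largest per-node change falls below `tol`."""
    if not graph:
        return {}
    # initial scores are arbitrary; 1.0 keeps the total equal to the node count
    scores = {node: 1.0 for node in graph}
    for _ in range(max_iter):
        updated = {
            node: (1 - damping)
            + damping * sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            for node in graph
        }
        delta = max(abs(updated[n] - scores[n]) for n in graph)
        scores = updated
        if delta < tol:
            break
    return scores


if __name__ == "__main__":
    # hypothetical candidate words, hand-picked from the example text above
    words = [
        "compatibility", "systems", "linear", "constraints", "set",
        "natural", "numbers", "criteria", "compatibility", "system",
        "linear", "diophantine", "equations", "strict", "inequations",
        "nonstrict", "inequations",
    ]
    ranked = textrank_scores(build_cooccurrence_graph(words))
    top = sorted(ranked.items(), key=lambda kv: kv[1], reverse=True)[:5]
    for word, score in top:
        print(f"{score:.3f} {word}")
```

Words that co-occur with many other well-connected words accumulate higher scores. The library additionally aggregates token-level scores into the multi-word phrases exposed through `doc._.phrases`; this sketch stops at single words.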
See the tutorial notebooks in the examples subdirectory for sample code and patterns for integrating PyTextRank with related libraries in Python:
https://derwen.ai/docs/ptr/tutorial/
spaCy
version.
See: [CHANGELOG.md](https://github.com/DerwenAI/pytextrank/blob/main/CHANGELOG.md)
<img alt="thanks noam!" src="https://raw.githubusercontent.com/DerwenAI/pytextrank/main/docs/assets/noam.jpg" width="231" />
Source code for PyTextRank, plus its logo, documentation, and examples, is released under an MIT license, which is succinct and simplifies use in commercial applications.
All materials herein are Copyright © 2016-2024 Derwen, Inc.
Please use the following BibTeX entry for citing PyTextRank if you use it in your research or software:
@software{PyTextRank,
author = {Paco Nathan},
title = {{PyTextRank, a Python implementation of TextRank for phrase extraction and summarization of text documents}},
year = 2016,
publisher = {Derwen},
doi = {10.5281/zenodo.4637885},
url = {https://github.com/DerwenAI/pytextrank}
}
Citations are helpful for the continued development and maintenance of this library. For example, see our citations listed on Google Scholar.
Many thanks to our open source sponsors and to our contributors: @ceteri, @louisguitton, @Ankush-Chander, @tomaarsen, @CaptXiong, @Lord-V15, @anna-droid-beep, @dvsrepo, @clabornd, @dayalstrub-cma, @kavorite, @0dB, @htmartin, @williamsmj, @mattkohl, @vanita5, @HarshGrandeur, @mnowotka, @kjam, @SaiThejeshwar, @laxatives, @dimmu, @JasonZhangzy1757, @jake-aft, @junchen1992, @shyamcody, @chikubee; also to @mihalcea, who leads outstanding NLP research work; for encouragement from the wonderful folks at Explosion who develop spaCy; and for general support from Derwen, Inc.