epfl-dlab / quootstrap

Unsupervised method for extracting quotation-speaker pairs from large news corpora.

Is there a Python version of this model? #2

Open workPA opened 2 years ago

workPA commented 2 years ago

I need a Python model for quote extraction. If anyone knows how to build one from this app, or knows of an alternative, that would be helpful.

cervisiarius commented 2 years ago

Hi @workPA,

This is Bob West from the team of Quootstrap developers. We have since built a more powerful quote extractor based on Transformers, called Quobert. It's available as Python code here (in the repo for Quotebank, a large quote corpus): https://github.com/epfl-dlab/Quotebank

Hope this helps!
Bob

alex2awesome commented 2 years ago

Isn't Quootstrap needed for that?

cervisiarius commented 2 years ago

You could also explore Quobert, a deep-learning-based method: https://github.com/epfl-dlab/Quotebank Pre-trained models are available here: https://github.com/epfl-dlab/Quotebank/releases/

alex2awesome commented 2 years ago

AFAICT, Quotebank needs Quootstrap to be run as a preprocessing step on the data, even for inference.

Step 2.3 of the Quotebank README, which is required for inference, says: "[we] find the partial mention of entities in the data from the full mentions extracted by Quootstrap."

So unless I'm misunderstanding, at least part of this Quootstrap code is an essential part of the Quotebank inference pipeline.
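
To make the dependency concrete, here is a minimal sketch of what "finding partial mentions from full mentions" could look like. It is not the Quotebank or Quootstrap code. The function name `find_partial_mentions`, the token-list input, and the rule "a lone name token that belongs to exactly one known full name" are all assumptions made for illustration.

```python
# Illustrative sketch only: NOT the Quotebank or Quootstrap implementation.
# Assumption: a partial mention is a single token (e.g. a surname) that
# belongs to exactly one full name extracted by an upstream step.

def find_partial_mentions(tokens, full_mentions):
    """Return a list of (token_index, full_name) for partial mentions.

    tokens: list of word tokens for one article.
    full_mentions: list of full names found upstream (hypothetical input).
    """
    # Mark token positions already covered by a full mention.
    covered = set()
    for name in full_mentions:
        parts = name.split()
        n = len(parts)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == parts:
                covered.update(range(i, i + n))

    # Map each name token to the full names that contain it.
    token_to_names = {}
    for name in full_mentions:
        for part in name.split():
            token_to_names.setdefault(part, set()).add(name)

    # An uncovered token that maps to exactly one full name is treated
    # as a partial mention; ambiguous tokens are skipped.
    partial = []
    for i, tok in enumerate(tokens):
        if i in covered:
            continue
        names = token_to_names.get(tok, set())
        if len(names) == 1:
            partial.append((i, next(iter(names))))
    return partial


tokens = "Barack Obama spoke today . Later , Obama said more .".split()
print(find_partial_mentions(tokens, ["Barack Obama"]))
```

The point of the sketch is that the second argument has to come from somewhere: without the full mentions produced upstream, there is nothing to resolve the partial mentions against.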