rmarronnier closed this 5 years ago
I'd definitely be interested in seeing something like sumy implemented in Cadmium. Seeing as there are multiple approaches that we may want to implement, I'd do what has been done with the Tokenizer and various other pieces of Cadmium and include Summarization (or something similar) as its own abstract class, with Luhn as a subclass. That way we can have a nice base API to use for all text summarizers.
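To make the proposed shape concrete, here is a minimal sketch of an abstract base class with one concrete subclass. It is written in Python (the language sumy uses) purely as an illustration and would need porting to Crystal. Every name in it (`Summarizer`, `score_sentences`, `summarize`, `FrequencySummarizer`, `split_sentences`) is hypothetical and not part of Cadmium or sumy. The subclass is a simplified word-frequency scorer standing in for Luhn, not Luhn's actual method.

```python
import re
from abc import ABC, abstractmethod
from collections import Counter
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


class Summarizer(ABC):
    """Hypothetical base class: subclasses only decide how sentences are scored."""

    @abstractmethod
    def score_sentences(self, sentences: List[str]) -> List[float]:
        """Return one score per sentence, higher meaning more important."""

    def summarize(self, text: str, max_sentences: int = 1) -> List[str]:
        sentences = split_sentences(text)
        scores = self.score_sentences(sentences)
        # Rank by descending score, breaking ties by position in the text.
        ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
        # Keep the best ones, then put them back in document order.
        return [sentences[i] for i in sorted(ranked[:max_sentences])]


class FrequencySummarizer(Summarizer):
    """Simplified stand-in for a Luhn-style scorer (not Luhn's full method)."""

    def score_sentences(self, sentences: List[str]) -> List[float]:
        tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
        counts = Counter(word for words in tokenized for word in words)
        # Score = average document-wide count of the sentence's words.
        return [
            sum(counts[w] for w in words) / len(words) if words else 0.0
            for words in tokenized
        ]
```

With this layout, `FrequencySummarizer().summarize("Cats purr. Cats and dogs play. Birds sing.", 1)` returns `["Cats purr."]`, and adding another method only means writing another `score_sentences`.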
Great! That's what I was planning :-D
Almost OT: As I was trying to implement another summarization method, I went through the tfidf.cr class. In it, @documents : Array(Document) and its relevant methods (add_document, build_document, ...) are declared and used. If we want the same document handling logic elsewhere, maybe we should abstract it out. WDYT?
A @corpus variable and its getter method, to get the documents merged and the relevant computed values (i.e. a term's frequency in a document will obviously be different from its frequency in the full corpus which contains the document)
I'm all for abstracting out something that can be used elsewhere
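As a rough illustration of the abstraction discussed above, the sketch below keeps per-document term counts next to merged corpus-wide counts, so the same term can be queried at either level. It is Python rather than Crystal and does not reflect Cadmium's actual tfidf.cr. The `Corpus` class, its `merged` attribute and `term_frequency` are assumptions made for the example; only the name `add_document` is borrowed from the thread, and its behaviour here is invented.

```python
import re
from collections import Counter
from typing import List, Optional


class Corpus:
    """Hypothetical container for shared document handling."""

    def __init__(self) -> None:
        self.documents: List[Counter] = []  # one term-count table per document
        self.merged: Counter = Counter()    # term counts over all documents

    def add_document(self, text: str) -> int:
        """Tokenize naively, store the counts, and return the document index."""
        counts = Counter(re.findall(r"[a-z']+", text.lower()))
        self.documents.append(counts)
        self.merged.update(counts)
        return len(self.documents) - 1

    def term_frequency(self, term: str, doc_index: Optional[int] = None) -> float:
        """Relative frequency in one document, or in the whole corpus if no index is given."""
        counts = self.merged if doc_index is None else self.documents[doc_index]
        total = sum(counts.values())
        return counts[term] / total if total else 0.0
```

For example, after `add_document("the cat sat")` and `add_document("the dog ran and the dog barked")`, the term "cat" has a frequency of 1/3 in the first document but 1/10 in the merged corpus.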
Ok. I'll wrap my head around it.
I've looked at sumy and implemented the Luhn method in Crystal using Cadmium.
I'm planning to implement more methods.
Would you be interested in a PR adding a text summarizer module to Cadmium?