Closed johnhutx closed 2 years ago
Hi, I am not sure if I understand it correctly. Do you want to provide a List[List[str]]
where the first layer is the list of documents and the second layer is the list of sentences in that document?
If so, what's the level of extraction for the extractive summarization?
Yes, I would want to provide a List[List[str]]. Instead of extracting at the sentence level, I would like to extract at the List[str] level. It's just like extracting a group of sentences every time.
Hi @johnhutx, I am not sure if I understand the situation correctly, can you make an example? Is it possible to merge the inner list and use the current API instead?
No problem @niansong1996. Consider a TV screenplay that contains multiple scenes List[scenes]. I would like to extract the important scenes instead of sentences from the screenplay. Each scene usually contains multiple sentences, which can be represented as a List[str]. The goal is to extract the scenes (List[str]) from the screenplay (List[List[str]]).
Thanks for the clarification, it's much clearer to me now.
If you would like to extract scenes using summarization, I assume there is no query? Does this mean that the model would need to figure out which scenes are more important than others?
For the task you described, I think probably the best choice is to make a subclass of our lexrank model here and customize it (change L27-40). In our implementation, we split the document into a list of sentences, but you could potentially input a list of scenes, each one of which is concatenated sentences from the scene.
Hope this is helpful.
Thank you for the clarification.
Hello, Is it possible to provide a list of (already split) sentences as the source input to the summarizer, as opposed to a single source document? The goal is to treat each list of sentences as one long sequence during extractive summarization.