Open benearnthof opened 1 year ago
We currently have outlines for all of the summarization approaches mentioned above. By far the best quality (meaningful semantics) is obtained with ChatGPT, but the OpenAI API is a paid service.
This needs to be wrapped up in a class. Currently we have a lot of repeated statements, e.g. `def spacy_summary(file: Dict, per=0.5) -> Dict:`
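A minimal sketch of what such a class could look like, so that each backend only implements the text-to-summary step and the shared plumbing lives in one place. The names `Summarizer`, `LeadSentenceSummarizer`, and `summarize_text` are hypothetical, and the layout of `file` (transcript under a `"text"` key, result stored under `"summary"`) is an assumption, not the project's actual schema. The lead-sentence backend is only a toy stand-in for spaCy, ChatGPT, or a local model.

```python
import re
from abc import ABC, abstractmethod
from typing import Dict


class Summarizer(ABC):
    """Shared plumbing for every summarization backend (hypothetical base class)."""

    def __init__(self, per: float = 0.5) -> None:
        # `per` mirrors the parameter of spacy_summary: fraction of content to keep.
        if not 0 < per <= 1:
            raise ValueError("per must be in (0, 1]")
        self.per = per

    @abstractmethod
    def summarize_text(self, text: str) -> str:
        """Backend-specific step: turn raw text into a summary string."""

    def __call__(self, file: Dict) -> Dict:
        # Assumption: the transcript lives under "text"; the real key may differ.
        result = dict(file)  # shallow copy so the caller's dict is not mutated
        result["summary"] = self.summarize_text(file["text"])
        return result


class LeadSentenceSummarizer(Summarizer):
    """Toy stand-in backend: keeps the first `per` fraction of sentences."""

    def summarize_text(self, text: str) -> str:
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        keep = max(1, round(len(sentences) * self.per))
        return " ".join(sentences[:keep])
```

A spaCy-based or API-based backend would subclass `Summarizer` and override only `summarize_text`, which removes the repeated dictionary handling from each function.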
This should also be possible without paid APIs by loading open-source models into memory or onto the GPU.
Users should be able to automatically abridge or summarize transcripts. Hour-long podcast conversations can be all over the place, so ideally the summarization should output both a high-level summary (5-10 sentences) and a low-level summary for each part.
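The two-level output described above could be sketched roughly as follows: split the transcript into parts, summarize each part, then summarize the concatenated part summaries. This is only an illustration. `split_into_parts` and `hierarchical_summary` are hypothetical names, fixed-size word chunks are an assumed stand-in for real topic segmentation, and `summarize` can be any callable that maps text to a summary (a paid API, a local model, or an extractive method). The 5-10 sentence target is not enforced here and would be up to the backend.

```python
from typing import Callable, Dict, List


def split_into_parts(text: str, max_words: int = 400) -> List[str]:
    """Naive segmentation: consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def hierarchical_summary(
    text: str,
    summarize: Callable[[str], str],
    max_words: int = 400,
) -> Dict:
    """Return one summary per part plus a summary of those summaries."""
    parts = split_into_parts(text, max_words)
    low_level = [summarize(part) for part in parts]
    # The high-level summary is built from the part summaries, so the backend
    # never has to process the full hour-long transcript in one call.
    high_level = summarize(" ".join(low_level)) if low_level else ""
    return {"high_level": high_level, "low_level": low_level}
```

Summarizing the part summaries rather than the raw transcript keeps each call within the input limit of whichever backend is used, at the cost of possibly losing detail in the second pass.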
Possible Approaches: