benearnthof / podcasty

MIT License
1 stars 0 forks source link

Add summarization functionality #2

Open benearnthof opened 1 year ago

benearnthof commented 1 year ago

Users should be able to automatically abridge / summarize transcripts. Hour long podcast conversations can be all over the place so ideally the summarization should output both a high level summary (5-10 sentences) and low level summary for each part.

Possible Approaches:

benearnthof commented 1 year ago

Currently we have the outlines for all of the summarization approaches mentioned above, by far the best quality (meaningful semantics) of summarization is obtained by using ChatGPT. The openai api is of course a paid service.

This needs to be wrapped up in a class, currently we have a lot of repeated statements: def spacy_summary(file: Dict, per=0.5) -> Dict:

This should also be possible without paid apis by loading opensource models into memory/gpu.

benearnthof commented 1 year ago

https://python.langchain.com/en/latest/modules/chains/index_examples/summarize.html investigate this