This tool takes a text document (PDF or TXT) or a YouTube transcript and generates a concise summary using GPT-4o-mini, GPT-4, or GPT-3.5-turbo. It can accurately summarize hundreds of pages of text. It is built with Python and Streamlit and uses the langchain library for text processing. Although the final summary is generated by an OpenAI GPT model such as GPT-4o-mini (one of the LLMs that power ChatGPT), only a small portion of the overall document is used in the prompts. Before any call is made to the LLM, the document is reduced to a set of small sections that contain most of its meaning.
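The idea of splitting a document into small sections and keeping only the most informative ones can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `split_into_chunks` and `select_representative_chunks` are hypothetical, and the word-frequency score is a crude stand-in, since the description above does not say how the sections are chosen.

```python
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping chunks of at most `chunk_size` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def select_representative_chunks(chunks: list[str], k: int = 3) -> list[str]:
    """Keep the k highest-scoring chunks, returned in document order.

    ASSUMPTION: a chunk's score is the average document-wide frequency of
    its words. This is a placeholder heuristic, not the app's method.
    """
    doc_freq = Counter(w for c in chunks for w in re.findall(r"[a-z']+", c.lower()))

    def score(chunk: str) -> float:
        words = re.findall(r"[a-z']+", chunk.lower())
        if not words:
            return 0.0
        return sum(doc_freq[w] for w in words) / len(words)

    ranked = sorted(range(len(chunks)), key=lambda i: score(chunks[i]), reverse=True)[:k]
    return [chunks[i] for i in sorted(ranked)]


def build_prompt_text(text: str, k: int = 3) -> str:
    """Join the selected chunks into the text that would be sent to the LLM."""
    return "\n\n".join(select_representative_chunks(split_into_chunks(text), k))
```

Only the output of `build_prompt_text` would go into a prompt, which is why the number of tokens sent to the model can stay small even when the source document is hundreds of pages long.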
Summarize your documents here (no API key required): https://gpt-document-summarizer.streamlit.app/
Run the app locally with:
streamlit run main.py
main.py: Streamlit app main file
utils.py: Contains utility functions for document loading, token counting, and summarization
streamlit_app_utils.py: Contains utility functions specifically for the Streamlit app
main(): Entry point for the Streamlit app
process_summarize_button(): Processes the "Summarize" button click and displays the generated summary
validate_input(): Validates user input and displays warnings for invalid inputs
validate_doc_size(): Validates the document size for token limits
doc_loader(): Loads a document from a file path
token_counter(): Counts the number of tokens in a text string
doc_to_text(): Converts a langchain Document object to a text string
doc_to_final_summary(): Generates the final summary for a given document
summary_prompt_creator(): Creates a summary prompt list for the langchain summarize chain
pdf_to_text(): Converts a PDF file to a text string
check_gpt_4(): Checks if the user has access to GPT-4
token_limit(): Checks if a document has more tokens than a specified maximum
token_minimum(): Checks if a document has more tokens than a specified minimum
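
The token helpers listed above could look roughly like the sketch below. The function names are taken from the list, but the signatures and return conventions are assumptions, and the regex-based count is only an approximation used to keep the example dependency-free; a real implementation would more likely use the model's own tokenizer.

```python
import re


def token_counter(text: str) -> int:
    """Approximate the token count of a string.

    ASSUMPTION: each word and each punctuation mark counts as one token.
    Real model tokenizers split text differently.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


def token_limit(text: str, maximum: int) -> bool:
    """Return True if the text has more tokens than `maximum`."""
    return token_counter(text) > maximum


def token_minimum(text: str, minimum: int) -> bool:
    """Return True if the text has more tokens than `minimum`."""
    return token_counter(text) > minimum


if __name__ == "__main__":
    sample = "Hello, world!"
    print(token_counter(sample))     # words and punctuation counted separately
    print(token_limit(sample, 100))  # is the sample too long for a 100-token budget?
    print(token_minimum(sample, 2))  # is the sample long enough to be worth summarizing?
```

Keeping the maximum and minimum checks as separate functions lets the app show a different warning for documents that are too long and for documents that are too short to need a summary.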