This repository contains a Django web application that summarizes text in Tamil and English. It aims to give users a quick, convenient way to condense textual content.
The user interface is simple and intuitive, with separate input and output pages. The input page has a prominent text box where the user enters the text to summarize. On submission, the user is taken straight to the output page, which presents the summary in a clean, minimal layout. The application supports two languages, English and Tamil, and detects the input language automatically.
Stop words are removed so that summarization focuses on the meaningful words in the text. The text is then tokenized into individual words or sentences, depending on its language. The frequency of each remaining word is counted and stored in a frequency table, which is used to score each sentence: sentences containing more high-frequency words receive higher scores and are treated as more important. The summary consists of the sentences whose scores exceed a threshold, which is set as a percentage of the average score of all sentences in the text. This keeps the summary a concise representation of the most important information in the original text.
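The preprocessing described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `STOP_WORDS` set, the function names, and the regex-based splitting are all assumptions. The `\w+` tokenizer is tuned for English; Tamil text would likely need a tokenizer that keeps combining vowel signs attached to their base letters.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "to", "in", "on"}


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase a sentence and return its word tokens."""
    return re.findall(r"\w+", sentence.lower())


def build_frequency_table(text):
    """Count every token in the text that is not a stop word."""
    words = [w for s in split_sentences(text) for w in tokenize(s)]
    return Counter(w for w in words if w not in STOP_WORDS)


freq = build_frequency_table("The cat sat on the mat. The cat slept.")
print(freq["cat"])  # 2
```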
Sentences are scored using word frequencies from the input text. A frequency table is built from the preprocessed text, and each sentence is then scored as the sum of the frequencies of the words it contains. The higher a sentence's score, the more important it is considered to be and the more likely it is to be included in the summary. This method extracts the most significant sentences efficiently and produces a summary that represents the main ideas of the text. The summarizer has been tested on various texts and has been shown to produce effective summaries.
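A minimal sketch of this scoring step, assuming the frequency table is a `Counter` keyed by lowercase word. The function name and the sample data are illustrative, not taken from the project.

```python
import re
from collections import Counter


def score_sentences(sentences, freq):
    """Return one score per sentence: the sum of its words' frequencies."""
    scores = []
    for sentence in sentences:
        words = re.findall(r"\w+", sentence.lower())
        # Words missing from the table (for example stop words) contribute 0.
        scores.append(sum(freq.get(w, 0) for w in words))
    return scores


sentences = ["The cat sat on the mat.", "The cat slept."]
freq = Counter({"cat": 2, "sat": 1, "mat": 1, "slept": 1})
scores = score_sentences(sentences, freq)
print(scores)  # [4, 3]
```

Because the score is a plain sum, longer sentences tend to score higher than short ones with the same vocabulary.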
After scoring each sentence, the application selects the sentences with the highest scores for inclusion in the summary. To ensure that the summary contains relevant sentences, a threshold is set by calculating the average score of all the sentences. Any sentence with a score greater than 1.2 times the average score is included in the summary. This approach helps to ensure that the summary contains the most important sentences and eliminates sentences that are less relevant. Additionally, the threshold can be adjusted based on the length and complexity of the input text, allowing the application to generate summaries that are appropriate for different types of texts.
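The selection rule can be sketched as below. The 1.2 multiplier comes from the description above; the function name, the `factor` parameter, and the sample scores are assumptions made for illustration.

```python
def select_summary(sentences, scores, factor=1.2):
    """Keep sentences whose score is strictly greater than factor * average score."""
    if not sentences:
        return []
    average = sum(scores) / len(scores)
    threshold = factor * average
    # Preserve the original sentence order in the summary.
    return [s for s, score in zip(sentences, scores) if score > threshold]


sentences = ["Key point.", "Minor detail.", "Side note.", "Aside."]
scores = [10, 2, 3, 1]  # average 4.0, threshold 4.8
print(select_summary(sentences, scores))  # ['Key point.']
```

When all scores are close together, no sentence exceeds 1.2 times the average and the result is empty, which is one reason an adjustable `factor` is useful.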
For English input, the summary is displayed directly on the summary page. For Tamil input, the summary is translated into Tamil using the Google Translate API: the application sends the summary text to the API, receives the translated summary in the response, and displays it on the summary page. The application has separate code paths for English and Tamil input. The scoring system and summarization algorithm are designed to work for both languages, while the translation step exists specifically for Tamil input, so that a wider range of users can benefit from the application.
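The language-dependent flow might look like the sketch below. The text does not say how language detection is implemented, so counting characters in the Tamil Unicode block (U+0B80 to U+0BFF) is an assumption. `translate_to_tamil` is a hypothetical callable standing in for the Google Translate API request; no real client library is shown here.

```python
def detect_language(text):
    """Return "ta" if Tamil-block characters outnumber ASCII letters, else "en"."""
    tamil = sum(1 for ch in text if "\u0b80" <= ch <= "\u0bff")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    return "ta" if tamil > latin else "en"


def render_summary(summary, input_language, translate_to_tamil):
    """Return the text to display: translated for Tamil input, unchanged otherwise."""
    if input_language == "ta":
        # translate_to_tamil is injected so the API call can be swapped or stubbed.
        return translate_to_tamil(summary)
    return summary


def fake_translate(text):
    """Stub used only for this example; a real version would call the API."""
    return "[ta] " + text


print(detect_language("Hello world"))  # en
print(render_summary("A short summary.", "en", fake_translate))  # A short summary.
```

Passing the translator in as an argument keeps the network call out of the summarization logic, which makes the flow easy to test without API access.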
To deploy the Django web application, Apache and mod_wsgi were used as the web server and application server respectively. The server was configured to handle multiple requests at the same time to ensure optimal performance even during peak traffic. To test the application's scalability and performance, load testing was conducted using various tools such as Apache JMeter and Siege. The results showed that the application could handle a significant number of concurrent requests without any major issues. Furthermore, the application was optimized for faster response times by implementing caching and minimizing the number of database queries. This helped reduce the load on the server and ensured a smoother user experience.
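The caching mechanism is not specified above. As one possible illustration, repeated requests for the same text could be served from an in-process memoization cache using the standard library's `functools.lru_cache`. The `summarize` stand-in and the cache size are assumptions, not the project's implementation.

```python
from functools import lru_cache

CALLS = {"count": 0}  # tracks how often the expensive function really runs


def summarize(text):
    """Stand-in for the real summarizer: returns the first sentence."""
    CALLS["count"] += 1
    return text.split(".")[0].strip() + "."


@lru_cache(maxsize=256)
def cached_summarize(text):
    """Return a cached summary when the same text has been seen before."""
    return summarize(text)


cached_summarize("One. Two.")
cached_summarize("One. Two.")  # served from the cache
print(CALLS["count"])  # 1
```

An `lru_cache` lives inside a single Python process, so with several server processes each one keeps its own cache. A shared cache backend would be needed to reuse results across processes.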
To install and run the Text Summarizer project locally, follow these steps:
1. Clone the repository to your local machine:
    git clone https://github.com/your-username/text-summarizer.git
2. Navigate to the project directory:
    cd text-summarizer
3. Create a virtual environment (recommended):
    python -m venv venv
4. Activate the virtual environment:
    On Windows:
        venv\Scripts\activate
    On macOS and Linux:
        source venv/bin/activate
5. Install project dependencies:
    pip install -r requirements.txt
6. Run the Django development server:
    python manage.py runserver
7. Access the application in your web browser at http://localhost:8000/.
Contributions to this project are welcome. If you have suggestions, improvements, or bug fixes, please feel free to open an issue or submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.