Priyanka-3008 / MVP-Assignment


Article writer #5

Open shibiyamvp opened 3 months ago

shibiyamvp commented 3 months ago

Create an API that generates articles from a provided document or website.

a. Ensure the article contains only content from the specified document or website.
b. API input parameters:
   i. Topic for the article.
   ii. Document (a book or any other file in PDF format).
   iii. Article language (English or Arabic). The article must be returned in the requested language: if the input language is Arabic, return an Arabic article; if it is English, return an English article. The topic and the document may each be in either language (Arabic or English).

Priyanka-3008 commented 3 months ago

import os
import warnings

from flask import Flask, request, jsonify
from googletrans import Translator
from langdetect import detect
from PyPDF2 import PdfReader
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

warnings.filterwarnings("ignore",category=FutureWarning)

app = Flask(__name__)

def extract_text_from_pdf(pdf_path):
    # Concatenate the extractable text of every page; pages without text are skipped.
    reader = PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        page_text = page.extract_text()
        if page_text:
            text += page_text
    return text

def detect_language(text):
    # langdetect returns ISO 639-1 codes such as 'en' or 'ar'.
    return detect(text)

def translate_text(text, src, dest):
    translator = Translator()
    translation = translator.translate(text, src=src, dest=dest)
    return translation.text

def generate_article(text, topic, target_language):
    # The model is loaded on every call; loading it once at import time would be faster.
    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
    summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)

    try:
        detected_topic_language = detect_language(topic)
        detected_text_language = detect_language(text)

        if detected_topic_language != target_language:
            topic = translate_text(topic, src=detected_topic_language, dest=target_language)

        if detected_text_language != target_language:
            text = translate_text(text, src=detected_text_language, dest=target_language)

        text_with_topic = f'Topic:{topic},{text}'
        # Truncate to 512 tokens, then decode back to a string for the pipeline.
        inputs = tokenizer(text_with_topic, return_tensors="pt", truncation=True, max_length=512, padding='max_length')
        input_text = tokenizer.decode(inputs["input_ids"][0], skip_special_tokens=True)
        summary = summarizer(input_text, max_length=300, min_length=100, do_sample=False)
        summary_text = summary[0]['summary_text']

        if target_language.lower() == 'ar' and detected_text_language != 'ar':
            # Pass the summary string (not the pipeline's list of dicts) to the translator.
            summary_text = translate_text(summary_text, src='en', dest='ar')
        print("Generated Article:", summary_text)
        return summary_text if summary_text else "No summary generated"
    except Exception as e:
        return str(e)
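The comparison between the detected language and `target_language` only works if the caller sends an ISO 639-1 code, because `langdetect` returns codes such as `en` and `ar`. Since the task describes the parameter as "English or Arabic", a small normalization step may help. The following is a minimal sketch; `normalize_language` and `SUPPORTED_LANGUAGES` are hypothetical names that do not appear in the code above.

```python
# Hypothetical helper (not part of the original code): map user-facing
# language names to the ISO 639-1 codes that langdetect returns.
SUPPORTED_LANGUAGES = {
    "en": "en",
    "english": "en",
    "ar": "ar",
    "arabic": "ar",
}


def normalize_language(value):
    """Return 'en' or 'ar' for inputs such as 'English', ' AR ' or 'arabic'."""
    if value is None:
        raise ValueError("language is required")
    key = value.strip().lower()
    if key not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {value!r}")
    return SUPPORTED_LANGUAGES[key]


print(normalize_language("Arabic"))
print(normalize_language("en"))
```

In the endpoint, the result of this call could replace the raw `language` form value, with a `ValueError` turned into a 400 response.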

@app.route('/generate_article', methods=['POST'])
def generate_article_endpoint():
    topic = request.form.get('topic')
    target_language = request.form.get('language')
    pdf_file = request.files.get('document')

    # Debug output for inspecting the incoming request
    print("Topic:", topic)
    print("Language:", target_language)
    print("Request files:", request.files)
    print("Request form:", request.form)

    if not target_language:
        return jsonify({"Error": "No language is provided"}), 400
    if not topic:
        return jsonify({"Error": "No topic is provided"}), 400
    if not pdf_file:
        return jsonify({"Error": "No document is provided"}), 400
    if pdf_file.filename == '':
        return jsonify({"Error": "No file is selected"}), 400

    os.makedirs("/Users/mvp/Desktop/taskflask/Articlewriter/Saved Docs", exist_ok=True)
    save_path = os.path.join("/Users/mvp/Desktop/taskflask/Articlewriter/Saved Docs", pdf_file.filename)
    pdf_file.save(save_path)

    document_text = extract_text_from_pdf(save_path)
    if not document_text.strip():
        return jsonify({"Error": "No text extracted from the document"}), 400

    article = generate_article(document_text, topic, target_language)

    return jsonify({"Article": article})

if __name__ == "__main__":
    app.run(debug=True)
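The endpoint joins the client-supplied `pdf_file.filename` directly onto the save directory, so a name containing `../` could end up outside that directory. Flask projects often use `werkzeug.utils.secure_filename` for this. The sketch below shows the same idea with the standard library only; `safe_upload_path` is a hypothetical helper, and the character whitelist and the `upload.pdf` fallback are assumptions.

```python
import os
import re
import uuid


def safe_upload_path(directory, filename):
    """Build a path inside `directory` from an untrusted client filename."""
    # Drop any directory components the client sent (both separator styles).
    base = os.path.basename(filename.replace("\\", "/"))
    # Keep a conservative character set; everything else becomes "_".
    base = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".")
    if not base:
        base = "upload.pdf"  # assumed fallback name
    # A random prefix avoids overwriting an earlier upload with the same name.
    return os.path.join(directory, f"{uuid.uuid4().hex}_{base}")


print(safe_upload_path("/tmp/docs", "../../etc/passwd"))
```

The returned path could then be passed to `pdf_file.save(...)` in place of the plain `os.path.join` result.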

Priyanka-3008 commented 3 months ago

Updated version:

import os
import warnings

import pdf2image
import pytesseract
from flask import Flask, request, jsonify
from googletrans import Translator
from langdetect import detect
from PIL import Image
from PyPDF2 import PdfReader
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM, MarianMTModel, MarianTokenizer

warnings.filterwarnings("ignore", category=FutureWarning)

app = Flask(__name__)

# English summarization setup
en_tokenizer = AutoTokenizer.from_pretrained("t5-small")
en_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
en_summarizer = pipeline("summarization", model=en_model, tokenizer=en_tokenizer)

# Arabic translation setup (English to Arabic)
ar_tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-ar")
ar_model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-ar")
ar_translator = pipeline('translation_en_to_ar', model=ar_model, tokenizer=ar_tokenizer)

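def split_text(text, max_length):
    # Splits the text into chunks of roughly max_length characters,
    # cutting at sentence boundaries to avoid breaking words.
    sentences = text.split('.')
    current_chunk = ""
    chunks = []

    for sentence in sentences:
        if not sentence.strip():
            # Skip empty fragments, such as the one after the final period.
            continue
        if len(current_chunk) + len(sentence) < max_length:
            current_chunk += sentence + '.'
        else:
            # Only store non-empty chunks; a single long sentence becomes its own chunk.
            if current_chunk:
                chunks.append(current_chunk)
            current_chunk = sentence + '.'
    if current_chunk:
        chunks.append(current_chunk)
    return chunks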
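`split_text` measures length in characters and splits on every period, so decimals and abbreviations are cut, and a single sentence longer than the limit is passed through whole. As an illustration only, here is one possible variant that splits after sentence-ending punctuation followed by whitespace and hard-splits oversized sentences. `chunk_sentences` is a hypothetical name, and the punctuation set (including the Arabic question mark) is an assumption.

```python
import re


def chunk_sentences(text, max_chars):
    """Group whole sentences into chunks of at most max_chars characters."""
    # Split after '.', '!', '?' or the Arabic question mark when followed by
    # whitespace, keeping the punctuation attached to its sentence.
    parts = re.split(r"(?<=[.!?؟])\s+", text)
    sentences = [part.strip() for part in parts if part.strip()]

    chunks = []
    current = ""
    for sentence in sentences:
        # Hard-split a single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_sentences("One. Two. Three.", 10))
```

Because the limit is in characters rather than tokens, a value such as 500 is only a rough guard against the model's input window.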

def summarize_text(text, summarizer):
    # 500 characters per chunk keeps each input well below the model's maximum length.
    chunks = split_text(text, 500)
    summaries = [summarizer(chunk)[0]['summary_text'] for chunk in chunks]
    full_summary = ' '.join(summaries)
    return full_summary

def extract_text_from_pdf(pdf_path):
    reader = PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        page_text = page.extract_text()
        if page_text:
            text += page_text
    return text

def pdf_to_text_ocr(pdf_path):
    # OCR fallback for scanned PDFs; lang='ara' makes Tesseract expect Arabic text.
    pages = pdf2image.convert_from_path(pdf_path)
    text = ''
    for page in pages:
        page_text = pytesseract.image_to_string(page, lang='ara')
        text += page_text
    return text

def detect_language(text):
    return detect(text)

def translate_text(text, src, dest):
    translator = Translator()
    try:
        translation = translator.translate(text, src=src, dest=dest)
        return translation.text
    except Exception as e:
        print(f"Translation error: {e}")
        return None

def generate_article(text, topic, target_language):
    detected_topic_language = detect_language(topic)
    detected_text_language = detect_language(text)

    # t5-small summarizes English, so bring the topic and the document into English first.
    if detected_topic_language != 'en':
        topic = translate_text(topic, src=detected_topic_language, dest='en')
        if topic is None:
            return "Error in translating topic"

    if detected_text_language != 'en':
        text = translate_text(text, src=detected_text_language, dest='en')
        if text is None:
            return "Error in translating text"

    text_with_topic = f'Topic: {topic}, {text}'
    summary_text = summarize_text(text_with_topic, en_summarizer)

    # ar_translator is an English-to-Arabic translation pipeline, not a summarizer:
    # its results are under 'translation_text', so it is applied to the finished summary.
    if target_language.lower() == 'ar':
        chunks = split_text(summary_text, 500)
        summary_text = ' '.join(ar_translator(chunk)[0]['translation_text'] for chunk in chunks)
    return summary_text
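A Transformers summarization pipeline returns its result under `summary_text`, while a translation pipeline uses `translation_text`, so the two cannot be swapped inside `summarize_text`. One way to keep the steps separate is to summarize each chunk first and translate the summaries afterwards. The sketch below illustrates that ordering with stand-in functions so it runs without downloading a model; `build_article`, `fake_summarize` and `fake_translate` are hypothetical names.

```python
def build_article(chunks, summarize, translate=None):
    """Summarize each chunk, then optionally translate each summary."""
    summaries = [summarize(chunk) for chunk in chunks]
    if translate is not None:
        summaries = [translate(summary) for summary in summaries]
    # Drop empty results before joining.
    return " ".join(summary for summary in summaries if summary)


# Stand-ins for the real pipelines (assumptions for illustration only).
def fake_summarize(chunk):
    # Pretend the first sentence is the summary.
    return chunk.split(".")[0].strip() + "."


def fake_translate(text):
    # Mark the text instead of actually translating it.
    return f"[ar] {text}"


chunks = ["Farmers face debt. Prices are low.", "Rain is irregular. Crops fail."]
print(build_article(chunks, fake_summarize))
print(build_article(chunks, fake_summarize, fake_translate))
```

With the real models, `summarize` could wrap `en_summarizer(chunk)[0]['summary_text']` and `translate` could wrap `ar_translator(text)[0]['translation_text']`.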

@app.route('/generate_article', methods=['POST'])
def generate_article_endpoint():
    topic = request.form.get('topic')
    target_language = request.form.get('language')
    pdf_file = request.files.get('document')

    if not target_language:
        return jsonify({"Error": "No language is provided"}), 400
    if not topic:
        return jsonify({"Error": "No topic is provided"}), 400
    if not pdf_file:
        return jsonify({"Error": "No document is provided"}), 400
    if pdf_file.filename == '':
        return jsonify({"Error": "No file is selected"}), 400

    save_directory = "/Users/mvp/Desktop/taskflask/Articlewriter/Saved Docs"
    os.makedirs(save_directory, exist_ok=True)  # Creates the directory if it does not exist
    save_path = os.path.join(save_directory, pdf_file.filename)
    pdf_file.save(save_path)

    # Try the embedded text layer first, then fall back to OCR for scanned PDFs.
    document_text = extract_text_from_pdf(save_path)
    if not document_text or not document_text.strip():
        document_text = pdf_to_text_ocr(save_path)
        if not document_text.strip():
            return jsonify({"Error": "No text extracted from the document, even after OCR"}), 400

    article = generate_article(document_text, topic, target_language)
    return jsonify({"Article": article if article else "Article generation failed"})

if __name__ == "__main__":
    app.run(debug=True)

OUTPUT

inputs Topic:Agriculture Language: en Document: Problemsoffarmers.pdf Outputs { "Article": "topic: agriculture, See discussions, st ats, and author pr ofiles f or this public ation at https://www.researchgate. ne t/public ation/362124853 A Study on the Agricultu re Sector and the Problems Associated with it which has an impact on the Farmers Article July 2022 . the user has r equest ed enhanc ement of the do wnlo aded file . farmers are the main pillars of Indian economy and a source of food security for the whole nation . farmers are part of a growing economy in the west of the country . every year thousands of farmers commit suicide due to lower income and heavy debt, they don’t have access to market, new technologies and irritation f acilities . their land is being taken away by private sectors, contract fa rming, small holding of lands, climate change, food shortage, wa ter, droughts and floods have all affected the live of t he farmers . the government has la unched many schemes and brought in technology advancement still those facilities have not reduced the number of suicide cases . farmers today belongs to the most vulnerable section of the society . we need to all farmers access to the marke t, create better infrastructure and road connectivity followed by fr ee health care and education provisions for the farmers and their fami lies, special food package and medical insurance . \"A Study on the Agriculture Sector and the Problems Associated with it which has an impact on the farmers\" published in international Journal of Trend in Scientific Research and Development (ijtsrd) this is an open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) the article is distributed under CC By 4.0 . agriculture sector involves horticulture, poultry industry, animal husband ry, farms, fishery, forestry, agricultural chemistry, apiculture, aqua farming, agricultural communicatio n, agricultural engineering . 
the secto r comprises of branches which contribute a good amount to the GDP of a country . every government has introduced different schemes and taken measures to improve the sector for its best . climate change has badly affected the weather conditions resulting in drought and famine, prolong ed dry seasons in some regions . extreme rainfall has resulted in food and wa ter crisis . the word had adopted to sustainable farming which has helped combat a lot of problem s from deforestation to soil erosion . less emissions o r no emissions of greenhouse gases into the atmosphere an d less amount of water and chemical fertilisers . south Asia countries like india and Bangladesh are highly dependent on agricultural for their economy growth . in Bangladesh 90 % of the people are farmers, the country insists of being prone to floods have shifted to sustainable methods of farming . the agricultural sector is under develop ed in some ways, which has made the life of the farmer s miserable, we have experienced over the past few years like 3 lakh cases of farmer suicide . the land holdi ng is so small, the small and marginalised farmers have n o access to a minimum income . contract farming in india made it miserable with no access to market for many farmers and regions . the farmers have extreme low income and control over the land regions, and contract farming has no benefit to them . research Methodology For the purpose of this exploration, I have used a combination of two of the archetypical social sciences research tools application . question were asked to the common youth, public policy analyst, rural people, farmers ,survey, interviews . the main areas of exploration in this paper incorporates 1. Understanding the farmers problem in India . 3. What has the government done to improve the conditions of farmers in India. 3. Why so many farmers are committing suicide each year. the changing pattern of growing crops has brought Agricultural sector into the fore front . 
India is one of the leading producer of man y crops and even the largest producer of cotton, bananas, milk . since independence the government has initiated many measures and schemes for the upliftment of farmers and making their lifestyle better from green to white revoluti on . the agriculture sector comprises of many branches . we have seen an improvement in the horticulture sector, India is the second largest producer of fish in the world . Advance technologies have been brought in, we have regional banks and agricultural banks for the farmers, still farmers a re the poorest in the country, the small, marginalised and large farmers everyone is suffering . the government has introduced he Pradhan Mantri krishi Sichai Yojan . aims to improve the productivity by providing better irrigation facilities . paramparagat Krishi Vikas Yojna has been launched to encourage farmers to adopt organic farming . a special scheme has been introduced in the north eastern region to promote organic farming to 90% . price stability funds with a corpus of 500 crore have been introduced to support market intervention for price command of agri-horticultura l Industries followed by which we have Gran Jyoti Yojana which will supply continuously electricity . soil health card was also introduced, the n we have eNam and many other online sites and normal SMS and calls to guide the farmers on the us e of fertilizers ,soil fertile, irrigation . less than 4 % of people have adopted organic farming . we are opting to organic farming part by part . ad also opens job opportunity for farmers . Contrac t Farming has been introduced which according to government is a way to lift the farmers and increase their income followed by in some states th e same of Mandis has been removed . we see a direct contact between the consumers and producers that is with the farmers . every year thousands of farmers are killing themselves and majority are living before poverty line . 
76% of the farmers are shifting to no n- farm sectors to get a better lifestyle and income . from 1995 to 2015 more than 3,00,000 farmers have committed suicide . the actual numb er of famer suicide in these 20 years has been way international Journal of Trend in Scientific Resear ch and Development . in india most of the farmers have smal l land holdings and the monsoon rains are very uncertain and irregular . the reality is whether the year experience good rainfall or not the farmers loses eit her way . if the rain is good in a year there’s a crop fai lure . india had the highest ever cultivation of Pulses aka Dal 22 . the government doesn’t want an increment in the cos t levels as it will ultimately reduce the cost of the prices . 95 million metric tonne of pulses were produced, it was sufficient to fulfil the need s of the whole country . even then the government imported 6.6 million tonnes at zer o input duties . the farmers had to bear significant loses, it shows the administrative failure of the government . the government decided MSP for 26 crops, it cha nges every year . between 2009 to 20 13 the average rate of growth in the MSP was 19. for the year 2014 to 2017 the rate dropped to mere 3.6% . in 2018 there was a record breaking in crease in MSP from the government . the income which they receive from cultivation is very less . infliction is increasing the country . if we talk about Farmer income which comes from thr ee sources . marginal farm ers earn only rupees 566 per month from cultiv ation . those famers who have a land of more than 2 hectares are 7572 per month . 52% of the farmers are under loan with amounts to rupees 1,04,000 per farmer they are under debt . if the government increases the budget it can provid e much benefit to the farmers . sma ll farmers have given corporates entry into th e agriculture sector . they have captured many lands, Pastoralist do not have their own farms for grazing their cattle . 
corporates intends maximum p rofit and doesn’t give any care towards the preserv ation of land or soil . contract farming also try to entail f oreign varieties crops in this way the local grown crops which has provided nutrition, if gone will result in mal nutrition in india . there is more use of mechanism which has l ed to increase in rural unemployment followed by this far mers don’t have access to market which has resulted in the exploitation of farmers . small farmers contribute 51%, and 46% operated land . lack of irrigation, crop failure etc, fo llowed by market risks, high transaction cost, poor price realisation . the MSP depends on lower cost of intensive crops which have been the main reasons for suicide among farmers . lib eralisation and private sector entering the sector, the life of farmers have been worst with they control of m ajority of the land . government and other private agents are taking away agricultural lands for developing big factorie s and industries . we see day by day the size of the agriculture getting smaller, followed by soil erosion, salinization, deforestation which is makin g the land unfit for cultivation . being uneducated they don’t know how to react to the court and fight for their rights and lands . we need to make situations better for farmers and bring in new poli cies for better facilities for the farmers. way forward farmers should be given right over the land and the re should be separate farmer land rights where no private sector can enter or the land can be used fo r any development purpose . they need to get access to the market, a direct link between the consumers and producers and also between the farmers and market, eliminating intermediary . every state should take measures to improve the conditions for their agriculture as eve ry state grows different crops . the state can play a wide role tha n the central role . 
special security measures should be introduced for the farmers both male and female, heath care package for pregnant farmers and their children, free health care facilitates once a month for all farmers . special schools for farmers sh ould be open in the villages where it is compulsory for every farmers to attend on the different technologi es, means and methods introduced and how to use it, education on different fertilisers, pesticides, its positive and negative impact, organic farming . farmers are the most exploited section of t he society where they work hard day and night and thei r income is so less than it becomes impossible to survive a month with that . we have seen a rise in suicide cases and maximum people told who are left as farmers are shifting to non-farms secto rs for employment . Agricultural is the largest sector in india and serves as the primary mea ns of economy for the country with soon suffer from food crisis . farmers leaving their far ms, the first question arises with lack of farmers how will the country food security be satisfied if we don’t have cultivators . government need to do something to reduce the rate of farmer suicide . government should give farmers access to markets by giving farmers access . the economic Times, pSCNOTES, says the government needs to improve agriculture . in 2020, Yogeshi Joshi, 2020, July, Troubled farmers- main agriculture problems in India and their solutions . the report is based on a survey of farmers in the u.s." }