CINERGI / DDH-Work


Page Scraping Enhancer #26

Open: valentinedwv opened this issue 6 years ago

valentinedwv commented 6 years ago

Grab the URL from the form page, then scrape that page's text so it can be submitted to the pipeline.
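A minimal sketch of the scraping step, using only the Python standard library. It assumes that "text" means the page's visible text with `<script>` and `<style>` contents dropped and whitespace collapsed. The names `html_to_text` and `fetch_page_text` are hypothetical helpers, not part of any existing CINERGI code.

```python
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the page's visible text as one whitespace-normalized string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def fetch_page_text(url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: download a page and return its visible text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))
```

Splitting fetching from parsing keeps `html_to_text` testable offline and reusable when the form already supplies raw HTML.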

valentinedwv commented 6 years ago

Creating a service using sumy (https://github.com/miso-belica/sumy) and spyne.

Thoughts:

get_summary_url(url, method='LexRank', sentences_count=10, keywords=False)

get_summary(text, method='LexRank', sentences_count=10, keywords=False, isHtml=False)

returns:

{
  summary: [string, string, ...],
  keywords: [{string, reference}, {string, reference}, ...]
}
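To make the proposed `get_summary` operation concrete, here is a rough, self-contained sketch that swaps in a small stdlib implementation of continuous LexRank (idf-weighted cosine similarity between sentences, then PageRank-style power iteration) in place of sumy's summarizer. The sentence splitter is deliberately naive, `damping` and `iterations` are assumed values, and keyword extraction and HTML input are left unimplemented. The return value mirrors the `{summary, keywords}` shape above.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def _tokens(sentence):
    return re.findall(r"[a-z0-9']+", sentence.lower())


def lexrank_scores(sentences, damping=0.85, iterations=50):
    """Continuous LexRank: idf-weighted cosine graph plus power iteration."""
    bags = [Counter(_tokens(s)) for s in sentences]
    n = len(bags)
    df = Counter(word for bag in bags for word in bag)  # sentences containing each word
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}    # +1 keeps shared words non-zero
    vecs = [{w: tf * idf[w] for w, tf in bag.items()} for bag in bags]
    norms = [math.sqrt(sum(v * v for v in vec.values())) for vec in vecs]

    def cosine(i, j):
        if norms[i] == 0 or norms[j] == 0:
            return 0.0
        dot = sum(val * vecs[j].get(w, 0.0) for w, val in vecs[i].items())
        return dot / (norms[i] * norms[j])

    sim = [[cosine(i, j) for j in range(n)] for i in range(n)]
    row_sums = [sum(row) or 1.0 for row in sim]  # row-normalize the similarity graph
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / row_sums[j] * scores[j] for j in range(n))
            for i in range(n)
        ]
    return scores


def get_summary(text, method="LexRank", sentences_count=10, keywords=False, isHtml=False):
    """Hypothetical stand-in for the proposed operation; only LexRank is sketched."""
    if method != "LexRank":
        raise ValueError(f"unsupported method: {method}")
    if keywords or isHtml:
        raise NotImplementedError("keywords and HTML input are not covered by this sketch")
    sentences = split_sentences(text)
    if not sentences:
        return {"summary": [], "keywords": []}
    scores = lexrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:sentences_count]
    summary = [sentences[i] for i in sorted(top)]  # keep original document order
    return {"summary": summary, "keywords": []}
```

Returning the selected sentences in document order, rather than by score, usually reads more naturally as a summary.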

Deep dive: attempt to identify URLs on the page and retrieve them if appropriate.
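One possible way to find candidate URLs, sketched with the standard library. The issue does not say what "appropriate" means, so `same_host_only` is a hypothetical stand-in for that rule. The function resolves relative links against the page URL, drops fragments and non-HTTP schemes such as `mailto:`, and de-duplicates while preserving page order.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_candidate_urls(html, base_url, same_host_only=True):
    """Return absolute http(s) links on the page, de-duplicated in page order.

    same_host_only is a hypothetical filter standing in for "if appropriate".
    """
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    base_host = urlparse(base_url).netloc
    seen, result = set(), []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if same_host_only and parts.netloc != base_host:
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

Each returned URL could then be fetched and its text passed through the same summarization step as the original page.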