MojoJolo / textteaser

TextTeaser is an automatic summarization algorithm.
MIT License
1.97k stars, 251 forks

URL download and cleanup #19

Closed shyams80 closed 10 years ago

shyams80 commented 10 years ago

I don't see the code for downloading content from a URL, removing the boilerplate, etc. Can you point me to the section of the code that does that?

tkroman commented 10 years ago

I'm sorry, could you elaborate on the issue, please? It's not clear to me what precisely you want.

shyams80 commented 10 years ago

The Mashape API can take a URL as input and return the summarized result. I couldn't find the section of the code that provides this functionality. I am assuming that, when given a URL, you need to download the entire page, extract the main article (i.e. remove the boilerplate), and then pass the article to the main summarizer routine. How and where does the code handle this?

MojoJolo commented 10 years ago

The code here on GitHub only accepts the text and the title. The Mashape API has an additional layer that extracts the text from the web page; it used Python Goose for that.
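
For anyone who wants to reproduce that extraction layer locally, here is a minimal sketch of what it might look like. It is not the Mashape code, which is not in this repository. It assumes the `goose3` package (the Python 3 fork of Python Goose) is installed; `goose_extract` and `summarize_url` are hypothetical names, and `summarize` stands in for whatever summarizer you call with a title and a text.

```python
from typing import Callable, List, Tuple


def goose_extract(url: str) -> Tuple[str, str]:
    """Download the page at `url` and return (title, main article text).

    Assumption: goose3 is installed. It is imported lazily so the rest of
    this sketch can be used (and tested) without it.
    """
    from goose3 import Goose

    article = Goose().extract(url=url)
    # cleaned_text is the article body with the page boilerplate removed.
    return article.title, article.cleaned_text


def summarize_url(
    url: str,
    summarize: Callable[[str, str], List[str]],
    extract: Callable[[str], Tuple[str, str]] = goose_extract,
) -> List[str]:
    """Extract the article from `url`, then hand title and text to `summarize`.

    `summarize` is a placeholder for the summarizer routine, which only
    needs the title and the text.
    """
    title, text = extract(url)
    if not text.strip():
        raise ValueError("no article text could be extracted from " + url)
    return summarize(title, text)
```

Passing the extractor in as a parameter keeps the URL handling separate from the summarizer, which matches the split described above: the summarizer itself never sees a URL.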