VIDA-NYU / ache

ACHE is a web crawler for domain-specific search.
http://ache.readthedocs.io
Apache License 2.0

Embedded browser & open crawled file #140

Open binhlvu opened 7 years ago

binhlvu commented 7 years ago

It would be very nice if I could browse the crawled pages inside ACHE (localhost:8080) for debugging purposes, because sometimes the pages ACHE gets differ from the pages I get in my browser.

Also, it would save users a lot of time when debugging a crawled page if we could open it directly in ACHE (without manually writing code to read the content and save it to an HTML file).

aecio commented 7 years ago

I agree, it would be a nice feature. Can you give more details? Would you like to see the cached HTML source code, the cached HTML rendered in something like an iframe, or is the extracted text sufficient?

binhlvu commented 7 years ago

Thank you for considering this.

I think cached HTML rendered in an iframe would be good. Also, it would be great if you allowed users to specify the URL of a cached HTML page to render.
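
To illustrate the iframe idea discussed in this thread, here is a minimal sketch of how cached HTML could be wrapped in a sandboxed iframe. It is not ACHE's API: the function name `render_cached_page` and its parameters are hypothetical, and it assumes the cached HTML and its URL have already been read from wherever the crawler stores them.

```python
from html import escape


def render_cached_page(cached_html: str, source_url: str) -> str:
    """Return a wrapper page that shows cached HTML inside a sandboxed iframe.

    Hypothetical helper for illustration only; not part of ACHE.
    """
    # Escape the cached markup so it can sit inside a double-quoted attribute.
    srcdoc = escape(cached_html, quote=True)
    # Escape the URL too, since it is echoed into the wrapper page.
    label = escape(source_url, quote=True)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>Cached: {label}</title></head><body>\n"
        f"<p>Cached copy of {label}</p>\n"
        # An empty sandbox attribute blocks scripts and form submission
        # in the cached page.
        f'<iframe sandbox srcdoc="{srcdoc}" '
        'style="width:100%;height:90vh"></iframe>\n'
        "</body></html>\n"
    )


if __name__ == "__main__":
    print(render_cached_page("<h1>Hello</h1>", "http://example.com/"))
```

Using `srcdoc` keeps the cached copy self-contained, so the browser does not re-fetch the live page, which matters when the crawled content differs from what a normal browser receives. Relative links and assets inside the cached HTML would still need separate handling.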