SimpleBrowserDotNet / SimpleBrowser.WebDriver

A webdriver for SimpleBrowser
Apache License 2.0

Being able to set the page source manually. #26

Closed AndersBillLinden closed 10 years ago

AndersBillLinden commented 10 years ago

To make it easier to write code for scraping a website that requires multiple web accesses, it can be handy to save the HTML fetched so far to disk, which keeps the number of web accesses down.
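A minimal sketch of the disk-caching idea described above, written in Python only to keep it short. Nothing here is SimpleBrowser or WebDriver API: `cached_fetch`, the `fetch` callback, and the cache directory layout are all hypothetical names chosen for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_fetch(url: str, fetch: Callable[[str], str], cache_dir: Path) -> str:
    """Return the HTML for `url`, hitting the network only on a cache miss.

    `fetch` is a hypothetical callback that performs the real web access.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a safe file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.html"
    if path.exists():
        # Cache hit: no web access needed.
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    return html
```

While developing a multi-step scraper, each step can then be rerun against the saved copy instead of the live site. The sketch never expires entries, so stale pages have to be deleted by hand.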

Teun commented 10 years ago

I was thinking: would it be useful for your case if the SimpleBrowser instance could save its whole state (including cookies, history, etc.) to a stream and rehydrate from that? That would be a better fit for your requirement than saving HTML and URLs yourself, right?

I'd be happy to collaborate on that.
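To make the proposal concrete, here is a rough sketch of saving state to a stream and rehydrating from it, in Python for brevity. It is not how SimpleBrowser works: `BrowserState`, its fields, `save_state`, and `load_state` are hypothetical, and a real browser would have more state to capture than this.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import IO, Dict, List


@dataclass
class BrowserState:
    """Hypothetical snapshot of what a browser needs in order to resume."""
    current_url: str = ""
    cookies: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)


def save_state(state: BrowserState, stream: IO[str]) -> None:
    # Write the snapshot to any text stream (file, in-memory buffer, ...).
    json.dump(asdict(state), stream)


def load_state(stream: IO[str]) -> BrowserState:
    # Rebuild the snapshot from a stream written by save_state.
    return BrowserState(**json.load(stream))
```

Only plain data survives the round trip in this sketch; anything held by live objects, such as script state, would be lost.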

AndersBillLinden commented 10 years ago

Hm, that seems overly complicated to serialize. It adds possible error sources and increases the amount of data to manage in one way or another. Also, setting the page source seems like an operation a programmer would expect to exist, although it could increase the number of errors, especially in such a complicated library. When I saved the web page to disk, I also stripped out a lot of scripts that I do not need for the next step, along with other things that would cause more web accesses to remote servers than necessary.
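The clean-up step mentioned above could look roughly like the following sketch, which drops `<script>` elements from a page before it is saved. It uses only Python's standard `html.parser` module; `ScriptStripper` and `strip_scripts` are hypothetical names, and the original poster's actual clean-up code is not shown in this thread.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit the parsed HTML while skipping <script> elements."""

    def __init__(self) -> None:
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>.
        if tag != "script":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._in_script:
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_scripts(html: str) -> str:
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

This removes only script elements. Other resources that trigger requests, such as images or stylesheets, would need similar handling.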

AndersBillLinden commented 10 years ago

If the pull request for file URLs in the SimpleBrowser project is accepted and the empty setter for PageSource in the SimpleBrowser.WebDriver project is removed, I think my personal needs are fulfilled and some order is restored.

AndersBillLinden commented 10 years ago

But... as I said... setting the page source seems like an operation a programmer would expect to exist.

AndersBillLinden commented 10 years ago

Being able to put a SimpleBrowser into a stream is not bad, actually. What I am thinking about is rather the possibility of storing one in an ASP.NET session. That seems possible even if it is not serializable. I don't know how good an idea it is to do so...

Teun commented 10 years ago

Err. You can put anything in an ASP.NET session, as long as you use the in-memory session provider (the default). With some of the other providers, the object has to be serializable. I can't say whether that is a good solution for what you are trying to do.

(In reply to Anders Lindén's comment of 2014-08-17 15:57 GMT+02:00, above.)

AndersBillLinden commented 10 years ago

I do not think it always suffices to store just the cookies in the session; sometimes the whole browser object has to be kept in it, because the website may be playing JavaScript tricks.

AndersBillLinden commented 10 years ago

Ah, yeah, there is no serialization at all, so it should not be bad, except for the memory consumption.

Teun commented 10 years ago

So, I want to keep all of these functions that are not part of IWebDriver outside this project. There are ways to get a reference to the actual instance of SimpleBrowser (I will make a FAQ out of that). Then you can always invoke whatever you feel like on the browser, but not through the WebDriver instance.