SimpleBrowserDotNet / SimpleBrowser

A flexible and intuitive web browser engine designed for automation tasks. Built on .NET Standard 2.1.
Other
362 stars 105 forks source link

Async version, remove depenency on RazorLight, implemented IDisposable #260

Open mikaelliljedahl opened 3 years ago

mikaelliljedahl commented 3 years ago

Finally got the tests to run. Had to do some adjustments to handle Async in the "Mockers" and update url to jkorpela.fi/cgi-bin/echo.cgi (missing https so we got redirected when submitting the forms).

Would be very glad if you could merge this PR and make a new nuget release (I already bumped the nuget version for it to be ready).

BR Mikael

takielias commented 3 years ago

@kevingy Can You please check it?

mikaelliljedahl commented 3 years ago

@kevingy Can You please check it?

@takielias I created a forked branch here: https://github.com/mikaelliljedahl/SimpleHeadlessBrowser and nuget here: https://www.nuget.org/packages/SimpleHeadlessBrowser/ waiting for this PR to completed ;)

takielias commented 3 years ago

@mikaelliljedahl Can you please fix the above issues?

mikaelliljedahl commented 3 years ago

I have pushed the changes to the PreparePRToMainProject branch. Do I need to do something else to update the PR? (I am used to Microsoft DevOps)

kevingy commented 3 years ago

I have pushed the changes to the PreparePRToMainProject branch. Do I need to do something else to update the PR? (I am used to Microsoft DevOps)

I'll check, but that is all you should have to do.

takielias commented 2 years ago

@mikaelliljedahl Did you check all the comments of @kevingy ?

mikaelliljedahl commented 2 years ago

@mikaelliljedahl Did you check all the comments of @kevingy ?

Yea, but I guess we kind of disagree on the Browser object needed to implement IDisposable pattern. Another way to eliminate the memory leak problem is to completly remove the static variable containing the list of active browsers. That will of course change the behavior a bit. It has been done already in the separate fork branch that I created waiting for this PR to be accepted.

BR Mikael

takielias commented 2 years ago

@mikaelliljedahl Did you check all the comments of @kevingy ?

Yea, but I guess we kind of disagree on the Browser object needed to implement IDisposable pattern. Another way to eliminate the memory leak problem is to completly remove the static variable containing the list of active browsers. That will of course change the behavior a bit. It has been done already in the separate fork branch that I created waiting for this PR to be accepted.

BR Mikael

@kevingy Could you please consider the PR?

mikaelliljedahl commented 2 years ago

@takielias You are free to use the fork, I made a fresh nuget for it too with the static list of browsers causing the memory leaks removed: https://www.nuget.org/packages/SimpleHeadlessBrowser/

kevingy commented 2 years ago

@mikaelliljedahl Implementing IDisposable will not fix the memory leak! You either don't have a full understanding of the problem internal to SimpleBrowser or you don't have a full understanding of IDisposable - or both.

SimpleBrowser maintains an infinitely growing collection of HTML nodes. So long as a Browser instance exists, the collection will grow every time a navigation occurs. (This is why I have always suggested transient Browser instances.) None of your changes address this real problem. Simply implementing IDisposable certainly won't fix it.

Furthermore, IDisposable "Provides a mechanism for releasing unmanaged resources." (https://docs.microsoft.com/en-us/dotnet/api/system.idisposable?view=net-6.0). The collection contains all managed resources. In fact, I don't believe there are any unmanaged resources in SimpleBrowser. Everything is managed. Therefore, there is no need for IDisposable in the project at all.

There is good work in this PR. I wish you would have separated the work out so that I could have accepted parts of it at a time rather than a single huge PR. Additionally, your changes break backwards compatibility. I don't want to have to rewrite my SimpleBrowser applications to support your changes.

@takielias What is it in the PR that you need, exactly. Are you wanting the asynchronous methods? A "fix" for the memory leak?

takielias commented 2 years ago

@kevingy async is awesome but memory leak is a big issue.

kevingy commented 2 years ago

@kevingy async is awesome but memory leak is a big issue.

@takielias

Async is awesome! I would welcome it with open arms, so long as it was implemented in a way that didn't break backwards compatibility.

To avoid the memory leak, change your code to use transient Browser instances. That is, don't do this:

// This will leak, with an infinitely growing collection of HTML elements.
var browser = new Browser();
do 
{
    browser.Navigate("someurl");
} while (true);

Do this:

// This will still leak, but will be cleaned up when browser goes out of scope at the end of each loop.
do 
{
    var browser = new Browser();
    browser.Navigate("someurl");
} while (true);

The infinitely growing collection leak was added by, in my opinion, a bad contribution that was not fully vetted before being merged. To fix it, significant changes are required to the way that SimpleBrowser handles it's navigation history. The change was added so that when the user "clicks" the Back button, the SimpleBrowser loads the previous page from memory rather than navigating to the previous URL. So, the "leak" is an intentional "feature". The HTML element collection grows infinitely so that the user can click Back all the way to the first navigation. That feature (clicking the Back button) is probably little used and probably the greatest problem SimpleBrowser has.

takielias commented 2 years ago

@kevingy Thanks for the explanation. I used this Package in several projects but I did not need to use the Back button yet.

mikaelliljedahl commented 2 years ago

The "leak" you are describing is not what I would call a leak, it is just a consequence of how you use the browser if you open an infinite number of windows. The problematic behavior is that the system can never free resources if there is an exception before calling Close(); The reason for this is that a static list of all browser windows are stored and those references are never removed unless you call Close(). IMO having a library that requires a consumer to call the Close method to remove the reference from the static list of all browsers is bad design. Having a static list of references to other browser windows stops the GC from freeing up memory causing the leak. The leak I am referring to is the one you have reproduced here: https://github.com/SimpleBrowserDotNet/SimpleBrowser/issues/121 That is why I implemented an IDisposable pattern to make sure the cleanup is done automatically for you when your browser object goes out of scope (or due to an exception).

The implementation is not breaking the old behavior, you can just ignore the Disposable pattern and also ignore there is a Close method.
The Async pattern is neither a breaking change, you can still call the old non-async methods however you will get a build warning saying these methods are obsolete.

BR Mikael

kevingy commented 10 months ago

This PR has been around forever. I'm going to start pulling pieces of it in at a time, then ultimately deny this pull request in favor of the individual pieces.

mikaelliljedahl commented 3 months ago

This PR has been around forever. I'm going to start pulling pieces of it in at a time, then ultimately deny this pull request in favor of the individual pieces.

How's it going pulling the pieces?