baudehlo / node-phantom-simple

Simple bridge to phantomjs for Node
MIT License
201 stars 70 forks source link

Detecing port used by bridge server is too complicated #9

Closed potmo closed 9 years ago

potmo commented 11 years ago

Right now there is a couple of lines that try to find what port the webserver uses to be able to send messages to it. It starts on line 94. The problem is that it doesn't work correctly and relies on system specific calls external programs. See @Fresa s pull request for a go at trying to fix some of the issues

I think, contrary to what the node-phantom-simple documentation says, that the webserver module used is not one built into node but the one built in to phantomjs.

If I understand the phantomjs docs for webserver there should be a port field that is exposed on the webserver object.

port : the port on which the server listen requests (readonly)

That values should be possible to send via stdout from the bridge to the node-phantom-simple module running in the node scope just like the bridge already does to detect if phantomjs is ready (see line 78 ).

Does all this make sense? Have I missed somethign? Otherwise I'll have a go at this tomorrow.

baudehlo commented 11 years ago

It makes sense and that's the way I tried to build it first. I couldn't get it to work. If you can then great. My experience was it doesn't give you the real port used.

On Sep 29, 2013, at 3:50 PM, Nisse Bergman notifications@github.com wrote:

Right now there is a couple of lines that try to find what port the webserver uses to be able to send messages to it. It starts on line 94. The problem is that it doesn't work correctly and relies on system specific calls external programs. See @Fresa s pull request for a go at trying to fix some of the issues

I think, contrary to what the node-phantom-simple documentation says, that the webserver module used is not one built into node but the one built in to phantomjs.

If I understand the phantomjs docs for webserver there should be a port field that is exposed on the webserver object.

port : the port on which the server listen requests (readonly)

That values should be possible to send via stdout from the bridge to the node-phantom-simple module running in the node scope just like the bridge already does to detect if phantomjs is ready (see line 78 ).

Does all this make sense? Have I missed somethign? Otherwise I'll have a go at this tomorrow.

— Reply to this email directly or view it on GitHub.

Emanuele-Fortunati commented 10 years ago

I was wondering - but not sure it will work using cluster - why don't using an arg to tell phantom which port to listen to? Then you could expose that as an option or pick a random value if user doesn't pick one. I know there may be the risk the chosen port may be already in use, but I think with a good error message user can pick another one? On the end of the day if you are using phantomjs with nodejs you know what you are doing (I hope!). Just an idea anyway...

baudehlo commented 10 years ago

The chance the port is in use is just too high. Remember this was developed for a high end crawling system, with lots of ram and CPU, and yes using cluster. And there are only 64k ports to choose from. It sounds like a lot when you're running 1 process but when you're running hundreds or thousands it runs out fast.

The correct solution is to get phantom to report the actual port used when you ask it to listen on port zero. Currently it correctly assigns a random port but then still returns zero when you query the port.

On Aug 14, 2014, at 8:16 AM, Emanuele Fortunati notifications@github.com wrote:

I was wondering - but not sure it will work using cluster - why don't using an arg to tell phantom which port to listen to? Then you could expose that as an option or pick a random value if user doesn't pick one. I know there may be the risk the chosen port may be already in use, but I think with a good error message user can pick another one? On the end of the day if you are using phantomjs with nodejs you know what you are doing (I hope!). Just an idea anyway...

— Reply to this email directly or view it on GitHub.

Emanuele-Fortunati commented 10 years ago

Thanks for the explanation, it makes perfectly sense now. I'll keep this in mind as I'm working on something that has similar issue, but I may do the follow:

1- server.listen(0, '127.0.0.1') on my node.js script 2- get server port and close server 3- use that port as an argument to send to bridge.js

I know there's a small chance that during the time I close the server and I spawn phantom the port may get busy, but with a fallback that try another port all should be working. I may also offer the option of using a specific port as on some architectures developers and IT managers want to have that choice.

Anyway, really thanks for your work to put the module together, I use it quite a lot!

[btw if u think this could be a good solution for your module too I can pull request it, or just feel free to update the code yourself, I'm not looking for notoriety ;)]

baudehlo commented 10 years ago

Yeah the race condition is something that you WILL run into with this scheme if it's in heavy use.

Why do you need to change the scheme that the module uses?

On Thu, Aug 14, 2014 at 9:19 AM, Emanuele Fortunati < notifications@github.com> wrote:

Thanks for the explanation, it makes perfectly sense now. I'll keep this in mind as I'm working on something that has similar issue, but I may do the follow:

1- server.listen(0, '127.0.0.1') on my node.js script 2- get server port and close server 3- use that port as an argument to send to bridge.js

I know there's a small chance that during the time I close the server and I spawn phantom the port may get busy, but with a fallback that try another port all should be working. I may also offer the option of using a specific port as on some architectures developers and IT managers want to have that choice.

Anyway, really thanks for your work to put the module together, I use it quite a lot!

— Reply to this email directly or view it on GitHub https://github.com/baudehlo/node-phantom-simple/issues/9#issuecomment-52181651 .

Emanuele-Fortunati commented 10 years ago

Just because I want the "freedom" (= IT policy) to chose a port ;)

Emanuele-Fortunati commented 10 years ago

Originally I also thought this way was easier and less system-dependent, but your explanation changed that view

henrjk commented 9 years ago

I looked into implementing what was proposed in comment 1 also. However phantomjs currently does not provide an API to determine the selected port after it has been bound. See phantomjs issue 10640 for more details.

phantomjs uses the mongoose web server but it no longer picks up newer versions due to license issues. This may mean that the issue mentioned above will not be fixed in phantomjs anytime soon.

A grunt build which uses uncss with the shippable build service failed for me. See shippable issue 884 for more information.

The key problem seen on the build is that on system with certain security policies in place netstat of lsof will not reveal pid's which makes the current approach fail.

The build use case I am after would be served well by allowing to optionally provide a port on which bridge.js would listen if specified. If a pull request is welcome I would be happy to take a stab at this.

puzrin commented 9 years ago

I think we can close this ticket. Yes, that peace of code is dirty, but it's the only way to workaround phantomjs bug properly in multiprocess environment.

If you look for SlimerJS patch, accepted to master, you will see that appropriate part for slimer is more simple, because it returns correct port value. As a side effect - after PhantomJS fix webserver.port value, this driver will stop use hacks automatically.

Does anyone has reasons why this ticket should stay open?

baudehlo commented 9 years ago

How does slimerjs get the right port, just for my curiosity ? Or has this now been fixed in phantom?

On Jul 7, 2015, at 8:36 PM, Vitaly Puzrin notifications@github.com wrote:

I think we can close this ticket. Yes, that peace of code is dirty, but it's the only way to workaround phantomjs bug properly in multiprocess environment.

If you look for SlimerJS patch, accepted to master, you will see that appropriate part for slimer is more simple, because it returns correct port value. As side effect - after PhantomJS fix webserver.port value, this driver will stop use hacks automatically.

Does anyone has reasons why this ticket should stay open?

— Reply to this email directly or view it on GitHub.

puzrin commented 9 years ago

See https://github.com/baudehlo/node-phantom-simple/commit/03f758a81f6e4589cd45c8faadd7586aaa32ad98#diff-b590f93db9bb7894e89e0b023a761e3fR189 . You can read webserver.port, and it returns correct value (the real port after listen to :0).

That's NOT fixed in phantom < 2.0, and we did not tested 2.0 (because it's beta/buggy/not-released-in-npm-yet).

baudehlo commented 9 years ago

No I get that, I just mean that phantom doesn't provide the right port, so how can you get it?

On Jul 7, 2015, at 8:58 PM, Vitaly Puzrin notifications@github.com wrote:

See 03f758a#diff-b590f93db9bb7894e89e0b023a761e3fR189 . You can read webserver.port, and it returns correct value (the real port after listen to :0).

That's NOT fixed in phantom < 2.0, and we did not tested 2.0 (because it's beta/buggy/not-released-in-npm-yet).

— Reply to this email directly or view it on GitHub.

puzrin commented 9 years ago

They have completely different implementations. Slimer use wrapper for built-in library https://github.com/laurentj/slimerjs/blob/master/src/modules/webserver.jsm#L78 .

Phantom use something custom:

At first glance, problem should exist in 2.0 too.

puzrin commented 9 years ago

Closing. Kick me if here is something important that should be kept open.