Closed GuilloOme closed 7 years ago
I opened a issue here: https://github.com/ariya/phantomjs/issues/14782
Update: I closed the phantomJS issue today (2016-04-03). After some research, it is an expected behavior from phantomJS/webkit. Since all the events are triggered simultaneously, phantomJS (through webkit) only navigate to the last location requested (see more detail in the phantomjs issue), then in htcap, only the last one is trapped.
Here is where I am now for the resolution of this issue:
Root cause of the issue
Because all the events found in the DOM are triggered back-to-back without waiting the end of the current stack to resolve, only the last window.location
is visible through the onNavigationRequested
called by phantomJS.
Solution found Make sure that none of the events are triggered within the same stack: this way, avoiding collision and inconsistency in the page context (the events we trigger this way are meant to be triggered by a human user so the page script could not have been engineered to handle this kind of concurrency). Technically, it means using a messageEvent queue to place it one by one in the event loop.
Difficulties
The actual implementation of the JS probe is not designed to handle async logic: the usage of setInterval
as a loop and global flags for managing the state of the analyzeDOM
implie skipping step in the process… for more info: Concurrency model and Event Loop and this very good talk
It is possible that it will be necessary to refactor the current analyzeDOM()
method.
Site relying on
window.location
for navigation are not crawled properly.A crawl on a page like this one will not see all the site: