GuilloOme closed 7 years ago
This PR already integrates #24.
Hi, your refactoring of probe.js makes a lot of sense, but it's currently not working. I use this page to test the recursion, http://htcap.org/scanme/ng/ and your algorithm is actually missing a lot of requests and adding useless and duplicated records to the database. However, it may be a good starting point. Thanks!
I actually did have some doubts about the recursion. I tested it against wivet, but it didn't have any recursion test case. Your test bench seems quite complete! Can I use it for testing the crawler? Is the source for these pages available somewhere (so I could run it locally)? Thank you for your time!
Sure you can. If you need to run it locally, you can get it with wget -r
Thank you for your input, I'll fix that and come back with a new PR.
Fixes #22. There are a lot of changes here, and it's some pretty advanced low-level JavaScript (as low as JavaScript can go ;) ). If you need more information about it, ask me!
What has been done
remove all the timing-based wait/sync logic for events and replace it with logic based on the event loop cycle (more here: Concurrency model and Event Loop, and this very good talk). Everything happening on the analysed page is now placed in a queue and processed when the stack is ready to take more.
use the browser's XHR event API to get the most up-to-date status of a request (no more synchronous waiting)
use the browser's MutationObserver API to catch any change to the DOM (much faster and more precise than storing an array of elements)
the whole probe is now fully async
clean up a lot of code ;)
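To illustrate the event-loop idea from the first point, here is a minimal sketch, not the actual probe.js code. The names `TaskQueue`, `schedule`, `drain` and `onIdle` are hypothetical and only serve the example. Each queued task runs in its own event loop turn (via `setTimeout(..., 0)`), so the call stack is empty before the next task starts, and "done" is detected when the queue is empty rather than after a fixed delay.

```javascript
// Hypothetical sketch: these names are illustrative and not taken from probe.js.
class TaskQueue {
  constructor() {
    this.pending = [];        // tasks waiting for a free event loop turn
    this.draining = false;    // true while a drain cycle is scheduled or running
    this.idleCallbacks = [];  // called once the queue has emptied
  }

  // Enqueue a task; it runs later, never synchronously.
  schedule(task) {
    this.pending.push(task);
    if (!this.draining) {
      this.draining = true;
      setTimeout(() => this.drain(), 0);
    }
  }

  // Run one task per event loop turn, then yield back to the loop.
  drain() {
    const task = this.pending.shift();
    if (task) {
      try {
        task();
      } catch (err) {
        console.error(err); // a failing task must not stall the queue
      }
      setTimeout(() => this.drain(), 0);
      return;
    }
    this.draining = false;
    this.idleCallbacks.splice(0).forEach((callback) => callback());
  }

  // Register a callback for when no work is left (instead of waiting N ms).
  onIdle(callback) {
    if (!this.draining) {
      setTimeout(callback, 0);
      return;
    }
    this.idleCallbacks.push(callback);
  }
}

const queue = new TaskQueue();
const order = [];

queue.schedule(() => {
  order.push("click");
  // A task may enqueue follow-up work, e.g. a handler that starts an XHR.
  queue.schedule(() => order.push("xhr-done"));
});
queue.schedule(() => order.push("dom-change"));

queue.onIdle(() => console.log(order.join(", ")));
// prints "click, dom-change, xhr-done"
```

In a real page, the tasks would presumably be fed by XHR event listeners and MutationObserver callbacks rather than scheduled by hand as above.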
Benefits
Drawback