Closed: ardnor closed this issue 6 years ago
Yeah, those frameworks and JavaScript-only websites are basically broken by design. Heavy JavaScript sites make scraping tasks difficult but not necessarily impossible. When I'm scraping such websites, I first watch the network requests in a regular web browser, hoping the site serves up the content I want via either JSON or JSONP, and then attempt to replicate those requests using this toolkit. That bypasses the need to process HTML or JavaScript in the first place.
If you've got an example page, I can take a look. I probably need to update the documentation to better address JS-only sites, especially since Angular and React sites are infrequently showing up on my radar.
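As an illustration of the approach described above (replaying a request seen in the browser's network tab instead of parsing HTML), here is a minimal sketch in Python using only the standard library. It is not this toolkit's API. The function names and the default User-Agent string are assumptions made for the example, and the real URL has to be copied from the browser's network tab.

```python
import json
import re
import urllib.request

# Matches a JSONP wrapper such as: callbackName({...});
# Plain JSON starts with '{', '[' or '"', so it does not match.
_JSONP_RE = re.compile(r"^\s*[\w$.]+\s*\(\s*(.*)\s*\)\s*;?\s*$", re.DOTALL)


def parse_json_or_jsonp(text):
    """Parse a response body that is either plain JSON or JSONP."""
    match = _JSONP_RE.match(text)
    if match:
        # Keep only the payload between the callback's parentheses.
        text = match.group(1)
    return json.loads(text)


def fetch_data(url, user_agent="Mozilla/5.0"):
    """Replay a GET request copied from the browser's network tab.

    The default User-Agent is a placeholder; some sites also expect
    cookies or a Referer header, which would need to be added here.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return parse_json_or_jsonp(response.read().decode(charset))
```

Calling `fetch_data()` with the request URL found in the network tab returns the decoded data as ordinary Python dicts and lists, so no HTML or JavaScript processing is involved.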
Hello @cubiclesoft, thanks for the reply. Yes, it is really difficult when the web page is not static HTML. Here is an example page: https://www.checkmeout.ph/track/1284-1726-LHEB which is a tracking page for my store. I would like to scrape the status because the platform doesn't provide an API for the information I need. I hope you can find a workaround for this.
Thank you.
I've been meaning to make a video for a while to demonstrate the various web scraping techniques I use and add it to the main documentation in this repository. This should answer your question:
This is the exact problem I'm having. Your linked YouTube video got me closer to my goal, but I'm still not quite there. The full URL I'm trying to scrape is https://listentotaxman.com/?year=2024&taxregion=uk&age=0&time=1&ingr=55000 and when you visit that link, the table on the right is populated with numbers after the page finishes loading; however, when viewing the source, they are all still 0. Looking at the Network tab in Developer Tools, I was able to find my request and its response in a file called index.js.php, and it contains the data I want. It looks like JSON to me, and the data item I need is called net_pay, but so far I haven't been able to extract it. Please can you help?
@Mitch415 The request to the server probably needs to be a POST request with 'Content-Type: application/json'. The body of the content needs to be a JSON object.
The documentation already covers this: https://github.com/cubiclesoft/ultimate-web-scraper#sending-non-standard-requests
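For readers who want to see the shape of such a request, here is a minimal sketch in Python using only the standard library. It is not this toolkit's API (the documentation link above covers that). The function names are assumptions made for the example, and the endpoint URL and payload fields must be copied from the request shown in the browser's network tab.

```python
import json
import urllib.request


def build_json_post(url, payload):
    """Build a POST request whose body is a JSON object."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def post_json(url, payload):
    """Send the request and decode the JSON response body."""
    request = build_json_post(url, payload)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage: the URL and field names below are placeholders,
# not the real endpoint's contract.
#   data = post_json("https://example.com/endpoint", {"year": "2024"})
#   net_pay = data.get("net_pay")  # assumes net_pay is a top-level key
```

Keeping request construction separate from sending makes it easy to check the method, headers, and body against what the browser sent before any network traffic happens.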
Thank you, I appreciate this. What I did in the meantime was look at the JS on that site, and I found I could POST directly to it and read the response with a standard Ajax XMLHttpRequest.
Hello, how can I scrape a dynamic page that uses Angular and other related JavaScript frameworks?
Thanks,
Ronard