Closed by pushkarbh 4 weeks ago
Does this still occur with 12.0 (we just updated PSI API)?
I just tried a few times and it seems to work for me. It may be an intermittent error.
I just tried the endpoint we've been using, "https://www.googleapis.com/pagespeedonline/v5/runPagespeed", and I'm still getting the same error.
Here is the curl command - curl --location 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?key=<API-KEY>&url=https%3A%2F%2Fwww.realtor.com%2Frealestateandhomes-search%2FChicago_IL&strategy=mobile'
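For anyone reproducing this from a script rather than curl, here is a rough Python equivalent of the command above. It is only a sketch: `build_psi_url` and `run_psi` are hypothetical helper names, and the API key is a placeholder you have to supply.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint taken from the curl command in this thread.
PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url: str, api_key: str, strategy: str = "mobile") -> str:
    """Build the same query string as the curl command (url is percent-encoded)."""
    query = urlencode({"key": api_key, "url": page_url, "strategy": strategy})
    return f"{PSI_ENDPOINT}?{query}"


def run_psi(page_url: str, api_key: str, strategy: str = "mobile") -> dict:
    """Call the PSI API and return the decoded JSON body.

    urllib raises urllib.error.HTTPError for non-2xx responses, so a failing
    Lighthouse run surfaces as an exception rather than a return value.
    """
    with urllib.request.urlopen(build_psi_url(page_url, api_key, strategy)) as resp:
        return json.load(resp)
```

Calling `run_psi("https://www.realtor.com/realestateandhomes-search/Chicago_IL", "<API-KEY>")` should issue the same request as the curl command.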
How do I test this api with 12.0? Using v12 as opposed to v5 gives a 404 error.
Thanks. I'll look further tomorrow.
How do I test this api with 12.0? Using v12 as opposed to v5 gives a 404 error.
You already are. There's only one PSI version (v5), but we update the LH version there (which is now 12).
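If it helps, the Lighthouse version that served a given request can be read back from a successful v5 response. A minimal sketch, assuming the response carries it under `lighthouseResult.lighthouseVersion`; `lighthouse_version` is a hypothetical helper name.

```python
from typing import Optional


def lighthouse_version(psi_response: dict) -> Optional[str]:
    """Return the Lighthouse version reported in a PSI v5 response, if present.

    Assumes the field lives at lighthouseResult.lighthouseVersion; returns
    None when the response has no Lighthouse section (e.g. an error body).
    """
    return psi_response.get("lighthouseResult", {}).get("lighthouseVersion")
```

The endpoint path stays at v5 either way; only this field changes when the Lighthouse version is bumped.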
Hopefully you're able to reproduce the issue. Let me know if not. Thanks!
I overlooked the 403 in your error message. I get the same locally when using the API, and also via plain usage of curl:
curl https://www.realtor.com/realestateandhomes-search/Chicago_IL -I
Seems your webserver is blocking UAs that indicate curl was used (or rather, that a web browser is not being used), which would explain failures of the API from programmatic usage.
The 403 error is coming from a machine at Google making requests to your webserver, which IIUC should be the same whether curl kicks off the API request or the webserver does.... so actually I'm really unsure why this could be happening. @paulirish mentions that perhaps X-Forwarded-For is what varies. Is your server perhaps checking that, or any other request headers, and blocking access to some bots?
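One way to check the header theory from outside is to send the same HEAD request as the `curl -I` above with different User-Agent values and compare status codes. This is only a sketch: the helper names are hypothetical, the UA strings are illustrative placeholders, and it cannot reproduce headers such as X-Forwarded-For that are added on Google's side.

```python
import urllib.error
import urllib.request


def build_head_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a HEAD request (like `curl -I`) with an explicit User-Agent."""
    return urllib.request.Request(url, method="HEAD", headers={"User-Agent": user_agent})


def status_for(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server gives for this User-Agent."""
    try:
        with urllib.request.urlopen(build_head_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; the code is what we want.
        return err.code


if __name__ == "__main__":
    url = "https://www.realtor.com/realestateandhomes-search/Chicago_IL"
    # Illustrative UA strings only; substitute whatever you want to compare.
    for ua in ("curl/8.4.0", "Mozilla/5.0 (X11; Linux x86_64) Chrome/119.0.0.0"):
        print(ua, "->", status_for(url, ua))
```

If both values get the same status, the block is probably keyed on something other than the User-Agent.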
I tried curl https://www.realtor.com and it returns an error page with "This page requires JavaScript!" mentioned in the html response.
I don't work for realtor.com, so I won't be able to find out what has changed. But it seems like they've recently added some defense against non-browser access. This used to work, so it must be a recent change.
Is there any way to make this work by sending custom headers to the PSI api? Thanks for looking into this.
I think the options of using PSI for the mentioned domain are limited given the bot control mechanism put in place. Can the CrUX API or CrUX History API be used to fetch the aggregated data from BigQuery without reaching the origin url?
We have some planned changes to the PSI api that preclude spending time now on making it still return the CrUX parts even if the Lighthouse part fails. For now, any error in the Lighthouse part will fail the entire request.
Is what you're looking for not part of these APIs? https://developer.chrome.com/docs/crux/methodology/tools#tool-crux-api or https://developer.chrome.com/docs/crux/methodology/tools#tool-crux-history-api
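For reference, a CrUX API lookup for the same page might look roughly like the sketch below. It assumes the `records:queryRecord` endpoint with a JSON body of `url` and `formFactor`; the helper names are hypothetical and the key is a placeholder. The History API takes a similar request at `records:queryHistoryRecord`.

```python
import json
import urllib.request

# Assumed CrUX API endpoint; swap in records:queryHistoryRecord for history data.
CRUX_ENDPOINT = "https://chromeuxreport.googleapis.com/v1/records:queryRecord"


def build_crux_request(page_url: str, api_key: str, form_factor: str = "PHONE") -> urllib.request.Request:
    """Build a POST request asking CrUX for field data about a single URL."""
    body = json.dumps({"url": page_url, "formFactor": form_factor}).encode("utf-8")
    return urllib.request.Request(
        f"{CRUX_ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_crux(page_url: str, api_key: str, form_factor: str = "PHONE") -> dict:
    """Send the request and return the decoded JSON record.

    Raises urllib.error.HTTPError if CrUX has no data for the URL.
    """
    with urllib.request.urlopen(build_crux_request(page_url, api_key, form_factor)) as resp:
        return json.load(resp)
```

This only returns field data from real Chrome users, so nothing is fetched from the origin and its bot protection does not come into play.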
It would be great to have PSI api to return CrUX part despite Lighthouse failures. Do you have a rough idea when these changes may be available? Is it like 1-2 quarters or longer?
For now I'm going to see if we can use the CrUX or CrUX History api. Thanks!
It would be great to have PSI api to return CrUX part despite Lighthouse failures. Do you have a rough idea when these changes may be available? Is it like 1-2 quarters or longer?
Unlikely.
For now I'm going to see if we can use the CrUX or CrUX History api. Thanks!
Good plan. :)
FAQ
URL
https://www.realtor.com/realestateandhomes-search/Chicago_IL
What happened?
The url https://www.realtor.com/realestateandhomes-search/Chicago_IL and some other valid urls from the same domain have started failing in PSI API calls. We used the PSI API for these urls successfully for a long time, but have been seeing these errors for the past couple of weeks. Here is the error:
[Lighthouse returned error: ERRORED_DOCUMENT_REQUEST. Lighthouse was unable to reliably load the page you requested. Make sure you are testing the correct URL and that the server is properly responding to all requests. (Status code: 403)]
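A script that batches many urls can separate these origin-side blocks from other failures by pulling the Lighthouse error code and the status code out of the message. A minimal sketch, assuming the message keeps the wording shown above; `parse_lighthouse_error` is a hypothetical helper name.

```python
import re
from typing import NamedTuple, Optional


class LighthouseError(NamedTuple):
    code: Optional[str]    # e.g. ERRORED_DOCUMENT_REQUEST
    status: Optional[int]  # HTTP status the page returned to Lighthouse, if reported


# Patterns are based only on the message format quoted in this report.
_CODE_RE = re.compile(r"Lighthouse returned error: ([A-Z_]+)")
_STATUS_RE = re.compile(r"\(Status code: (\d+)\)")


def parse_lighthouse_error(message: str) -> LighthouseError:
    """Extract the Lighthouse error code and page status code from a PSI error message."""
    code = _CODE_RE.search(message)
    status = _STATUS_RE.search(message)
    return LighthouseError(
        code.group(1) if code else None,
        int(status.group(1)) if status else None,
    )
```

A 403 here points at the tested site refusing the request rather than at the PSI API itself.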
All these failing urls continue to work on https://pagespeed.web.dev. I checked bug reports for similar errors, but most of those are for Lighthouse as opposed to the PSI API. I see some possible causes listed in https://github.com/GoogleChrome/lighthouse/issues/2784, but I'm curious why the same urls work successfully on the PSI site. We run the API from a Python script, but the same error can be reproduced by calling the API from Postman as well.
Please suggest what can be done to resolve this.
What did you expect?
As mentioned earlier, these urls worked until a couple of weeks ago. We expect the API to give us web vitals data, using field and lab metrics, very similar to what we can still see on https://pagespeed.web.dev.
What have you tried?
Tested different urls and validated on https://pagespeed.web.dev. Other urls from different sites we use in our test suite continue to work. Just the urls from this domain stopped working recently.
How were you running Lighthouse?
PageSpeed Insights, Other
Lighthouse Version
11.5.0
Chrome Version
119.0.0.0
Node Version
No response
OS
Linux & Mac
Relevant log output