Closed fishy closed 3 years ago
Example URL: https://sanjosespotlight.com/san-jose-legends-rod-diridon-launched-the-citys-light-rail-but-got-into-transit-by-accident/
This is actually an interesting case. The site doesn't use AMP, and their images look like this:
<img loading="lazy" width="1024" height="683" alt="" data-src="https://sanjosespotlight.s3.us-east-2.amazonaws.com/wp-content/uploads/2020/12/26233502/Rod-Diridon-3.jpeg-1024x683.jpg" class="size-large wp-image-66602 lazyload" src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="><noscript><img loading="lazy" src="https://sanjosespotlight.s3.us-east-2.amazonaws.com/wp-content/uploads/2020/12/26233502/Rod-Diridon-3.jpeg-1024x683.jpg" class="size-large wp-image-66602" width="1024" height="683" alt=""></noscript>
So basically the src attribute holds a tiny placeholder GIF, and the actual image (the URL in data-src) is lazy loaded asynchronously by script.
I think a potential solution is to use the noscript -> img tag nested inside the outer img tag, if it's there.
OK, since img is a void element (it's never closed and can't have children), Go's html parser treats the noscript tag as the next sibling of the outer img tag rather than as its child, so this approach doesn't work. Closing as won't fix.