gildas-lormeau / SingleFile

Web Extension for saving a faithful copy of a complete web page in a single HTML file
GNU Affero General Public License v3.0

Lazy loading issues with embedded tweets on Medium #1176

Open · bansan85 opened this issue 1 year ago

bansan85 commented 1 year ago

Describe the bug

Twitter and GitHub embedded widgets are not saved when their loading is deferred.

To Reproduce

Steps to reproduce the behavior:

  1. Go to https://medium.com/androidmood/comprendre-larchitecture-mvvm-sur-android-aa285e4fe9dd
  2. Don't scroll
  3. Save the page with SingleFile
  4. The Twitter and GitHub embedded widgets are missing

If I scroll through the whole page before saving, nothing is missing.

I also tried ticking the "scroll" and "zooming out the page" options, without success.

It's strange that "zooming out the page" doesn't work, because I can clearly see the Twitter and GitHub embedded widgets loading during this step. As a result, I have to save the page twice: the first time to load the data in the browser, and the second time to actually save it in the HTML page generated by SingleFile.

Screenshots

Original page (good): [2 screenshots, not preserved]

SingleFile page (bad): [2 screenshots, not preserved]

Environment

gildas-lormeau commented 1 year ago

Medium implements a complicated mechanism to lazy-load content. Unfortunately, you have to scroll to save the page properly.

gildas-lormeau commented 1 year ago

I was able to (almost) fix the issue. SingleFile can now save all the frames (without scrolling) except, unfortunately, the embedded tweet. I also added an option to ensure deferred frames are loaded before being saved. The fix and the new option will be available in the next version.

I'm keeping the issue open because of the embedded tweet bug.

gildas-lormeau commented 1 year ago

For the record, enabling Images > save deferred images > zoom out the page fixes the issue with the embedded tweet.

bansan85 commented 1 year ago

Thanks. I confirm that enabling the "load deferred frames" option solves the problem with the missing image. Great job.

gildas-lormeau commented 1 year ago

Thank you for the feedback. Actually, you don't even need to enable the option, but it's safer to do so.

bansan85 commented 1 year ago

I hadn't tested without it, but now I have.

I confirm: I need "load deferred frames" enabled to get both the Twitter AND GitHub frames.

Without "load deferred frames", I get neither of them.

gildas-lormeau commented 1 year ago

Thank you for the information. I was actually referring to Medium; I didn't really test the fix or the new option on other websites.

bansan85 commented 1 year ago

I tested on Medium too, with the link from the first message... Strange...

Edit: Strange again. I just ran a new test: I lost the tweet frame but still have the GitHub frame...

Edit 2: The tweet frame came back and no longer disappears ⁉️

gildas-lormeau commented 1 year ago

Sorry for the misunderstanding regarding the website you're testing. Medium sucks, a lot. They use over-engineered implementations to handle lazy-loaded content. Moreover, they don't use the same technique to handle lazy loading of the GitHub and Twitter frames. That's probably why you see inconsistent results. Note that the handling of lazy-loaded content in SingleFile remains best-effort. Unfortunately, I can't guarantee that it will work in 100% of cases.