az0 / linkgopher

Firefox/Google Chrome add-on: Extracts all links from web page, sorts them, removes duplicates, and displays them in a new tab for inspection or copy and paste into other systems.
GNU General Public License v3.0

Only a few links extracted #56

Status: Closed (gituzzer closed this 1 year ago)

gituzzer commented 2 years ago

Using Link Gopher v2.4.4 on Firefox 104.0.2 (the same happens on Chrome 105).

I found this behavior while browsing this page:

https://www.instagram.com/madonna/reels/

or any Instagram page with Reels; it is not specific to Madonna :-). It is reproducible.

The page loads hundreds of reel links as you page down.

But whenever I "Extract all links", only a small random number (typically 20-40) of reel links are extracted:

https://www.instagram.com/reel/CS15qtanjeO/
https://www.instagram.com/reel/CS4zBMjCJPu/
https://www.instagram.com/reel/CSt_27OievG/
https://www.instagram.com/reel/CSwtj_FiD-y/
https://www.instagram.com/reel/CT-mKMigP2G/
https://www.instagram.com/reel/CT2qVhbA2vi/
https://www.instagram.com/reel/CT5dC9pAZRA/
https://www.instagram.com/reel/CTAciy1C6oB/
https://www.instagram.com/reel/CTDKs1En35L/
https://www.instagram.com/reel/CTIfj8-C-Su/

The behavior on other sites with many media links that I checked (e.g. TikTok) is OK.

Is it Instagram fooling around or an undiscovered bug?

Thank you and keep up the Great Work.

az0 commented 1 year ago

> Is it Instagram fooling around or an undiscovered bug?

tl;dr Yes, fooling around.

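Link Gopher uses normal methods to read the links that are in the web page at the moment you extract, and Instagram is dynamically adding and removing links as you scroll. So the result contains only the links that exist in the page at that instant, not every link you have scrolled past.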
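To make the "snapshot" idea concrete, here is a minimal sketch of what a one-shot extractor does with whatever hrefs are present at the moment it runs: resolve them, remove duplicates, and sort. This is not Link Gopher's actual code; the function name `extractSnapshot` and the sample inputs are assumptions for illustration. In a browser the input would typically come from something like `document.links`.

```typescript
// Hypothetical sketch, not Link Gopher's real implementation.
// Resolves hrefs against the page URL, removes duplicates, and sorts.
function extractSnapshot(hrefs: string[], pageUrl: string): string[] {
  // Turn relative hrefs such as "/reel/abc/" into absolute URLs.
  const absolute = hrefs.map((href) => new URL(href, pageUrl).href);
  // A Set drops duplicates; sort() orders the result lexicographically.
  return Array.from(new Set(absolute)).sort();
}

// Only the hrefs present right now are seen; anything the page has
// already removed from the DOM cannot appear in the result.
const currentlyInPage: string[] = [
  "/reel/CS4zBMjCJPu/",
  "/reel/CS15qtanjeO/",
  "/reel/CS4zBMjCJPu/",
];
const snapshotLinks = extractSnapshot(
  currentlyInPage,
  "https://www.instagram.com/madonna/reels/",
);
console.log(snapshotLinks);
```

The point is that the function has no memory: run it again after scrolling and it returns a different set.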

If you know how to use the browser's Inspect tool, you can watch it happen in real time. Right-click one of the pictures, choose Inspect, and the page's elements are shown. Navigate to a link that looks like the one below, then scroll.

<a class="x1i10hfl xjbqb8w x6umtig x1b1mbwd xaqea5y xav7gou x9f619 x1ypdohk xt0psk2 xe8uvvx xdj266r x11i5rnm xat24cr x1mh8g0r xexx8yu x4uap5 x18d9i69 xkhd6sd x16tdsg8 x1hl2dhg xggy1nq x1a2a7pz _a6hd" href="/reel/CwF4e3HsnTg/" role="link" tabindex="0"><

Here's a video. Notice the list of <div> elements: they change during scrolling, but the number of elements stays the same.
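The fixed element count can be modeled with a small simulation. This is only a sketch under stated assumptions: the window size of 30, the 200 made-up reel paths, and the names `visibleWindow` and `collectWhileScrolling` are all hypothetical, and Instagram's real behavior is more complicated. It shows why a single extraction sees a small slice, and how taking the union of several extractions at different scroll positions would recover more.

```typescript
// Hypothetical model of a list that keeps only `windowSize` items in the page.
function visibleWindow(allItems: string[], scrollIndex: number, windowSize: number): string[] {
  return allItems.slice(scrollIndex, scrollIndex + windowSize);
}

// Take a snapshot at each scroll step and keep the union of everything seen.
function collectWhileScrolling(allItems: string[], windowSize: number, step: number): Set<string> {
  const seen = new Set<string>();
  for (let i = 0; i < allItems.length; i += step) {
    for (const href of visibleWindow(allItems, i, windowSize)) {
      seen.add(href);
    }
  }
  return seen;
}

// 200 made-up reel paths standing in for a long profile page.
const reels: string[] = Array.from({ length: 200 }, (_, i) => `/reel/item${i}/`);

const oneShot = visibleWindow(reels, 120, 30);          // a single extraction mid-scroll
const collected = collectWhileScrolling(reels, 30, 10); // union over many scroll positions

console.log(oneShot.length, collected.size); // prints "30 200"
```

In practice this would mean extracting repeatedly while scrolling and merging the results by hand, since each extraction only reflects the page at that moment.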

instagram_scroll.webm