apify-projects / store-gpt-scraper

Extract data from any website and feed it into GPT via the OpenAI API. Use ChatGPT to proofread content, analyze sentiment, summarize reviews, extract contact details, and much more.
https://apify.com/drobnikj/gpt-scraper

fix: remove inline embedded base64 encoded images in page processing #86

Open Patai5 opened 6 days ago

Patai5 commented 6 days ago

https://console.apify.com/actors/paOtbjvyUiNsr1Qms/issues/NTDlDdHWOQYBX5Jtd


We should strip inline base64-encoded images, which some websites embed directly in their markup, during page processing and before the content is sent to GPT. The encoded data is useless to the model and wastes a lot of tokens.
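To give a rough sense of the token cost, here is a minimal sketch of how large an inline image becomes once it is base64 encoded. The 4/3 expansion follows from the base64 encoding itself. The characters-per-token ratio and the 30 kB image size are assumptions chosen for illustration, not measurements from the scraper, and both function names are hypothetical.

```typescript
// Number of characters in the padded base64 encoding of `byteLength` bytes:
// every group of up to 3 input bytes becomes 4 output characters.
function base64Length(byteLength: number): number {
    return 4 * Math.ceil(byteLength / 3);
}

// Very rough token estimate. The default of 4 characters per token is an
// assumed heuristic for illustration; real tokenizers may split base64
// text into more tokens than this.
function roughTokenEstimate(charCount: number, charsPerToken: number = 4): number {
    return Math.ceil(charCount / charsPerToken);
}

// Hypothetical 30 kB image embedded as a data URI.
const imageBytes = 30000;
const encodedChars = base64Length(imageBytes);
console.log(`${encodedChars} characters, roughly ${roughTokenEstimate(encodedChars)} tokens`);
```

Even under this optimistic ratio, a single small image could take up a large share of a model's context window.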

PavlinaVencovska commented 1 day ago

@JJetmar Can we please have more info on how to prioritize this?

JJetmar commented 1 day ago

@PavlinaVencovska Well, this makes the scraper unusable when a website contains images with such sources. There might be a simple workaround (not tested), though: setting the selector img[src^="data:"] for the elements to be removed.
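For a fix inside page processing itself, a minimal sketch of the same idea as the selector workaround could look like the following. It is not the scraper's actual code: `stripInlineDataImages` and `INLINE_IMG_PATTERN` are hypothetical names, and a regular expression stands in for a real DOM query to keep the example dependency-free. It assumes the data URI contains no raw `>` character, which holds for base64 payloads but not necessarily for unencoded SVG data URIs.

```typescript
// Matches <img> tags whose src attribute value starts with "data:",
// approximating what the CSS selector img[src^="data:"] would select.
// Assumption: the attribute value contains no raw ">" character.
const INLINE_IMG_PATTERN = /<img\b[^>]*\ssrc\s*=\s*["']?data:[^>]*>/gi;

// Returns the HTML with inline data-URI images removed.
// Images that point at regular URLs are left untouched.
function stripInlineDataImages(html: string): string {
    return html.replace(INLINE_IMG_PATTERN, "");
}

const sample =
    '<p>Hi</p><img src="data:image/png;base64,iVBORw0KGgo=" alt="x">' +
    '<img src="https://example.com/a.png">';
console.log(stripInlineDataImages(sample));
```

A DOM-based removal with the selector itself would be more robust than a regex, so this is only meant to show the intended behavior.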