feat: Saving image previews

ClimateTown / knowledge-hub

Your one-stop shop in the fight against climate change, containing important climate resources you wish you knew about yesterday.

https://hub.climatetownproductions.com

Creative Commons Attribution 4.0 International

36 stars 26 forks source link

feat: Saving image previews #373

Closed Lyrete closed 9 months ago

Lyrete commented 9 months ago

Describe your changes

Implemented a function to scrape the og_image from the resource, convert it to a .webp and write it to a file in static/previews.
Update ListItem.svelte to use the base url in front of the og_preview so they point correctly.
Some other minor nitpick fixes

New dependencies

Added httpx as a dependency, will need it for parallelization and good usage already here since it's a good idea to mask the user-agent to not get blocked (as often) when scraping the images.
Also upgraded pip-tools version to 6.13.0 as it seems 6.12 i broken with recent versions of pip: https://github.com/jazzband/pip-tools/issues/1854

Related issue number/link

Fixes #219 Will probably help with #201 as well

VeckoTheGecko commented 9 months ago

This looks awesome!

Re. your https://github.com/ClimateTown/knowledge-hub/issues/219#issuecomment-1872573317

What was the idea with the naming schema? For an initial thing I implemented a 8 char hex digest off the resource name...

Using the URL instead of resource name would ensure no name collisions. You can use a hex digest of that, or make the URL filename safe (the former probably a better option to avoid filename length issues).

Otherwise this is fairly done so it's a question of whether you'd prefer two PRs if I do #218 too.

I think two separate PRs would be great and be a bit easier to review (especially since I'm less familiar with async stuff so want better visibility on that). I'm happy to review this ASAP so that your workflow isn't interrupted, feel free to mark as "ready for review" when you're ready :) .

Lyrete commented 9 months ago

Using the URL instead of resource name would ensure no name collisions. You can use a hex digest of that, or make the URL filename safe (the former probably a better option to avoid filename length issues).

Was actually already doing that while you posted your comment :D

I'll do a bit of clean up and then should be good to go :) There's a good chance I have to touch this functionality anyway while doing the parallelization, so better not to go too deep with the optimizing.

VeckoTheGecko commented 9 months ago

so better not to go too deep with the optimizing.

That's fair. I think rather than rescaling then, let's just make sure that we properly do the conversion to webp and jpeg formats. This is quick convo I had with ChatGPT that may help with development. Looks like a quick call to Pillow.

Rescaling can be left to another PR

Lyrete commented 9 months ago

Implemented the webp thing so should be fairly good to go now. Fairly big bump (30-40s) to the build time but I guess that should get optimized in the next one.

Oh wait the Pillow dep isn't in there yet, oops.

Lyrete commented 9 months ago

There's some github actions issue btw, where the build workflow happens if a precommit was made on github on a PR but not if it wasn't. I don't know which is your intention.

VeckoTheGecko commented 9 months ago

# TODO: Maybe add additional behaviour to check if the image is large enough

I've removed this TODO comment. I think its a very low priority issue, and would be the 'fault' of the website we're scraping from. If we add image rescaling/optimisation, this can be rolled into that I guess.

VeckoTheGecko commented 9 months ago

There's some github actions issue btw

I think thats due to the permissions and me having to authorize workflows. I don't think its an issue

VeckoTheGecko commented 9 months ago

I think 22798d3 was breaking 😅 At least in the VScode Markdown preview. Reverting for wider compatability

VeckoTheGecko commented 9 months ago

@all-contributors please add @Lyrete for code

allcontributors[bot] commented 9 months ago

@VeckoTheGecko

I've put up a pull request to add @Lyrete! :tada:

Lyrete commented 9 months ago

I think 22798d3 was breaking 😅 At least in the VScode Markdown preview. Reverting for wider compatability

Interestingz for me the emoji was breaking 🤔 the link in the radme doesn't work with it.

VeckoTheGecko commented 9 months ago

Ah, didn't think to check the README on GitHub 😅. I'll revert the reversion 😁