Closed Lyrete closed 9 months ago
This looks awesome!
Re. your https://github.com/ClimateTown/knowledge-hub/issues/219#issuecomment-1872573317
What was the idea with the naming schema? For an initial thing I implemented a 8 char hex digest off the resource name...
Using the URL instead of resource name would ensure no name collisions. You can use a hex digest of that, or make the URL filename safe (the former probably a better option to avoid filename length issues).
Otherwise this is fairly done so it's a question of whether you'd prefer two PRs if I do #218 too.
I think two separate PRs would be great and be a bit easier to review (especially since I'm less familiar with async stuff so want better visibility on that). I'm happy to review this ASAP so that your workflow isn't interrupted, feel free to mark as "ready for review" when you're ready :) .
Using the URL instead of resource name would ensure no name collisions. You can use a hex digest of that, or make the URL filename safe (the former probably a better option to avoid filename length issues).
Was actually already doing that while you posted your comment :D
I'll do a bit of clean up and then should be good to go :) There's a good chance I have to touch this functionality anyway while doing the parallelization, so better not to go too deep with the optimizing.
so better not to go too deep with the optimizing.
That's fair. I think rather than rescaling then, let's just make sure that we properly do the conversion to webp
and jpeg
formats. This is quick convo I had with ChatGPT that may help with development. Looks like a quick call to Pillow.
Rescaling can be left to another PR
Implemented the webp thing so should be fairly good to go now. Fairly big bump (30-40s) to the build time but I guess that should get optimized in the next one.
Oh wait the Pillow dep isn't in there yet, oops.
There's some github actions issue btw, where the build workflow happens if a precommit was made on github on a PR but not if it wasn't. I don't know which is your intention.
# TODO: Maybe add additional behaviour to check if the image is large enough
I've removed this TODO comment. I think its a very low priority issue, and would be the 'fault' of the website we're scraping from. If we add image rescaling/optimisation, this can be rolled into that I guess.
There's some github actions issue btw
I think thats due to the permissions and me having to authorize workflows. I don't think its an issue
I think 22798d3 was breaking 😅 At least in the VScode Markdown preview. Reverting for wider compatability
@all-contributors please add @Lyrete for code
@VeckoTheGecko
I've put up a pull request to add @Lyrete! :tada:
I think 22798d3 was breaking 😅 At least in the VScode Markdown preview. Reverting for wider compatability
Interestingz for me the emoji was breaking 🤔 the link in the radme doesn't work with it.
Ah, didn't think to check the README on GitHub 😅. I'll revert the reversion 😁
Describe your changes
static/previews
.ListItem.svelte
to use the base url in front of the og_preview so they point correctly.New dependencies
httpx
as a dependency, will need it for parallelization and good usage already here since it's a good idea to mask the user-agent to not get blocked (as often) when scraping the images.pip-tools
version to 6.13.0 as it seems 6.12 i broken with recent versions of pip: https://github.com/jazzband/pip-tools/issues/1854Related issue number/link
Fixes #219 Will probably help with #201 as well