NixOS / nixpkgs

Nix Packages collection & NixOS

Packaging request: hred #165720

Closed by tejing1 1 year ago

tejing1 commented 2 years ago

Project description

I've been searching for a tool that extracts structured data from web pages which don't provide a machine-readable interface such as RSS, and I recently found hred. hred extracts data from HTML using a language based on CSS selectors, but unlike other programs such as pup that do the same, it outputs its matches as (fairly configurable) JSON rather than as a flat list. If that's not configurable enough, its output can easily be piped into jq for more complex processing. It also apparently handles XML.

It seems better thought out to me than other tools in the category, and it doesn't seem to die horribly when exposed to real-world HTML either, as XML-oriented tools often do.

I did try running node2nix in a local checkout, and the derivation it produced worked for me with no tweaking, so this is probably fairly trivial to package. I'm just not really familiar with how nixpkgs handles node stuff, so I'm not confident doing it myself.

AndersonTorres commented 2 years ago

I will play a bit with it.