Open KaMyKaSii opened 2 years ago
Looks like pup won't let you access the wire:initial-data attribute directly (which seems like a bug to me, will probably create an issue later) but you can work around that with the json{} output and jq (or other JSON processor, I guess?)
cat 45480728909.html | \
pup -p 'div[wire:initial-data] json{}' | \
jq -r '.[]|."wire:initial-data"|fromjson|.serverMemo.data.stream.stream_created_at|select(.)' | \
sort -u
Since 2022-02-10 23:09:03 is only mentioned in the wire:initial-data attribute of various divs, we match those, print them as JSON (using the -p flag to convert the entities), then use jq to do the heavy lifting of 1) getting that attribute, 2) converting it to a real object, 3) finding the stream_created_at key (which is the only one that matches the given date), and 4) removing the nulls from the list. Finally, sort -u condenses the result to a unique list (which in this case is just the one date).
(If you don't have sort, you can do the uniquification in jq: jq -r '[.[]|."wire:initial-data"|fromjson|.serverMemo.data.stream.stream_created_at|select(.)]|unique|.[]')
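If neither pup nor jq is at hand, the steps described above can be roughly imitated with grep and sed. This is only a sketch under loud assumptions, not a replacement for a real HTML/JSON parser: it assumes each wire:initial-data value sits on one line, is entity-encoded with &quot; only, and contains no escaped quotes inside the date string. The file sample.html and its contents are hypothetical stand-ins for the saved page.

```shell
#!/bin/sh
# Hypothetical stand-in for the saved page (the real file is 45480728909.html).
# Two divs carry the date, one has no stream data at all.
cat > sample.html <<'EOF'
<div wire:initial-data="{&quot;serverMemo&quot;:{&quot;data&quot;:{&quot;stream&quot;:{&quot;stream_created_at&quot;:&quot;2022-02-10 23:09:03&quot;}}}}"></div>
<div wire:initial-data="{&quot;serverMemo&quot;:{&quot;data&quot;:{&quot;stream&quot;:null}}}"></div>
<div wire:initial-data="{&quot;serverMemo&quot;:{&quot;data&quot;:{&quot;stream&quot;:{&quot;stream_created_at&quot;:&quot;2022-02-10 23:09:03&quot;}}}}"></div>
EOF

result=$(
  # Step 1: keep only the wire:initial-data attributes
  grep -o 'wire:initial-data="[^"]*"' sample.html |
  # Step 2: decode the &quot; entities so the value reads as JSON text
  sed 's/&quot;/"/g' |
  # Step 3 and 4: print the stream_created_at value, skipping divs without one
  sed -n 's/.*"stream_created_at":"\([^"]*\)".*/\1/p' |
  # Step 5: condense to a unique list
  sort -u
)
printf '%s\n' "$result"
```

Because this treats JSON as plain text, it will break if the page changes its encoding or formatting; the pup and jq pipeline stays the more robust choice.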
If PR https://github.com/ericchiang/pup/pull/175 gets pulled in, you can change the pup part to pup -p 'div[wire:initial-data] attr{wire:initial-data}', which retrieves the data directly and simplifies the jq bit later.
cat 45480728909.html | \
pup -p 'div[wire:initial-data] attr{wire:initial-data}' | \
jq -sr '.[]|.serverMemo.data.stream.stream_created_at|select(.)' | \
sort -u
I'm no html expert, I just want to get a string from a site to use in a shell script. What command can I use on this page to get the string "2022-02-10 23:09:03"? Any help is appreciated. Thanks.