TidierOrg / TidierVest.jl

Tidier web scraping in Julia, modeled after the rvest R package.
MIT License

Update html_table.jl, added type to table_html #6

SaintRod closed 8 months ago

SaintRod commented 9 months ago

If a webpage has multiple tables, selecting them with a Cascadia selector such as Selector("table") returns an n-element Vector{HTMLNode}. The current implementation doesn't allow looping over that vector or broadcasting html_table across it: each element is of type HTMLElement{:table}, and html_table errors out because it expects a Vector{HTMLNode}, not an HTMLElement{:table}.

The proposed solution changes the function signature by adding HTMLElement{:table}, via Union, to the types accepted for the table_html parameter. Users can then broadcast html_table or loop over the elements of the tables vector, and get back a vector of n DataFrames.

using TidierVest
response = TidierVest.read_html("https://en.wikipedia.org/wiki/Houston")

# returns 23-element Vector{HTMLNode}
tables = TidierVest.html_elements(response, "table")

# returns a 23-element Vector of DataFrames
broadcast(TidierVest.html_table, tables)
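
For anyone who wants to see the dispatch change in isolation, here is a minimal sketch of the idea. It does not use the real packages: HTMLNode and HTMLElement below are simplified stand-ins for the Gumbo types, the cells field is hypothetical, and the function body is a placeholder rather than the actual html_table implementation (which builds a DataFrame). Only the Union in the signature reflects what this PR proposes.

```julia
# Simplified stand-ins for Gumbo's node types (assumption: the real
# HTMLElement has different fields; `cells` is hypothetical).
abstract type HTMLNode end

struct HTMLElement{T} <: HTMLNode
    cells::Vector{Vector{String}}  # rows of cell text, for illustration only
end

# Before the change, only a vector of nodes was accepted:
#   html_table(table_html::Vector{HTMLNode})
# After the change, a single table element is accepted as well.
function html_table(table_html::Union{Vector{HTMLNode}, HTMLElement{:table}})
    # Sketch-only choice: when given a vector, use its first element.
    table = table_html isa HTMLElement ? table_html : first(table_html)
    return table.cells  # placeholder; the real function returns a DataFrame
end

# A vector typed as Vector{HTMLNode} whose elements are HTMLElement{:table},
# mirroring what html_elements(response, "table") returns.
tables = HTMLNode[
    HTMLElement{:table}([["a", "b"], ["1", "2"]]),
    HTMLElement{:table}([["x"], ["9"]]),
]

# Broadcasting dispatches on each element's concrete type,
# so every call hits the HTMLElement{:table} branch of the Union.
results = html_table.(tables)

# The original vector form still works.
single = html_table(tables)
```

Because the Union keeps Vector{HTMLNode} in the signature, existing calls that pass the whole vector are unaffected; the change only adds a second accepted input type.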

kdpsingh commented 9 months ago

Thank you for this. I'll give @jdiaz97 a chance to review this PR. I realize it's the holidays, so we can wait until after the new year. If @jdiaz97 you are too busy to review, let me know and I can take a look after the new year.

Have a wonderful holiday, everyone.

kdpsingh commented 8 months ago

Excellent. Should I bump the version and cut a new release @jdiaz97?

Or would you like to do it?

Or would you like to wait?

jdiaz97 commented 7 months ago

I'll push a new version this weekend @kdpsingh

kdpsingh commented 7 months ago

@jdiaz97, we should set up the TagBot GitHub action which makes it super easy to generate releases simply by commenting on the commit. Let us know, we would be happy to help set this up.

jdiaz97 commented 7 months ago

@kdpsingh I'd really appreciate that!