baumer-lab / wikitablr

Simple Reader for Wikipedia Tables

Problem with 5th table on Marvel page #9

Open beanumber opened 4 years ago

beanumber commented 4 years ago
library(tidyverse)
library(wikitablr)
read_wikinodes("https://en.wikipedia.org/wiki/List_of_Marvel_Cinematic_Universe_films") %>%
  pluck(5) %>%
  rvest::html_table()
#> Warning in lapply(ncols, as.integer): NAs introduced by coercion
#> Error in if (length(p) > 1 & maxp * n != sum(unlist(nrows)) & maxp * n != : missing value where TRUE/FALSE needed

Created on 2020-05-13 by the reprex package (v0.3.0)

Not sure what the problem is, but it's a bug in rvest::html_table().

rporta23 commented 4 years ago

In my experience, the Marvel page in general hasn't worked well with this package.

beanumber commented 4 years ago

The problem is a documented limitation of rvest::html_table(). From its help page:

html_table currently makes a few assumptions:

No cells span multiple rows

But this table has cells that span multiple rows.
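One way to spot such tables before parsing is to look for cells carrying a rowspan attribute. The following is only a rough sketch: has_rowspan() is a hypothetical helper, not part of rvest or wikitablr, and it assumes the node is something rvest::html_nodes() can search (for example, one element of what read_wikinodes() returns).

```r
# Hypothetical helper (not part of rvest or wikitablr):
# returns TRUE if any cell in `node` declares a rowspan greater than 1.
has_rowspan <- function(node) {
  # Select header and data cells that carry a rowspan attribute
  cells <- rvest::html_nodes(node, "td[rowspan], th[rowspan]")
  # Attribute values are strings; non-numeric values become NA
  spans <- suppressWarnings(as.integer(rvest::html_attr(cells, "rowspan")))
  # Assumption: treat an unparseable rowspan as suspect as well
  any(is.na(spans) | spans > 1)
}

# Small self-contained check on an inline table
doc <- xml2::read_html(
  "<table><tr><td rowspan='2'>a</td><td>b</td></tr><tr><td>c</td></tr></table>"
)
has_rowspan(doc)
```

Flagging unparseable values is a design choice made here for caution; a stricter version could ignore them.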

Can you think of a way for this to fail more gracefully? Maybe add a tryCatch() block?
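A minimal sketch of the tryCatch() idea, assuming it is acceptable to return NULL for a table that cannot be parsed. safe_html_table() is a hypothetical name, not an existing wikitablr or rvest function; the `parse` argument defaults to rvest::html_table() but can be swapped out.

```r
# Hypothetical wrapper (not part of wikitablr or rvest):
# try to parse one table node; on error, warn and return NULL
# instead of aborting the whole pipeline.
safe_html_table <- function(node, parse = rvest::html_table, ...) {
  tryCatch(
    parse(node, ...),
    error = function(e) {
      warning(
        "Could not parse table, returning NULL: ", conditionMessage(e),
        call. = FALSE
      )
      NULL
    }
  )
}

# Intended use (needs network access, so left commented out):
# nodes  <- read_wikinodes("https://en.wikipedia.org/wiki/List_of_Marvel_Cinematic_Universe_films")
# tables <- lapply(nodes, safe_html_table)
```

Only errors are caught here. Warnings from the parser, such as the coercion warning in the reprex above, still pass through to the caller.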