crubba / htmltab

An R package for assembling data frames from HTML tables (fka htmltable)

Inconsistent behaviour when passing header and body arguments #8

Open JSB97 opened 9 years ago

JSB97 commented 9 years ago

https://www.dropbox.com/home/Public?preview=6501.Yuho.pdf_table_30.html

Same case, different issue. Behaviour is inconsistent when both the header and body arguments are passed: the call fails to return the first column and all of the headers. Passing only body works, but it would be ideal to pass only the header argument, since one will not always know in advance which row the data starts on.

When passing both the header and body XPaths:

    u4 <- "/SomePath/6501.Yuho.pdf_table_30.html"
    t4 <- htmltab(doc = u4, header = "/html/body/table/thead", body = "//tr[position() > 2]")
    Warning messages:
    1: Argument 'which' left unspecified. Choosing first table.
    2: No header generated. Try passing information to header or colNames. Header XPath was /html/body/table/thead

    head(t4)
           V2      V3              V4     V5     V6      V7     V8
    1  86,975  70,265  32,050 (1,686) 16,751  3,951 209,992 76,534
    2  24,701  14,795  84,814 (8,468)  7,419 14,105 145,834 10,164
    3  97,340  58,002  30,259 (3,739)  9,512  8,274 203,387 48,779
    4  62,131  35,556  41,975 (2,093)    597  2,673 142,932 24,617
    5  95,331 122,966  59,537 (9,497)  3,137 14,586 295,557 19,570
    6 132,260 173,340 93,394 (13,288)  2,437 17,860 419,291 43,059
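One way to narrow down a warning like the one above is to count how many nodes each XPath selects before handing it to htmltab. The following is only a minimal sketch, assuming the XML package is installed. The inline table is a hypothetical stand-in, not the actual 6501.Yuho file, and it deliberately has no `<thead>` element. It shows what the expressions select in a parsed document; it does not show how htmltab evaluates them internally.

```r
library(XML)

# Hypothetical stand-in for the real file: two header rows, two data rows,
# and no <thead> element wrapping the header rows.
html <- '<html><body><table>
  <tr><th>Segment</th><th colspan="2">Book value</th></tr>
  <tr><th></th><th>Buildings</th><th>Machinery</th></tr>
  <tr><td>A</td><td>86,975</td><td>70,265</td></tr>
  <tr><td>B</td><td>24,701</td><td>14,795</td></tr>
</table></body></html>'

doc <- htmlParse(html, asText = TRUE)

# Count the nodes selected by each expression.
n_thead  <- length(getNodeSet(doc, "/html/body/table/thead"))  # header XPath from the call above
n_header <- length(getNodeSet(doc, "//tr[position() <= 2]"))   # assumed alternative: first two rows
n_body   <- length(getNodeSet(doc, "//tr[position() > 2]"))    # body XPath from the call above

print(c(thead = n_thead, header_rows = n_header, body_rows = n_body))
```

If the header expression selects no nodes in the real file, that would be consistent with the "No header generated" warning, although it would not by itself explain the missing first column.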

When passing just the body argument, the correct output is returned:

    t4 <- htmltab(doc = u4, body = "//tr[position() > 2]")
    Warning message:
    Argument 'which' left unspecified. Choosing first table.
    head(t4)
      セグメントの名称  帳 簿 価 額 (百万円) >> 建物及び 構築物  帳 簿 価 額 (百万円) >> 機械装置及び 工具器具備品
    1 情報・通信システム  86,975  70,265
    2 電力システム  24,701  14,795
    3 社会・産業システム  97,340  58,002
    4 電子装置・システム  62,131  35,556
    5 建設機械  95,331  122,966
    6 高機能材料  132,260  173,340
      帳 簿 価 額 (百万円) >> 土 地 (面積千㎡)  帳 簿 価 額 (百万円) >> リース 資産  帳 簿 価 額 (百万円) >> その他
    1 32,050 (1,686)  16,751  3,951
    2 84,814 (8,468)  7,419  14,105
    3 30,259 (3,739)  9,512  8,274
    4 41,975 (2,093)  597  2,673
    5 59,537 (9,497)  3,137  14,586
    6 93,394 (13,288)  2,437  17,860
      帳 簿 価 額 (百万円) >> 合 計  従業員数 (人)
    1 209,992  76,534
    2 145,834  10,164
    3 203,387  48,779
    4 142,932  24,617
    5 295,557  19,570
    6 419,291  43,059