Open clkao opened 11 years ago
we'll use npl key: http://npl.ly.gov.tw/do/www/commissioner?act=exp&blockId=2
@thewayiam is processing
now we have pk as: https://github.com/thewayiam/twly_crawler/blob/master/npl_ly(pretty_format).json
will combine data with http://www.ly.gov.tw/03_leg/0301_main/legList.action later
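The conflicts between the data in the two systems, the Legislative Yuan website (ly.gov.tw) and the Parliamentary Library (npl), are listed here: https://docs.google.com/spreadsheet/ccc?key=0Am6gVfTCSAPLdC0wLXRSYmhqT1JFMkMydGdfd3Vxd3c#gid=0 For the items whose notes say "handle by email", unless anyone objects I will email each of the two systems separately. The items marked "handle internally" are smaller problems; should we just decide ourselves which system to treat as authoritative?
2013/11/28: reported to npl@ly.gov.tw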
Final merged JSON: one dict per id, with every ad included. Fields that npl doesn't have (e.g. education, contacts, term_end...) are taken from ly.gov.tw; fields that npl has (e.g. party, committees, experience...) are taken from npl. One serious source-data bug, as mentioned above: id=949 (蔣孝嚴 章孝嚴) is duplicated; waiting for npl's reply. data: https://github.com/thewayiam/twly_crawler/blob/master/merged(pretty_format).json
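The precedence rule described above (npl wins where it has a field, ly.gov.tw fills the gaps) could be sketched roughly like this. This is only an illustration, not the actual twly_crawler code: `merge_record` is a hypothetical helper, the sample values are placeholders, and treating empty npl values as "missing" is an assumption.

```python
def merge_record(npl, ly):
    """Merge two per-legislator dicts: npl values win, ly.gov.tw fills the gaps."""
    merged = dict(ly)  # start from everything ly.gov.tw provides
    for key, value in npl.items():
        # Assumption: None or empty values in npl count as "npl doesn't have it".
        if value not in (None, "", [], {}):
            merged[key] = value
    return merged


# Placeholder records, not real data.
npl = {"id": 949, "party": "A", "committees": ["X"], "education": None}
ly = {"party": "B", "education": "some school", "contacts": {"phone": "000"}}

merged = merge_record(npl, ly)
print(merged)
```

Here `party` and `committees` come from npl, while `education` and `contacts` fall back to ly.gov.tw.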
oh fun, that's an actual and valid rename that happened. popolo spec has a field for former names
interesting!! I've already added a former_names key for the rename case; its value is a list. go ahead and laugh at my poor common sense.
there's an example here with other_names
that includes date range: http://popoloproject.com/specs/person.html
I haven't found a proper data source for date ranges (birthday...), so for now I keep the name inside every ad entry, e.g.: id=949, name="蔣孝嚴", former_names=["章孝嚴"], each_term: [ {ad:5, name="章孝嚴", ...}, {ad:6, name="蔣孝嚴", ...}, ... ]
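Since the name is kept on every ad entry, former_names could in principle be derived from each_term rather than maintained by hand. A minimal sketch, assuming the record shape from the example above; `former_names` as a function is hypothetical and not part of twly_crawler:

```python
def former_names(current_name, each_term):
    """Collect names used in earlier terms that differ from the current name."""
    seen = []
    for term in sorted(each_term, key=lambda t: t["ad"]):
        name = term["name"]
        if name != current_name and name not in seen:
            seen.append(name)
    return seen


person = {
    "id": 949,
    "name": "蔣孝嚴",
    "each_term": [
        {"ad": 5, "name": "章孝嚴"},
        {"ad": 6, "name": "蔣孝嚴"},
    ],
}
person["former_names"] = former_names(person["name"], person["each_term"])
print(person["former_names"])
```

This only records which names were used, not when; a Popolo-style other_names entry with a date range would still need a real date source.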
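btw, npl replied on 2013-12-02: "Regarding the discrepancies you pointed out between the 'Legislators of Past Terms' section of our website and the legislator data on the Legislative Yuan website, we have checked and corrected our data. The Legislative Yuan website portion is being forwarded to the relevant unit for handling. Sincerely, the Parliamentary Library website"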
@thewayiam++
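12:08 < clkao> pofeng: I'm thinking of using "the term in which someone first became a legislator + seat number" as the primary key
12:09 < pofeng> clkao: first time(?) ... could we just use the term number for now
12:09 < clkao> then someone who serves two terms would get two ids?
12:09 < clkao> that's why I said the first one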
basically the stage (term) + lgno in https://github.com/ronnywang/TWLegislativeYuanData/blob/09c259cfe44b7e4ac7629ed834b2a11bb82d8aa7/8.json
however we should use the first term an individual is appointed plus the lgno to avoid duplication
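A rough sketch of that idea: take each person's earliest term and the lgno from that term as the key. The field names `name`, `term` and `lgno` and the sample rows are assumptions for illustration (the actual layout of 8.json may differ), and matching a person across terms by name is itself an assumption that breaks for renames like id=949.

```python
def assign_ids(rows):
    """Map each person to '<first term>-<lgno in that term>'."""
    first = {}
    # Visit rows in term order so the earliest term is recorded first.
    for row in sorted(rows, key=lambda r: r["term"]):
        first.setdefault(row["name"], (row["term"], row["lgno"]))
    return {name: f"{term}-{lgno}" for name, (term, lgno) in first.items()}


# Placeholder rows, not real data.
rows = [
    {"name": "A", "term": 8, "lgno": "0003"},
    {"name": "A", "term": 7, "lgno": "0012"},
    {"name": "B", "term": 8, "lgno": "0012"},
]
ids = assign_ids(rows)
print(ids)
```

Person A served in two terms but gets a single id from the first one, while B reuses the same lgno in a later term without colliding.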