tmtmtmtm closed this issue 9 years ago.
I'm looking at this.
Scraper at https://morph.io/struan/ecuador_national_assembly_members
It's missing gender for a few people and an image for one.
The party list doesn't quite match the Wikipedia data, but given that the Wikipedia data doesn't match up with itself, I'm not going to worry too much: the pages all have "this page is outdated" at the top.
Thanks @struan
I'm not sure if this is just leftover data from multiple runs that you didn't clean out, but there are three different entries for Blanca Azucena Arguello Troya: one with an ID of 145, one with an empty ID, and one with an ID of sites/all/modules/an_asambleistas/img/varias/mystery
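One quick way to spot rows like these before saving is to group the scraped records by name and list the IDs of any name that turns up more than once. A minimal sketch, assuming each record is a Hash with `:id` and `:name` keys (a hypothetical shape, not necessarily what the scraper uses); the three IDs below are the ones quoted above, and the second member is made-up sample data:

```ruby
# Hypothetical record shape: { id: String, name: String }.
# Returns { name => [id, id, ...] } for every name that appears more than once.
def duplicate_names(records)
  records.group_by { |r| r[:name] }
         .select { |_name, rows| rows.size > 1 }
         .transform_values { |rows| rows.map { |r| r[:id] } }
end

records = [
  { id: '145', name: 'Blanca Azucena Arguello Troya' },
  { id: '', name: 'Blanca Azucena Arguello Troya' },
  { id: 'sites/all/modules/an_asambleistas/img/varias/mystery', name: 'Blanca Azucena Arguello Troya' },
  { id: '999', name: 'Example Member' } # illustrative placeholder row
]

p duplicate_names(records)
```

This only reports the duplicates; deciding which row to keep is left to the scraper.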
You should also add a source line for each person's page: it looks like the 'blog' link in the 'contacts' cell would be best for that, as it seems to be their official page.
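If the 'blog' href in the contacts cell turns out to be relative, it would need resolving against the page URL before it's stored as `source`. A minimal sketch, assuming the cell has already been parsed into `[label, href]` pairs (hypothetical; the real markup may differ) and using Ruby's standard `URI.join`; the `/es/blogs/example-member` path is an invented placeholder:

```ruby
require 'uri'

# Hypothetical input: contact_links is an array of [label, href] pairs taken
# from the 'contacts' cell. Returns the absolute URL of the 'blog' link,
# or nil if the cell has no such link.
def source_url(contact_links, base_url)
  _label, href = contact_links.find { |label, _href| label.strip.downcase == 'blog' }
  href && URI.join(base_url, href).to_s
end

base  = 'http://www.asambleanacional.gob.ec/es/pleno-asambleistas'
links = [['Twitter', 'https://twitter.com/example'], ['Blog', '/es/blogs/example-member']]
puts source_url(links, base)
```

Matching the label case-insensitively is a guess at how robust the check needs to be.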
It would be good if you could trim the leading space from the Area names too. (Most of my scrapers add a .tidy method onto String that collapses all whitespace (including non-breaking spaces) and then removes all leading and trailing spaces; I use it pretty much everywhere.)
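A minimal sketch of a helper like the one described, not necessarily the exact code in those scrapers. It relies on the POSIX `[[:space:]]` class, which in Ruby also matches Unicode whitespace such as U+00A0 (no-break space), whereas `\s` only matches ASCII whitespace. The area name is just an illustrative value:

```ruby
class String
  # Collapse every run of whitespace (including U+00A0 no-break spaces)
  # into a single ASCII space, then strip leading and trailing spaces.
  def tidy
    gsub(/[[:space:]]+/, ' ').strip
  end
end

puts " \u00A0Pichincha\n  ".tidy
```

Because the no-break spaces are turned into ordinary spaces first, a plain `strip` is enough to clean the ends.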
So it seems there's something screwy with the data: they have two entries for Blanca Azucena Arguello Troya, although one has more or less no data, and what data it does have is identical to the other record. If I skip over that one, it turns out the site only has data for 136 members, but there are supposed to be 137. I can't work out whether this is because they are short a member at the moment or because the bad data above is messing things up :(
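To make the skip explicit and have the gap show up on every run, the scraper could keep only rows with a plausible ID and warn when the total differs from 137. A minimal sketch, assuming real member IDs are purely numeric (inferred only from the `145` above) and the same hypothetical `{ id:, name: }` record shape. It can flag the shortfall but can't say whether it's a vacancy or a scraping problem:

```ruby
EXPECTED_MEMBERS = 137 # the total the thread says the Assembly should have

# Assumption: genuine IDs are all digits, which drops both the blank-ID row
# and the image-path "ID" on the duplicate entry.
def real_members(records)
  records.select { |r| r[:id].to_s.match?(/\A\d+\z/) }
end

# Returns a warning string when the count is off, or nil when it matches.
def check_count(members, expected = EXPECTED_MEMBERS)
  return nil if members.size == expected
  "expected #{expected} members, found #{members.size}"
end

records = [
  { id: '145', name: 'Blanca Azucena Arguello Troya' },
  { id: '', name: 'Blanca Azucena Arguello Troya' }
]
members = real_members(records)
message = check_count(members)
warn message if message
```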
http://www.asambleanacional.gob.ec/es/pleno-asambleistas