capyvara / brazil-civil-registry-data

Raw scrapings of ARPEN https://transparencia.registrocivil.org.br/

Detailed scrapes taking too long, especially cities #5

Open capyvara opened 4 years ago

capyvara commented 4 years ago

Currently the detailed scrapes are taking too long: cities_detailed has ~151,840 queries (52 cities × 365 days × 4 places × 2 sexes) and takes multiple days to finish. It would need to stay under ~100k queries (which would already take several hours).
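As a quick check of the arithmetic above, the query count is just the product of the dimension sizes. This is a minimal sketch; the dictionary keys and the `budget` name are illustrative, and only the numbers come from the comment.

```python
from math import prod

# Dimension sizes quoted in the comment above.
dimensions = {"cities": 52, "days": 365, "places": 4, "sexes": 2}

# One query per combination of city, day, place and sex.
total_queries = prod(dimensions.values())
print(total_queries)  # 151840

# Rough target mentioned in the comment (~100k queries).
budget = 100_000
print(total_queries - budget)  # 51840 queries over the target
```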

Retries are necessary because the server goes into an update/maintenance window while the scrape is running.
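A retry loop with exponential backoff is one common way to ride out such a maintenance window. The sketch below is not the repository's code: `with_retries` and `flaky_fetch` are hypothetical names, and the delays are arbitrary. The `sleep` parameter is injectable so the demo does not actually wait.

```python
import time


def with_retries(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical stand-in for a query that fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("server under maintenance")
    return {"ok": True}


delays = []  # record the requested delays instead of sleeping
result = with_retries(flaky_fetch, sleep=delays.append)
print(result, delays)
```

For a multi-day scrape, `base_delay` and `max_attempts` would likely need to be much larger than in this demo, since a maintenance window may last well beyond a few seconds.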

Options:

It would also be possible to split the scrape, for example keeping one monthly breakdown and one per epidemiological week.
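To illustrate the split, the sketch below generates monthly ranges and Sunday-to-Saturday weekly ranges for one year. It makes several assumptions not confirmed by the thread: that the source accepts a start/end date range per query, that epidemiological weeks run Sunday to Saturday, and that clipping weeks at the year boundary is acceptable (official week numbering lets weeks cross years). The function names and the year 2020 are illustrative.

```python
import calendar
from datetime import date, timedelta


def monthly_ranges(year):
    """One (first_day, last_day) pair per calendar month."""
    return [
        (date(year, m, 1), date(year, m, calendar.monthrange(year, m)[1]))
        for m in range(1, 13)
    ]


def weekly_ranges(year):
    """Sunday-to-Saturday chunks, clipped to the calendar year."""
    ranges = []
    start = date(year, 1, 1)
    end_of_year = date(year, 12, 31)
    while start <= end_of_year:
        # date.weekday(): Monday=0 ... Saturday=5, Sunday=6
        days_to_saturday = (5 - start.weekday()) % 7
        end = min(start + timedelta(days=days_to_saturday), end_of_year)
        ranges.append((start, end))
        start = end + timedelta(days=1)
    return ranges


months = monthly_ranges(2020)
weeks = weekly_ranges(2020)

# Queries per period, using the dimension sizes from the comment above
# (52 cities, 4 places, 2 sexes), if each query covers a whole period.
per_period = 52 * 4 * 2
print(len(months), len(months) * per_period)
print(len(weeks), len(weeks) * per_period)
```

Under these assumptions, both breakdowns together would need far fewer queries than the daily scrape, at the cost of losing day-level granularity.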

Row(state='PA', state_ibge_code=15, city_ibge_code=1500800, city='Ananindeua', estimated_population=530598, is_capital=None)
Row(state='GO', state_ibge_code=52, city_ibge_code=5201405, city='Aparecida de Goiânia', estimated_population=578179, is_capital=None)
Row(state='SE', state_ibge_code=28, city_ibge_code=2800308, city='Aracaju', estimated_population=657013, is_capital=1)
Row(state='PA', state_ibge_code=15, city_ibge_code=1501402, city='Belém', estimated_population=1492745, is_capital=1)
Row(state='RJ', state_ibge_code=33, city_ibge_code=3300456, city='Belford Roxo', estimated_population=510906, is_capital=None)
Row(state='MG', state_ibge_code=31, city_ibge_code=3106200, city='Belo Horizonte', estimated_population=2512070, is_capital=1)
Row(state='RR', state_ibge_code=14, city_ibge_code=1400100, city='Boa Vista', estimated_population=399213, is_capital=1)
Row(state='DF', state_ibge_code=53, city_ibge_code=5300108, city='Brasília', estimated_population=3015268, is_capital=1)
Row(state='SP', state_ibge_code=35, city_ibge_code=3509502, city='Campinas', estimated_population=1204073, is_capital=None)
Row(state='MS', state_ibge_code=50, city_ibge_code=5002704, city='Campo Grande', estimated_population=895982, is_capital=1)
Row(state='RJ', state_ibge_code=33, city_ibge_code=3301009, city='Campos dos Goytacazes', estimated_population=507548, is_capital=None)
Row(state='RS', state_ibge_code=43, city_ibge_code=4305108, city='Caxias do Sul', estimated_population=510906, is_capital=None)
Row(state='MG', state_ibge_code=31, city_ibge_code=3118601, city='Contagem', estimated_population=663855, is_capital=None)
Row(state='MT', state_ibge_code=51, city_ibge_code=5103403, city='Cuiabá', estimated_population=612547, is_capital=1)
Row(state='PR', state_ibge_code=41, city_ibge_code=4106902, city='Curitiba', estimated_population=1933105, is_capital=1)
Row(state='RJ', state_ibge_code=33, city_ibge_code=3301702, city='Duque de Caxias', estimated_population=919596, is_capital=None)
Row(state='BA', state_ibge_code=29, city_ibge_code=2910800, city='Feira de Santana', estimated_population=614872, is_capital=None)
Row(state='SC', state_ibge_code=42, city_ibge_code=4205407, city='Florianópolis', estimated_population=500973, is_capital=1)
Row(state='CE', state_ibge_code=23, city_ibge_code=2304400, city='Fortaleza', estimated_population=2669342, is_capital=1)
Row(state='GO', state_ibge_code=52, city_ibge_code=5208707, city='Goiânia', estimated_population=1516113, is_capital=1)
Row(state='SP', state_ibge_code=35, city_ibge_code=3518800, city='Guarulhos', estimated_population=1379182, is_capital=None)
Row(state='PE', state_ibge_code=26, city_ibge_code=2607901, city='Jaboatão dos Guararapes', estimated_population=702298, is_capital=None)
Row(state='PB', state_ibge_code=25, city_ibge_code=2507507, city='João Pessoa', estimated_population=809015, is_capital=1)
Row(state='SC', state_ibge_code=42, city_ibge_code=4209102, city='Joinville', estimated_population=590466, is_capital=None)
Row(state='MG', state_ibge_code=31, city_ibge_code=3136702, city='Juiz de Fora', estimated_population=568873, is_capital=None)
Row(state='PR', state_ibge_code=41, city_ibge_code=4113700, city='Londrina', estimated_population=569733, is_capital=None)
Row(state='AP', state_ibge_code=16, city_ibge_code=1600303, city='Macapá', estimated_population=503327, is_capital=1)
Row(state='AL', state_ibge_code=27, city_ibge_code=2704302, city='Maceió', estimated_population=1018948, is_capital=1)
Row(state='AM', state_ibge_code=13, city_ibge_code=1302603, city='Manaus', estimated_population=2182763, is_capital=1)
Row(state='RN', state_ibge_code=24, city_ibge_code=2408102, city='Natal', estimated_population=884122, is_capital=1)
Row(state='RJ', state_ibge_code=33, city_ibge_code=3303302, city='Niterói', estimated_population=513584, is_capital=None)
Row(state='RJ', state_ibge_code=33, city_ibge_code=3303500, city='Nova Iguaçu', estimated_population=821128, is_capital=None)
Row(state='SP', state_ibge_code=35, city_ibge_code=3534401, city='Osasco', estimated_population=698418, is_capital=None)
Row(state='TO', state_ibge_code=17, city_ibge_code=1721000, city='Palmas', estimated_population=299127, is_capital=1)
Row(state='RS', state_ibge_code=43, city_ibge_code=4314902, city='Porto Alegre', estimated_population=1483771, is_capital=1)
Row(state='RO', state_ibge_code=11, city_ibge_code=1100205, city='Porto Velho', estimated_population=529544, is_capital=1)
Row(state='PE', state_ibge_code=26, city_ibge_code=2611606, city='Recife', estimated_population=1645727, is_capital=1)
Row(state='SP', state_ibge_code=35, city_ibge_code=3543402, city='Ribeirão Preto', estimated_population=703293, is_capital=None)
Row(state='AC', state_ibge_code=12, city_ibge_code=1200401, city='Rio Branco', estimated_population=407319, is_capital=1)
Row(state='RJ', state_ibge_code=33, city_ibge_code=3304557, city='Rio de Janeiro', estimated_population=6718903, is_capital=1)
Row(state='BA', state_ibge_code=29, city_ibge_code=2927408, city='Salvador', estimated_population=2872347, is_capital=1)
Row(state='SP', state_ibge_code=35, city_ibge_code=3547809, city='Santo André', estimated_population=718773, is_capital=None)
Row(state='SP', state_ibge_code=35, city_ibge_code=3548708, city='São Bernardo do Campo', estimated_population=838936, is_capital=None)
Row(state='RJ', state_ibge_code=33, city_ibge_code=3304904, city='São Gonçalo', estimated_population=1084839, is_capital=None)
Row(state='SP', state_ibge_code=35, city_ibge_code=3549904, city='São José dos Campos', estimated_population=721944, is_capital=None)
Row(state='MA', state_ibge_code=21, city_ibge_code=2111300, city='São Luís', estimated_population=1101884, is_capital=1)
Row(state='SP', state_ibge_code=35, city_ibge_code=3550308, city='São Paulo', estimated_population=12252023, is_capital=1)
Row(state='ES', state_ibge_code=32, city_ibge_code=3205002, city='Serra', estimated_population=517510, is_capital=None)
Row(state='SP', state_ibge_code=35, city_ibge_code=3552205, city='Sorocaba', estimated_population=679378, is_capital=None)
Row(state='PI', state_ibge_code=22, city_ibge_code=2211001, city='Teresina', estimated_population=864845, is_capital=1)
Row(state='MG', state_ibge_code=31, city_ibge_code=3170206, city='Uberlândia', estimated_population=691305, is_capital=None)
Row(state='ES', state_ibge_code=32, city_ibge_code=3205309, city='Vitória', estimated_population=362097, is_capital=1)
oranzani commented 4 years ago

Marcelo, thank you very much for all your work. I would drop the place dimension for the "detailed" cities scrape. I think age and sex matter more for studying the phenomenon, and we would already have the overall places in the regular cities scrape. What do you think? We would get close to 100k. I'll see if I can think about it some more.