Open mfagundes opened 1 year ago
Ok… I am trying to understand what error we have here. It looks like we have three errors:
request to … ended due to timeout: context deadline exceeded
This is basically a server error: the server took longer than the default timeout. The --help
output shows what the default timeout is and how to set a different one.
could not create a progress file
Given that you haven't used the --chunk-size flag, this is unexpected. So, just confirm you haven't and we can open a new issue specifying which error we're talking about.
Can you attach or share a link to the downloaded file that gives you an unreadable CSV?
--chunk-size
flag. 7-zip
, as the file, despite the error, is downloaded. It seems to unzip it partially (I don't know exactly how compression works). See attached image:
And here is a snapshot from Windows Explorer, showing the (I guess) partially downloaded and extracted files:
Ok, I just opened #45 to track the error loading the existing progress file, and I'm going to rename this issue to the actual error: cannot unzip large file downloaded on Windows.
On that matter, let me ask you again:
Can you attach or share a link to the downloaded file that gives you an unreadable CSV?
Sorry, I misunderstood your question. Here is the link:
Estabelecimentos0.zip - 857.513 kB
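A downloaded archive like this one can be checked for truncation before trying to extract it. The following is only a minimal sketch using Python's standard zipfile module; the function name check_zip is hypothetical and is not part of the tool discussed in this issue.

```python
import zipfile
import zlib


def check_zip(path):
    """Return (ok, message) describing whether the archive at `path` is intact."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and verifies its CRC-32; it returns
            # the name of the first bad member, or None if all of them pass.
            bad_member = archive.testzip()
    except (zipfile.BadZipFile, zlib.error, EOFError) as exc:
        # A download that was cut short usually loses the index stored at the
        # end of the archive, or leaves a member with incomplete data.
        return False, f"archive is damaged or incomplete: {exc}"
    if bad_member is not None:
        return False, f"corrupt member: {bad_member}"
    return True, "archive looks complete"
```

Calling check_zip("Estabelecimentos0.zip") returns a pair: a boolean and a short explanation. A False result only says the archive is damaged; it does not say why the download stopped.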
Just confirming (for the record) that the content ends abruptly:
$ unzip Estabelecimentos0.zip
Archive: Estabelecimentos0.zip
inflating: K3241.K03200Y0.D21119.ESTABELE
error: invalid compressed data to inflate
$ tail K3241.K03200Y0.D21119.ESTABELE
"24216864";"0001";"03";"1";"VANS & VUCS";"08";"20180921";"01";"";"";"20160222";"4530703";"";"AVENIDA";"CAPITAO FRANCISCO CEZAR";"1.169";"";"VILA PINDORAMA";"06415000";"SP";"6213";"11";"47384054";"";"";"";"";"ATAIDECARDOSO@HOTMAIL.COM";"";""
"23281689";"0001";"75";"1";"CJJF";"02";"20150916";"00";"";"";"20150916";"8230001";"7721700,8592999,4781400,8599699";"RUA";"ALVARENGA PEIXOTO";"456";"APT 401";"LOURDES";"30180120";"MG";"4123";"31";"32914931";"";"";"";"";"mcneuenschwander@gmail.com";"";""
"24850096";"0001";"45";"1";"";"08";"20170220";"01";"";"";"20160521";"5611203";"1096100,1093702,4723700,1094500";"RUA";"RUA DRA MARIA APARECIDA CHAIB";"189";"";"CENTRO";"37472000";"MG";"4281";"35";"92191650";"";"";"";"";"rosilea.s@hotmail.com";"";""
"24682654";"0001";"00";"1";"";"02";"20160428";"00";"";"";"20160428";"0723501";"4312600,7119701,0810006,0724301,0500301,0893200,0500302,7210000,0899102";"AVENIDA";"CARLOS GOMES";"513";"SALA 05";"CAIARI";"76801166";"RO";"0003";"69";"32211736";"";"";"";"";"";"";""
"14070560";"0001";"27";"1";"CENTRO AUTOMOTIVO MARIANI";"08";"20170303";"01";"";"";"20110728";"4530703";"4520004,4520001,4520003";"RUA";"GUARANI";"1443";"ANEXO FUNDOS";"CENTRO";"85501050";"PR";"7751";"46";"32244694";"";"";"";"";"";"";""
"09206854";"0001";"01";"1";"PORTO MOVEIS";"02";"20071106";"00";"";"";"20071106";"3101200";"4754701,9529105";"AVENIDA";"AFONSO PORTO EMERIM";"1221";"";"PITANGUEIRAS";"95500000";"RS";"8855";"51";"31414939";"51";"36625655";"";"";"";"";""
"14982593";"0001";"43";"1";"SHOW DA TERRA MS";"02";"20120202";"00";"";"";"20120202";"7311400";"5911199,5920100,6319400,8230001,8592903,8592902";"RUA";"ARARA AZUL";"140";"";"CENTRO";"79400000";"MS";"9065";"67";"99947991";"";"";"";"";"";"";""
"12468754";"0001";"50";"1";"RICCA PARKING";"02";"20100727";"00";"";"";"20100727";"6810201";"6463800,6810202,6821801,6821802";"AVENIDA";"ADOLFO PINHEIRO 1000 ENTRADA 1010";"1001";"CONJ 74";"SANTO AMARO";"04734904";"SP";"7107";"11";"34427390";"11";"55129902";"11";"55129902";"RICCAPARKING@GMAIL.COM";"";""
"25214261";"0001";"35";"1";"TODA BONITA CABELO&MAQUIAGEM DEFINITIVA";"08";"20170220";"01";"";"";"20160715";"9602502";"9602501";"RUA";"MOREIRA CEZAR";"104";"";"CENTRO";"14730000";"SP";"6731";"17";"33612680";"";"";"";"";"exattus_contabil@hotmail.com";"";""
"22825662";"0001";"33";"1";"NELSO⏎ $
I opened the CSV file I downloaded. Up to line 4.149.823 the file seems to be correct. The rest of the content, up to the last line (4.150.003), is completely garbled. Below is a small extract, with the last (apparently) correct line and a few of the lines that follow it. I guess it has something to do with the compression/decompression method and the incomplete download of the file.
"24762587";"0001";"34";"1";"RESTAURANTE BOI NA BRASA";"02";"20160510";"00";"";"";"20160510";"5611201";"4729602,5611203";"AVENIDA";"TABAPOA";"3101";"";"SETOR 03";"76870441";"RO"; QD0715"AM A000_TElesce MAe"4712100,4781400,4763;"022017";"0001";3928805";506ORTO EJOSa7";9099725009398132";""699";";"CID160510";"00"99";";"C55";"51;"C55";"5JEAN";"5HEIROS";"";"31414939";"51";;"RS";"881";"COT";"ROLADs2015@15@"CEN99";"RUA";81";"01";";"6213";20100;"";";"0;""1439";"";";"86937821";"";";"86937821506RTO EOSa7;"ASTOLFO DUTRA";"22NCH16887";"0001";"04163010F@O P7";";"";"";""DRA: 23; LOTE: 2; C";"4ttus_co31400,77;""
"223;"PRESIONIO HEIL";""
"12NSO PORS";"08";"881";"COP IE";66256RNDRA: 7636RIAO";"900";"QUADRA 120;LOTE 04"9500000";"ES";"5603S";"5JEAN";"5HEIROS"881";;"162";"AFONSO PORT05";"C;"3RG HEIROS";""400";"5";""MT";"";"103B_co31603,7420004,7319015"";"C LOTE: 2; VASCONASC""60019TT0";,33FA";"ALVARENGA PEIXOT5";"69602,5611203";3";3112"39";"3900339";"1";""2437";"87"N599";""";" VASM DENGA P";"ALVA5071"08;"jsc"1362EZAR";"104UL";"HOOP SOLUTION";502,47X7 LT 18";"56";"";"CDRA m LT 18";"56";"";"CDRA m LT 1LT 1LT 19otmail.LOTE: 2; "56112100,4781400"47";"08";"";"20141";;"00";"";"";"2016041";;"00;""
"20"00;"01";""00;"";"";;"AD2";"20LAD";""";"1sDE CANOAS510";""94;"8"2015AR"E HO4";"1";"RESTAURANTE016041""2015122500";"SAN";"""ASTOLFO"";"";"25HEIROS"881";;"142";"83";";"A0001";""DOS 2";m";10";"SE";"3105";"3105";"41";;"90"41"TA05";"41";;"90"41"T8";eirajr@hotm;"A00;"ARLI,742G"88OYGODOY@BOL.CZEN";"2";"VAL119NEXLAD"";EN";"000U028"";"D202";"0002"TON825662";"0001";"33";"1""RUA";4123";"31";"34832850001";"57";"3900331";;";"";"";"";"";"aASTOLFO 701";"39471381";"16";"452000T"1"7739 70220I
0";"QUADRA 121";"M0";""";" VASM DO;"3900339";"112NSP IE";8";"20170216";001902,82997JIREH";" 05";"CA";""82999,9001901,90019;"04";"20210407"";"2448697368126";0,47521";"2;"3;"40";"P;"P0419";"452AIDECARILINA";"04163IVA";"08";"2ON825662";";"025632244694";"";"";"";"";"";"";""
"09;"";"n;"";"";@M A000_6360""
"r020OA";ADORIA@"040,4.;"RUl@hotm:0"349797368125";n474407"";"2448697368126";068126";068126";_63;"1sD"1";"GICAIARI3S";"5JEAN";"5HEAN"000RI3N"00;"63,90OS A3,90O;"62";0";"R;"290019;"2"83";LA BRASILIUA";"ESTAO@GSILIUA",VILA A","ES00U";"9";"TO;"ES"";A.CES JA"5JES J_conES ";""nES ";"""ES563sa@_6350";"PR";"63";"";"";"20110926";"7";"";"C1";"GI;"75380001";"GO";"9p"000;"003800040";"";"";"";";"tro@gm593";mailFINI61";"GI;"";""63";"";1";"CE;""6@gmail.";"CENT"6821801"";""
"7440;"5";1"S CRfer"62";IARIB";1"786B5";"0001";"3NJ UA"ETAGEMSSOR8230SSOR HUER;"29";"1";"753"CE;"016800"E;"2;"MI2997JIREH";"01";"M0,4763602";""";"742G""";"";"GM;"01";"";"";"20150622";"68218RI"SOR 0437";"87"T4923002,4923108,47"01";00,8299"";"";"1154ES00U S CRfe"AND: 1001;";"CENTRO";"30"30"3;"45"";"0325NTE BO0";"";"";"C0140;"";"";UA";41299""";UBO0";"";"";"C0";"vil1206f143 OLIV";"9602,561;"371";31012COS AL5;"8";"79736812512512563155000";"BOJAM0001908902";"0001";"60"001"2350100_6360UARANI";"1;"20AO";"08";"20140,4.;"RUl@ "76801166";"RO"; 5165261";"TO MOBO0S";"0;"335109007"""82";"96794UPEDRO"591YURA";"47"";""3";""R ;"";"2011;"20731""ASTNTR"96970205,6110805";"4MANOENOVE201";";"66";RO"21"EIRO";"1215";GXRO LFO";" 160@hotm";"S";"02";"207";CENTRO";"85501050";"PR";M00IOSaHO A RIC PUBLICIDADE";"08";"201706;"jsebj."""";"RUA"";";"9SB";"e2301,5912
"12969";"2";"eNDREZAS805";"712100,47CEN2013016@01";"GO"M";"";"C"";"0;"394"00080;"39";"e2eAR AL0621"200";""
"2BO0S";seja207";CEF0,478"62";7ric99,47";"228"6A";"2220510";"00;"A,4520005,45307;"0XOTO DE VASCONCELOL G99";";22695329"0000";"PR";"7431"70A@HOF+I";"""
"20308";""";0 ENT080;1";06"690890";"";"3493";";"31";ICO;"";"";"0005";"1g5,4763602";"002010""
"22695329";"19";"93928805";"";""";5500ric99,47";O HEIL";"185"1;"20AO;"";"02";"20120308";"0;"";"02";"1";"I02" EDCENTR"5620104,56OR 02521426125213002,,5611203,1091102"252;""NUc59201O";"508";"";"CE;4129";"00000"1801";"682"394"00080;"646801";"9244";""013ASCONCELOL G99"RUAi20170";""";L02"ai20170";""";56163194ULFO DE "08";"2@";"4."4."4."4."4."4."4."4.5329";"19"30SAL DEs2BO 5 12024225FAZ429280.ETWSA";v"DOS UL";"50220I
2";"UBL;"";0502600"04";"2S";"CE467180";"SP";"6132257A";SEHUSNLHOS@ SERVIEJA EVA";"44";"33094200";"NTRADA";"";"";"";"";"";;"";;"";;22";616";"088112";"81788112";"817S MA153RADA04,8599RUA";0";"";"";""1"A";0CARIL 02R &"70A@;"08"92359604";"478MO";"O 512999"VE DE194ULFO";"CJJF";"ISJEO,4120400,42111"881";;"15";GXRO LFO";" 412881120035243";"08353100";"SC"MOV190410";ACIO 01";"34";"1"";"53100""";";;"00";"";"";"201R 5 ";"6970203";"";"";"";"";"";"";"";"";"";c59201O";"508";"";"CE;31400244r";"";""
"22695
(...)
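Finding the point where the content stops being coherent does not require scrolling through a text editor. Here is a rough sketch that reports the first line whose number of fields differs from the first line's. It assumes a semicolon-delimited file with one record per line and decodes it as latin-1 (an assumption, chosen because it never fails on arbitrary bytes); the function name is hypothetical.

```python
import csv


def first_malformed_line(path, delimiter=";", encoding="latin-1"):
    """Return the 1-based number of the first line whose field count differs
    from the first line's, or None if every line matches."""
    expected = None
    with open(path, newline="", encoding=encoding) as handle:
        for number, line in enumerate(handle, start=1):
            try:
                # Parse each physical line on its own so that a broken quote
                # cannot swallow the lines that follow it.
                fields = next(csv.reader([line], delimiter=delimiter), [])
            except csv.Error:
                # Lines the csv module refuses to parse count as malformed.
                return number
            if expected is None:
                expected = len(fields)
            elif len(fields) != expected:
                return number
    return None
```

This is a heuristic: a damaged line can happen to have the right number of fields, so the real corruption may start a little earlier than the line reported.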
I'm using Windows 10, with PowerShell (with the base conda environment automatically activated).
Tried to download the biggest file (Estabelecimentos0.zip). Had the following error:
Tried to restart the download, and the following error was reported:
With the --force-restart flag the download worked, but from the beginning of the file. Once again, after over 500 MB downloaded, the earlier timeout error occurred. I can't restart without the --force-restart flag.
The zip file, however, is downloaded and, when I try to unzip it (using 7-zip), it reports a data error but saves the content (a csv file). This file cannot be loaded in pandas or even in spreadsheet software. In a text editor (Notepad++) it shows coherent data for the first lines (about 4.000.000), but after that it's clearly garbled.
With a smaller file (Empresas1.zip), it worked correctly. The file was downloaded, unzipped and opened in pandas (4.494.859 lines).