cuducos / chunk

🧱 Chunk is a download manager for slow and unstable servers
MIT License
56 stars, 3 forks

Error unzipping large file downloaded with `chunk` in Windows #44

Open mfagundes opened 1 year ago

mfagundes commented 1 year ago

I'm using Windows 10 with PowerShell (with the base conda environment automatically activated).

I tried to download the biggest file (Estabelecimentos0.zip) and got the following error:

(base) PS C:\Users\mauricio\chunk_teste> ..\chunk-v1.0.0-windows-amd64.exe https://dadosabertos.rfb.gov.br/CNPJ/Estabelecimentos0.zip --force-restart
Downloading 622.4MB of 878.1MB  70.88%  1.4MB/s2022/12/26 18:51:31 error downloadinf chunk #90073: error downloading https://dadosabertos.rfb.gov.br/CNPJ/Estabelecimentos0.zip: All attempts fail:
#1: request to https://dadosabertos.rfb.gov.br/CNPJ/Estabelecimentos0.zip ended due to timeout: context deadline exceeded
#2: request to https://dadosabertos.rfb.gov.br/CNPJ/Estabelecimentos0.zip ended due to timeout: context deadline exceeded
#3: request to https://dadosabertos.rfb.gov.br/CNPJ/Estabelecimentos0.zip ended due to timeout: context deadline exceeded
#4: request to https://dadosabertos.rfb.gov.br/CNPJ/Estabelecimentos0.zip ended due to timeout: context deadline exceeded
#5: request to https://dadosabertos.rfb.gov.br/CNPJ/Estabelecimentos0.zip ended due to timeout: context deadline exceeded
(base) PS C:\Users\mauricio\chunk_teste>

I tried to restart the download, and the following error was reported:

(base) PS C:\Users\mauricio\chunk_teste> ..\chunk-v1.0.0-windows-amd64.exe https://dadosabertos.rfb.gov.br/CNPJ/Estabelecimentos0.zip
2022/12/26 18:52:46 could not creat a progress file: error loading existing progress file: error decoding progress file C:\Users\mauricio\.chunk\c811d2999ff5d6a15340c98b44fd8126-Estabelecimentos0.zip: unexpected EOF
(base) PS C:\Users\mauricio\chunk_teste>

With the --force-restart flag the download worked, but it started again from the beginning of the file. Once again, after more than 500 MB had been downloaded, the same timeout error occurred. I can't restart without the --force-restart flag.

The zip file is downloaded anyway. When I try to unzip it (using 7-Zip), it reports a data error but still saves the content (a CSV file). This file cannot be loaded in pandas or even in spreadsheet software. In a text editor (Notepad++) it shows coherent data for the first lines (about 4 million), but after that it is clearly garbled.

With a smaller file (Empresas1.zip), it worked correctly. The file was downloaded, unzipped and opened in pandas (4,494,859 lines).
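
A quick way to tell whether a downloaded archive is intact before extracting it is to ask a zip library to verify every member. The sketch below is not part of `chunk`; it only uses Python's standard `zipfile` module, and the function name `check_zip` is made up for illustration. It assumes the archive is small enough to be read once from disk in a reasonable time.

```python
import zipfile
import zlib


def check_zip(path):
    """Return None if the archive reads cleanly, else a short description of the problem."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and verifies its CRC-32;
            # it returns the name of the first bad member, or None.
            bad_member = archive.testzip()
    except zipfile.BadZipFile as error:
        # Raised when the archive's directory cannot be read at all.
        return f"not a valid zip archive: {error}"
    except zlib.error as error:
        # Raised when the compressed stream itself cannot be inflated.
        return f"invalid compressed data: {error}"
    if bad_member is not None:
        return f"corrupt member: {bad_member}"
    return None


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, "->", check_zip(name) or "ok")
```

Run against the two downloads mentioned above, a check like this would be expected to flag Estabelecimentos0.zip and pass Empresas1.zip, but that has not been verified here.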

cuducos commented 1 year ago

Ok… I am trying to understand what error we have here. It looks like we have three errors:

request to … ended due to timeout: context deadline exceeded

This is basically a server error: the server took longer than the default timeout to respond. The --help output shows what the default timeout is and how to set a different one.

could not create a progress file

Given that you haven't used the --chunk-size flag, this is unexpected. So just confirm that you haven't, and we can open a new issue specifying which error we're talking about.

Error when unzipping the downloaded files

Can you attach or share a link of the downloaded file that gives you an unreadable CSV?

mfagundes commented 1 year ago
  1. Yes, the timeout error seems to be from the server.
  2. I didn't use the --chunk-size flag.
  3. The unzipping error occurs in 7-Zip; the file is downloaded despite the error. It seems that 7-Zip unzips it only partially (I don't know exactly how compression works). See the attached image:

image

And here is a screenshot from Windows Explorer, showing what I guess are the partially downloaded and extracted files:

image

cuducos commented 1 year ago

Ok, I just opened #45 to track the error loading the existing progress file, and I'm going to rename this issue to the actual error: cannot unzip large file downloaded on Windows.

On that matter, let me ask you again:

Can you attach or share a link to the downloaded file that gives you an unreadable CSV?

mfagundes commented 1 year ago

Sorry, I misunderstood your question. Here is the link:

Estabelecimentos0.zip - 857.513 kB

cuducos commented 1 year ago

Just confirming (for the record) that the content ends abruptly:

$ unzip Estabelecimentos0.zip
Archive:  Estabelecimentos0.zip
  inflating: K3241.K03200Y0.D21119.ESTABELE  
  error:  invalid compressed data to inflate
$ tail K3241.K03200Y0.D21119.ESTABELE 
"24216864";"0001";"03";"1";"VANS & VUCS";"08";"20180921";"01";"";"";"20160222";"4530703";"";"AVENIDA";"CAPITAO FRANCISCO CEZAR";"1.169";"";"VILA PINDORAMA";"06415000";"SP";"6213";"11";"47384054";"";"";"";"";"ATAIDECARDOSO@HOTMAIL.COM";"";""
"23281689";"0001";"75";"1";"CJJF";"02";"20150916";"00";"";"";"20150916";"8230001";"7721700,8592999,4781400,8599699";"RUA";"ALVARENGA PEIXOTO";"456";"APT 401";"LOURDES";"30180120";"MG";"4123";"31";"32914931";"";"";"";"";"mcneuenschwander@gmail.com";"";""
"24850096";"0001";"45";"1";"";"08";"20170220";"01";"";"";"20160521";"5611203";"1096100,1093702,4723700,1094500";"RUA";"RUA DRA MARIA APARECIDA CHAIB";"189";"";"CENTRO";"37472000";"MG";"4281";"35";"92191650";"";"";"";"";"rosilea.s@hotmail.com";"";""
"24682654";"0001";"00";"1";"";"02";"20160428";"00";"";"";"20160428";"0723501";"4312600,7119701,0810006,0724301,0500301,0893200,0500302,7210000,0899102";"AVENIDA";"CARLOS GOMES";"513";"SALA  05";"CAIARI";"76801166";"RO";"0003";"69";"32211736";"";"";"";"";"";"";""
"14070560";"0001";"27";"1";"CENTRO AUTOMOTIVO MARIANI";"08";"20170303";"01";"";"";"20110728";"4530703";"4520004,4520001,4520003";"RUA";"GUARANI";"1443";"ANEXO FUNDOS";"CENTRO";"85501050";"PR";"7751";"46";"32244694";"";"";"";"";"";"";""
"09206854";"0001";"01";"1";"PORTO MOVEIS";"02";"20071106";"00";"";"";"20071106";"3101200";"4754701,9529105";"AVENIDA";"AFONSO PORTO EMERIM";"1221";"";"PITANGUEIRAS";"95500000";"RS";"8855";"51";"31414939";"51";"36625655";"";"";"";"";""
"14982593";"0001";"43";"1";"SHOW DA TERRA MS";"02";"20120202";"00";"";"";"20120202";"7311400";"5911199,5920100,6319400,8230001,8592903,8592902";"RUA";"ARARA AZUL";"140";"";"CENTRO";"79400000";"MS";"9065";"67";"99947991";"";"";"";"";"";"";""
"12468754";"0001";"50";"1";"RICCA PARKING";"02";"20100727";"00";"";"";"20100727";"6810201";"6463800,6810202,6821801,6821802";"AVENIDA";"ADOLFO PINHEIRO 1000 ENTRADA 1010";"1001";"CONJ  74";"SANTO AMARO";"04734904";"SP";"7107";"11";"34427390";"11";"55129902";"11";"55129902";"RICCAPARKING@GMAIL.COM";"";""
"25214261";"0001";"35";"1";"TODA BONITA CABELO&MAQUIAGEM DEFINITIVA";"08";"20170220";"01";"";"";"20160715";"9602502";"9602501";"RUA";"MOREIRA CEZAR";"104";"";"CENTRO";"14730000";"SP";"6731";"17";"33612680";"";"";"";"";"exattus_contabil@hotmail.com";"";""
"22825662";"0001";"33";"1";"NELSO⏎
$

This is different from the same file downloaded on macOS today. There is no error message when unzipping, and the file ends as a normal CSV file:

```
$ unzip Estabelecimentos0.zip
Archive:  Estabelecimentos0.zip
  inflating: K3241.K03200Y0.D21119.ESTABELE
$ tail K3241.K03200Y0.D21119.ESTABELE
"48662247";"0001";"08";"1";"CAMILA FERRAZ";"02";"20221119";"00";"";"";"20221119";"4721104";"8219999";"RUA";"MARACANA (JD GUANABARA)";"1000";"CXPST ESQUINA COM O CAMPO DO AREAO";"AREAO";"78010680";"MT";"9067";"65";"96489381";"";"";"";"";"MIHOMIFI@GMAIL.COM";"";""
"48662257";"0001";"35";"1";"ALEX LANCHES";"02";"20221119";"00";"";"";"20221119";"5611203";"5620104,4723700,4721103,4721104";"AVENIDA";"LUCIDIO FLORENCIO RIBEIRO";"1293";"";"CENTRO";"83480000";"PR";"5455";"41";"97731878";"";"";"";"";"MONICATPRESTES@GMAIL.COM";"";""
"48662269";"0001";"60";"1";"MR CONSTRUCOES";"02";"20221119";"00";"";"";"20221119";"4399103";"4321500,4322301,4330404,4744099";"10A RUA";"JAIR FLORES";"406";"";"LOTEAMENTO MOTTER";"99770000";"RS";"8517";"65";"92693190";"";"";"";"";"BUUUMODAALTERNATIVA@GMAIL.COM";"";""
"48662280";"0001";"20";"1";"";"02";"20221119";"00";"";"";"20221119";"7319002";"";"RUA";"PROFESSORA EROTIDES DA SILVA FONTES";"2080";"";"SAO VICENTE";"88309601";"SC";"8161";"47";"99626240";"";"";"";"";"GMCONTABILIDADE@CONTABILGM.COM";"";""
"48662292";"0001";"54";"1";"NANNA LANCHES";"02";"20221119";"00";"";"";"20221119";"5611203";"1096100";"RUA";"DOUTOR ALFREDO BACKER";"536";"APT 502;BLOCO 08";"ALCANTARA";"24452005";"RJ";"5897";"21";"91181275";"";"";"";"";"RE259351@GMAIL.COM";"";""
"48662302";"0001";"51";"1";"YASMIN BEAUTY HAIR";"02";"20221119";"00";"";"";"20221119";"9602501";"";"PASSAGEM";"SANTO AMARO";"160";"";"MARACANGALHA";"66110210";"PA";"0427";"91";"84406485";"";"";"";"";"SOARESY2000@GMAIL.COM";"";""
"48662312";"0001";"97";"1";"NANE MARQUES";"02";"20221119";"00";"";"";"20221119";"7319002";"";"RUA";"CONSTANTE KAVESKI";"146";"CONJ";"CARA-CARA";"84033166";"PR";"7777";"42";"98267278";"";"";"";"";"NANEEMARQUES14@GMAIL.COM";"";""
"48662323";"0001";"77";"1";"";"02";"20221119";"00";"";"";"20221119";"9602501";"9602502";"RUA";"SETE DE SETEMBRO";"285";"SALA 01";"CENTRO";"89770000";"SC";"8345";"49";"99441088";"";"";"";"";"FISCALMDCONTABILIDADE@HOTMAIL.COM";"";""
"48662334";"0001";"57";"1";"BRUNA TRANSPORTES";"02";"20221119";"00";"";"";"20221119";"5212500";"";"RUA";"JOAO DIPPE";"73";"CASA GEMINADO 2";"IRIRIU";"89227087";"SC";"8179";"47";"97238498";"";"";"";"";"ELIACAVALCANTE518@GMAIL.COM";"";""
"48662344";"0001";"92";"1";"EMPORIO VINHE-SE";"02";"20221119";"00";"";"";"20221119";"5612100";"";"RUA";"ROSA RIGO BONADIMAN";"31";"CASA NA RUA DO MERCADO TSUNAME CASA NA RUA DO MERC";"BONADIMAN";"45991110";"BA";"3993";"73";"99535004";"";"";"";"";"JAIANESANTOSBARROS@GMAIL.COM";"";""
$
```

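
With one broken and one working copy of the same archive at hand, a byte-level comparison can show where they first diverge. The following is only a sketch: `first_difference` is a hypothetical helper, not something `chunk` provides, and the comparison is meaningful only if the server published the same version of the file for both downloads, which is an assumption here.

```python
def first_difference(path_a, path_b, block_size=1 << 20):
    """Return the byte offset of the first difference, or None if the files are identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if block_a != block_b:
                # Narrow the search down to the exact byte inside this block.
                for i, (x, y) in enumerate(zip(block_a, block_b)):
                    if x != y:
                        return offset + i
                # No differing byte in the shared prefix: one file is shorter.
                return offset + min(len(block_a), len(block_b))
            if not block_a:
                # Both files ended at the same point without any difference.
                return None
            offset += len(block_a)


if __name__ == "__main__":
    import sys

    print(first_difference(sys.argv[1], sys.argv[2]))
```

Dividing the reported offset by the chunk size used for the download would point at the first chunk that was written incorrectly; the chunk size itself has to come from the tool's own documentation.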
mfagundes commented 1 year ago

I opened the CSV file I downloaded. Up to line 4,149,823 the file seems to be correct. The rest of the content, up to the last line (4,150,003), is completely garbled. Below is a small extract with the last (apparently) correct line and a few of the lines that follow it. I guess it has something to do with the compression/decompression method and the incomplete download of the file.

"24762587";"0001";"34";"1";"RESTAURANTE BOI NA BRASA";"02";"20160510";"00";"";"";"20160510";"5611201";"4729602,5611203";"AVENIDA";"TABAPOA";"3101";"";"SETOR 03";"76870441";"RO"; QD0715"AM A000_TElesce MAe"4712100,4781400,4763;"022017";"0001";3928805";506ORTO EJOSa7";9099725009398132";""699";";"CID160510";"00"99";";"C55";"51;"C55";"5JEAN";"5HEIROS";"";"31414939";"51";;"RS";"881";"COT";"ROLADs2015@15@"CEN99";"RUA";81";"01";";"6213";20100;"";";"0;""1439";"";";"86937821";"";";"86937821506RTO EOSa7;"ASTOLFO DUTRA";"22NCH16887";"0001";"04163010F@O P7";";"";"";""DRA: 23; LOTE: 2; C";"4ttus_co31400,77;""
"223;"PRESIONIO HEIL";""
"12NSO PORS";"08";"881";"COP IE";66256RNDRA: 7636RIAO";"900";"QUADRA 120;LOTE 04"9500000";"ES";"5603S";"5JEAN";"5HEIROS"881";;"162";"AFONSO PORT05";"C;"3RG HEIROS";""400";"5";""MT";"";"103B_co31603,7420004,7319015"";"C LOTE: 2; VASCONASC""60019TT0";,33FA";"ALVARENGA PEIXOT5";"69602,5611203";3";3112"39";"3900339";"1";""2437";"87"N599";""";" VASM DENGA P";"ALVA5071"08;"jsc"1362EZAR";"104UL";"HOOP SOLUTION";502,47X7  LT 18";"56";"";"CDRA m  LT 18";"56";"";"CDRA m  LT 1LT 1LT 19otmail.LOTE: 2; "56112100,4781400"47";"08";"";"20141";;"00";"";"";"2016041";;"00;""
"20"00;"01";""00;"";"";;"AD2";"20LAD";""";"1sDE  CANOAS510";""94;"8"2015AR"E HO4";"1";"RESTAURANTE016041""2015122500";"SAN";"""ASTOLFO"";"";"25HEIROS"881";;"142";"83";";"A0001";""DOS 2";m";10";"SE";"3105";"3105";"41";;"90"41"TA05";"41";;"90"41"T8";eirajr@hotm;"A00;"ARLI,742G"88OYGODOY@BOL.CZEN";"2";"VAL119NEXLAD"";EN";"000U028"";"D202";"0002"TON825662";"0001";"33";"1""RUA";4123";"31";"34832850001";"57";"3900331";;";"";"";"";"";"aASTOLFO 701";"39471381";"16";"452000T"1"7739 70220I
0";"QUADRA 121";"M0";""";" VASM DO;"3900339";"112NSP IE";8";"20170216";001902,82997JIREH";"  05";"CA";""82999,9001901,90019;"04";"20210407"";"2448697368126";0,47521";"2;"3;"40";"P;"P0419";"452AIDECARILINA";"04163IVA";"08";"2ON825662";";"025632244694";"";"";"";"";"";"";""
"09;"";"n;"";"";@M A000_6360""
"r020OA";ADORIA@"040,4.;"RUl@hotm:0"349797368125";n474407"";"2448697368126";068126";068126";_63;"1sD"1";"GICAIARI3S";"5JEAN";"5HEAN"000RI3N"00;"63,90OS A3,90O;"62";0";"R;"290019;"2"83";LA BRASILIUA";"ESTAO@GSILIUA",VILA A","ES00U";"9";"TO;"ES"";A.CES JA"5JES J_conES ";""nES ";"""ES563sa@_6350";"PR";"63";"";"";"20110926";"7";"";"C1";"GI;"75380001";"GO";"9p"000;"003800040";"";"";"";";"tro@gm593";mailFINI61";"GI;"";""63";"";1";"CE;""6@gmail.";"CENT"6821801"";""
"7440;"5";1"S CRfer"62";IARIB";1"786B5";"0001";"3NJ  UA"ETAGEMSSOR8230SSOR HUER;"29";"1";"753"CE;"016800"E;"2;"MI2997JIREH";"01";"M0,4763602";""";"742G""";"";"GM;"01";"";"";"20150622";"68218RI"SOR 0437";"87"T4923002,4923108,47"01";00,8299"";"";"1154ES00U S CRfe"AND: 1001;";"CENTRO";"30"30"3;"45"";"0325NTE BO0";"";"";"C0140;"";"";UA";41299""";UBO0";"";"";"C0";"vil1206f143 OLIV";"9602,561;"371";31012COS AL5;"8";"79736812512512563155000";"BOJAM0001908902";"0001";"60"001"2350100_6360UARANI";"1;"20AO";"08";"20140,4.;"RUl@        "76801166";"RO";      5165261";"TO MOBO0S";"0;"335109007"""82";"96794UPEDRO"591YURA";"47"";""3";""R ;"";"2011;"20731""ASTNTR"96970205,6110805";"4MANOENOVE201";";"66";RO"21"EIRO";"1215";GXRO LFO";" 160@hotm";"S";"02";"207";CENTRO";"85501050";"PR";M00IOSaHO A RIC PUBLICIDADE";"08";"201706;"jsebj."""";"RUA"";";"9SB";"e2301,5912
"12969";"2";"eNDREZAS805";"712100,47CEN2013016@01";"GO"M";"";"C"";"0;"394"00080;"39";"e2eAR AL0621"200";""
"2BO0S";seja207";CEF0,478"62";7ric99,47";"228"6A";"2220510";"00;"A,4520005,45307;"0XOTO DE VASCONCELOL G99";";22695329"0000";"PR";"7431"70A@HOF+I";"""
"20308";""";0 ENT080;1";06"690890";"";"3493";";"31";ICO;"";"";"0005";"1g5,4763602";"002010""
"22695329";"19";"93928805";"";""";5500ric99,47";O HEIL";"185"1;"20AO;"";"02";"20120308";"0;"";"02";"1";"I02" EDCENTR"5620104,56OR 02521426125213002,,5611203,1091102"252;""NUc59201O";"508";"";"CE;4129";"00000"1801";"682"394"00080;"646801";"9244";""013ASCONCELOL G99"RUAi20170";""";L02"ai20170";""";56163194ULFO DE "08";"2@";"4."4."4."4."4."4."4."4.5329";"19"30SAL DEs2BO 5   12024225FAZ429280.ETWSA";v"DOS UL";"50220I
2";"UBL;"";0502600"04";"2S";"CE467180";"SP";"6132257A";SEHUSNLHOS@ SERVIEJA EVA";"44";"33094200";"NTRADA";"";"";"";"";"";;"";;"";;22";616";"088112";"81788112";"817S MA153RADA04,8599RUA";0";"";"";""1"A";0CARIL 02R &"70A@;"08"92359604";"478MO";"O 512999"VE DE194ULFO";"CJJF";"ISJEO,4120400,42111"881";;"15";GXRO LFO";" 412881120035243";"08353100";"SC"MOV190410";ACIO 01";"34";"1"";"53100""";";;"00";"";"";"201R 5 ";"6970203";"";"";"";"";"";"";"";"";"";c59201O";"508";"";"CE;31400244r";"";""
"22695
(...)
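
Instead of scrolling through millions of lines in an editor, the first broken row can be located by counting fields. The sketch below uses Python's standard `csv` module. The name `first_malformed_row` is hypothetical, the `;` delimiter matches the extracts in this thread, and the `latin-1` encoding is an assumption chosen because it never fails to decode. The check is a heuristic: a garbled row that happens to have the right number of fields would go unnoticed.

```python
import csv


def first_malformed_row(path, delimiter=";", encoding="latin-1"):
    """Return (line_number, field_count) for the first row whose width differs
    from the first row's width, or None if every row has the same width."""
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle, delimiter=delimiter, quotechar='"')
        expected = None
        try:
            for row in reader:
                if expected is None:
                    # The first row defines how many fields a row should have.
                    expected = len(row)
                elif len(row) != expected:
                    return reader.line_num, len(row)
        except csv.Error:
            # The parser gave up, for example on an oversized field.
            return reader.line_num, None
    return None


if __name__ == "__main__":
    import sys

    print(first_malformed_row(sys.argv[1]))
```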