christophcemper / slack-advanced-exporter

A tool for exporting additional data from Slack that is missing from the official data export.
MIT License

terminates with zip: not a valid zip file when it fails to open an input attachment #6

Open christophcemper opened 3 weeks ago

christophcemper commented 3 weeks ago

PAIN:

After having processed the same 17G archive three times in a row with 'fetch-emails', 'fetch-attachments' and 'fetch-private-channels', it fails on the 'fetch-profile-pictures' step with a random error.

The log/stdout shows only two lines, then the tool just terminates.

LOGS:

Failed to open file in input archive: __uploads/F05PSASRNLX/AIPRM - ChatGPT Code Interpreter -2023-08-19.mov

zip: not a valid zip file

SCREENSHOTS:

image

OTHER NOTES/IDEAS/HINTS:

IDEAL SOLUTION:

Combined tasks

The general approach so far is to read through many gigabytes again for every step. That costs a lot of time and resources, and is just wasteful.
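A minimal sketch of what a combined pass could look like, assuming each task can be expressed as a per-entry function. The `step` type, `processOnce` and `demoSinglePass` are hypothetical names for illustration, not part of the tool. Entries are read fully into memory here for brevity; large attachments would need streaming instead.

```go
package main

import (
	"archive/zip"
	"bytes"
	"io"
)

// step is a hypothetical per-entry transformation (not the tool's real API).
// It receives an entry's name and content and returns the new content.
type step func(name string, content []byte) ([]byte, error)

// processOnce reads every entry of in exactly once, applies all steps in
// order, and writes the result to out. Error handling is kept simple: the
// first error aborts the pass.
func processOnce(in *zip.Reader, out *zip.Writer, steps ...step) error {
	for _, f := range in.File {
		rc, err := f.Open()
		if err != nil {
			return err
		}
		content, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return err
		}
		for _, s := range steps {
			if content, err = s(f.Name, content); err != nil {
				return err
			}
		}
		w, err := out.Create(f.Name)
		if err != nil {
			return err
		}
		if _, err := w.Write(content); err != nil {
			return err
		}
	}
	return nil
}

// demoSinglePass builds a one-entry archive in memory, runs two made-up
// steps over it in a single pass, and returns the resulting entry content.
func demoSinglePass() (string, error) {
	var src bytes.Buffer
	zw := zip.NewWriter(&src)
	w, err := zw.Create("users.json")
	if err != nil {
		return "", err
	}
	if _, err := w.Write([]byte("[]")); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	in, err := zip.NewReader(bytes.NewReader(src.Bytes()), int64(src.Len()))
	if err != nil {
		return "", err
	}

	// Stand-ins for the real tasks; they only tag the content.
	addEmails := func(name string, c []byte) ([]byte, error) { return append(c, " +emails"...), nil }
	addPictures := func(name string, c []byte) ([]byte, error) { return append(c, " +pictures"...), nil }

	var dst bytes.Buffer
	out := zip.NewWriter(&dst)
	if err := processOnce(in, out, addEmails, addPictures); err != nil {
		return "", err
	}
	if err := out.Close(); err != nil {
		return "", err
	}
	res, err := zip.NewReader(bytes.NewReader(dst.Bytes()), int64(dst.Len()))
	if err != nil {
		return "", err
	}
	rc, err := res.File[0].Open()
	if err != nil {
		return "", err
	}
	defer rc.Close()
	b, err := io.ReadAll(rc)
	return string(b), err
}
```

With this shape the archive is read and rewritten once, no matter how many steps are chained.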

Parallel tasks

The code seems to be strictly single-threaded. Multithreading with rate control and exponential back-off could improve performance further.
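A rough sketch of that idea, under the assumption that downloads are independent and that a rate-limit response can be detected as an error. `errRateLimited`, `withBackoff` and `fetchAll` are hypothetical names; the semaphore only bounds concurrency and is not a true requests-per-second limiter.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// errRateLimited is a hypothetical sentinel for a "slow down" response.
var errRateLimited = errors.New("rate limited")

const (
	maxAttempts = 5                     // assumed retry budget
	baseDelay   = 10 * time.Millisecond // assumed first wait
)

// withBackoff calls fn until it stops returning errRateLimited or the
// attempts are used up, doubling the wait between attempts.
func withBackoff(fn func() error, attempts int, base time.Duration) error {
	delay := base
	var err error
	for i := 0; i < attempts; i++ {
		err = fn()
		if !errors.Is(err, errRateLimited) {
			return err // success or a non-retryable error
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return err
}

// fetchAll runs fetch for every URL with at most workers calls in flight
// and returns the URLs that still failed after retries.
func fetchAll(urls []string, workers int, fetch func(url string) error) map[string]error {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		failed = make(map[string]error)
		sem    = make(chan struct{}, workers)
	)
	for _, u := range urls {
		wg.Add(1)
		sem <- struct{}{} // blocks while workers calls are running
		go func(u string) {
			defer wg.Done()
			defer func() { <-sem }()
			err := withBackoff(func() error { return fetch(u) }, maxAttempts, baseDelay)
			if err != nil {
				mu.Lock()
				failed[u] = err
				mu.Unlock()
			}
		}(u)
	}
	wg.Wait()
	return failed
}
```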

Log output

Logging is done with plain fmt.Printf calls. That is acceptable for a small CLI tool, but it could and should report more detail.

Resilience

Error handling and stability measures are non-existent. Terminating an hour-long job on the first file/OS error shows the immaturity of this project; it is certainly not ready for production use.
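As an illustration of "more detail", here is a small sketch using only the standard library. The field names (step, entry counter, file name, error) are my assumption of what would have helped in this case, and `formatEntry` / `logEntry` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"log"
)

// formatEntry builds one key=value log line per processed archive entry,
// so a failure can be located by step, position and file name.
func formatEntry(step string, n, total int, name string, err error) string {
	if err != nil {
		return fmt.Sprintf("level=error step=%s entry=%d/%d name=%q err=%q",
			step, n, total, name, err.Error())
	}
	return fmt.Sprintf("level=info step=%s entry=%d/%d name=%q", step, n, total, name)
}

// logEntry writes the line with a timestamp via the standard logger.
func logEntry(step string, n, total int, name string, err error) {
	log.Println(formatEntry(step, n, total, name, err))
}
```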
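What I have in mind is roughly the following: record the failing entry, keep going, and report at the end. This is a sketch, not the tool's code; `processEntries`, `entryFailure` and `errUnreadable` are hypothetical, and `process` stands in for whatever opens and copies one archive entry.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnreadable is a hypothetical sentinel for an entry that cannot be opened.
var errUnreadable = errors.New("entry could not be opened")

// entryFailure records which entry failed and why.
type entryFailure struct {
	Name string
	Err  error
}

// processEntries calls process for every entry name. A failing entry is
// recorded and skipped instead of aborting the whole job.
func processEntries(names []string, process func(name string) error) []entryFailure {
	var failures []entryFailure
	for _, name := range names {
		if err := process(name); err != nil {
			failures = append(failures, entryFailure{Name: name, Err: err})
			continue
		}
	}
	return failures
}

// summarize produces the final report line for the job.
func summarize(total int, failures []entryFailure) string {
	return fmt.Sprintf("%d of %d entries failed", len(failures), total)
}
```

Whether a skipped attachment is acceptable is a policy decision, but at least the hour of work before the failure would not be lost.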

christophcemper commented 3 weeks ago

Defect reproduced some hours later on a second job, this time on a 50G zip, in the same 'fetch-profile-pictures' task.

image

christophcemper commented 3 weeks ago

File number 3, i.e. the archive produced after one pass of adding attachments and then going through it all again (per the inefficient design), leads to a "possible zip bomb" error.

Reproducible consistently with unzip -t export3-with-emails-and-attachments-private-channels.zip

image

pigz even refuses to test it due to "unknown compression method":

pigz -t export3-with-emails-and-attachments-private-channels.zip

pigz: skipping: export3-with-emails-and-attachments-private-channels.zip unknown compression method

The source archive used to create it doesn't have that problem and tests fine. A grep hints at the "surrounding" files, though I'm not sure whether the order is kept.

image

There are hints at other problems with German umlauts and "local" filenames not matching on that Ubuntu box.

Testing the "last shown file" in the defective archive doesn't lead to an error though, so it must be the files "around" it.

image

So it looks like the same version of the same .mov, stored with different sizes, is the hotspot among these 10,000s of files.

image

Also odd: the dates in the defective export3* archive are all 1979-11-30, while the input had 1980-00-00 throughout, which is not even a valid date.

Both dates don't make sense at all, fwiw. We didn't have ChatGPT in 1980.
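One plausible explanation for the two dates, offered as a guess: ZIP headers store the date in the 16-bit MS-DOS format (7 bits year offset from 1980, 4 bits month, 5 bits day), so an all-zero field decodes to 1980-00-00, and Go's time.Date normalizes month 0 / day 0 to 1979-11-30. The helper names below are hypothetical; the sketch only shows the arithmetic.

```go
package main

import "time"

// decodeDOSDate splits a 16-bit MS-DOS date into its raw fields.
// The values are not validated, so month and day may be 0.
func decodeDOSDate(d uint16) (year, month, day int) {
	return int(d>>9) + 1980, int((d >> 5) & 0xf), int(d & 0x1f)
}

// normalizedDate passes the raw fields to time.Date, which normalizes
// out-of-range values, and formats the result as YYYY-MM-DD.
func normalizedDate(d uint16) string {
	y, m, day := decodeDOSDate(d)
	return time.Date(y, time.Month(m), day, 0, 0, 0, 0, time.UTC).Format("2006-01-02")
}
```

With a zero date field, `decodeDOSDate(0)` yields 1980, 0, 0 and `normalizedDate(0)` yields 1979-11-30, which matches both observations. That would mean the timestamps were never set in the input rather than being corrupted later.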

christophcemper commented 3 weeks ago

For some odd reason the zip files created contain duplicate files, possibly leading to more problems with every generation of the zip content.

See defect https://github.com/christophcemper/slack-advanced-exporter/issues/7
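For anyone who wants to check their own archives for this, a small sketch that lists entry names occurring more than once. `duplicateNames` and `duplicatesInArchive` are hypothetical helpers, and the check only compares names, not content.

```go
package main

import (
	"archive/zip"
	"sort"
)

// duplicateNames returns, in sorted order, every name that occurs more
// than once in names.
func duplicateNames(names []string) []string {
	counts := make(map[string]int)
	for _, n := range names {
		counts[n]++
	}
	var dups []string
	for n, c := range counts {
		if c > 1 {
			dups = append(dups, n)
		}
	}
	sort.Strings(dups)
	return dups
}

// duplicatesInArchive opens the zip at path and reports duplicated entry
// names from its central directory.
func duplicatesInArchive(path string) ([]string, error) {
	r, err := zip.OpenReader(path)
	if err != nil {
		return nil, err
	}
	defer r.Close()
	names := make([]string, 0, len(r.File))
	for _, f := range r.File {
		names = append(names, f.Name)
	}
	return duplicateNames(names), nil
}
```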