christophcemper opened this issue 3 weeks ago
The defect reproduced a few hours later on a second job run, this time on a 50G zip, in the same fetch-profile-pictures task.
File number 3, i.e. the archive produced after one pass of adding attachments followed by another full pass over everything (per the inefficient design), is reported as a "possible zip bomb".
It is consistently reproducible with:
unzip -t export3-with-emails-and-attachments-private-channels.zip
pigz even refuses to test it, reporting "unknown compression method":
pigz -t export3-with-emails-and-attachments-private-channels.zip
pigz: skipping: export3-with-emails-and-attachments-private-channels.zip unknown compression method
The source archive used to create it does not have this problem and tests fine. A grep hints at the files "surrounding" the failure, though I'm not sure the entry order is preserved.
There are also hints of other problems with German umlauts and "local" filenames not matching on that Ubuntu box.
Testing the "last shown file" in the defective archive doesn't produce an error, though, so the problem lies in the files "around" it.
So it looks like copies of the same version of the same .mov, but with different sizes, are the hotspot among these tens of thousands of files.
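To list those same-name copies and their sizes directly, here is a minimal sketch assuming Go's archive/zip; `duplicateEntries` and `demoArchive` are hypothetical helpers, and the demo archive with two `clip.mov` entries is made up for illustration.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"sort"
)

// duplicateEntries returns, for every entry name that occurs more than once,
// the uncompressed sizes of its copies in archive order.
func duplicateEntries(r *zip.Reader) map[string][]uint64 {
	sizes := make(map[string][]uint64)
	for _, f := range r.File {
		sizes[f.Name] = append(sizes[f.Name], f.UncompressedSize64)
	}
	for name, s := range sizes {
		if len(s) < 2 {
			delete(sizes, name) // deleting during range is allowed in Go
		}
	}
	return sizes
}

// demoArchive writes two entries named "clip.mov" with different contents.
func demoArchive() (*zip.Reader, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	entries := []struct{ name, body string }{
		{"clip.mov", "abc"}, {"users.json", "[]"}, {"clip.mov", "abcde"},
	}
	for _, e := range entries {
		w, err := zw.Create(e.name)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write([]byte(e.body)); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
}

func main() {
	r, err := demoArchive()
	if err != nil {
		panic(err)
	}
	dups := duplicateEntries(r)
	names := make([]string, 0, len(dups))
	for n := range dups {
		names = append(names, n)
	}
	sort.Strings(names) // stable output order
	for _, n := range names {
		fmt.Printf("%s: %d copies, sizes %v\n", n, len(dups[n]), dups[n])
	}
}
```

Comparing names byte-for-byte like this would not catch umlauts stored in different Unicode normalization forms; that would need an extra normalization step.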
Also odd: the dates in the defective export3* archive are all 1979-11-30, while the input was all 1980-00-00, which is not even a valid date.
Neither date makes sense, fwiw. We didn't have ChatGPT in 1980.
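One plausible explanation for the 1979-11-30 dates, assuming the archive is read and rewritten with Go's archive/zip: ZIP entries store their date in MS-DOS format (day in bits 0-4, month in bits 5-8, years since 1980 in bits 9-15), so an all-zero field reads as "1980-00-00". Go's `time.Date` normalizes out-of-range values, turning month 0 into December 1979 and then day 0 into the last day of November. The `dosDateToTime` helper below is hypothetical; as far as I can tell, archive/zip decodes DOS dates the same way internally.

```go
package main

import (
	"fmt"
	"time"
)

// dosDateToTime decodes an MS-DOS date field:
// bits 0-4 day, bits 5-8 month, bits 9-15 years since 1980.
func dosDateToTime(dosDate uint16) time.Time {
	return time.Date(
		int(dosDate>>9)+1980,
		time.Month(dosDate>>5&0xf),
		int(dosDate&0x1f),
		0, 0, 0, 0, time.UTC,
	)
}

func main() {
	// An all-zero DOS date means "1980-00-00"; time.Date normalizes
	// month 0 to December 1979 and day 0 to November 30.
	fmt.Println(dosDateToTime(0).Format("2006-01-02")) // 1979-11-30
}
```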
For some reason, the zip files created contain duplicate entries, which may cause more problems with every generation of the zip content.
See defect https://github.com/christophcemper/slack-advanced-exporter/issues/7
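Since, as far as I know, `zip.Writer` does not reject a second entry with an existing name, one way to stop duplicates from piling up across generations is to track written names on the output side. This is a minimal sketch; `uniqueZipWriter` and `demo` are hypothetical and not part of the exporter.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// uniqueZipWriter is a hypothetical wrapper around zip.Writer that skips any
// entry whose name has already been written to the archive.
type uniqueZipWriter struct {
	zw   *zip.Writer
	seen map[string]bool
}

func newUniqueZipWriter(w io.Writer) *uniqueZipWriter {
	return &uniqueZipWriter{zw: zip.NewWriter(w), seen: make(map[string]bool)}
}

// Create returns ok == false (and a nil writer) when name was already written,
// so the caller can skip the duplicate instead of appending a second copy.
func (u *uniqueZipWriter) Create(name string) (w io.Writer, ok bool, err error) {
	if u.seen[name] {
		return nil, false, nil
	}
	w, err = u.zw.Create(name)
	if err != nil {
		return nil, false, err
	}
	u.seen[name] = true
	return w, true, nil
}

func (u *uniqueZipWriter) Close() error { return u.zw.Close() }

// demo writes "users.json" twice and "channels.json" once, then returns the
// number of entries that actually ended up in the archive.
func demo() (int, error) {
	var buf bytes.Buffer
	u := newUniqueZipWriter(&buf)
	for _, name := range []string{"users.json", "users.json", "channels.json"} {
		w, ok, err := u.Create(name)
		if err != nil {
			return 0, err
		}
		if !ok {
			continue // duplicate: skipped
		}
		if _, err := w.Write([]byte("[]")); err != nil {
			return 0, err
		}
	}
	if err := u.Close(); err != nil {
		return 0, err
	}
	r, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return 0, err
	}
	return len(r.File), nil
}

func main() {
	n, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("entries written:", n) // entries written: 2
}
```

"First copy wins" is an arbitrary policy here; given the .mov copies with different sizes, it may matter which copy is the correct one.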
PAIN:
After processing the same 17G three times in a row with 'fetch-emails', 'fetch-attachments', and 'fetch-private-channels', it fails at the 'fetch-profile-pictures' step with a seemingly random error.
The log/stdout shows only 2 lines, and then the process just terminates.
LOGS:
SCREENSHOTS:
OTHER NOTES/IDEAS/HINTS:
IDEAL SOLUTION:
Combined tasks
The general approach so far is to read through many gigabytes every time, for every step. That costs a lot of time and resources, and is simply wasteful.
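A combined run could read the input archive once and let every task look at each entry in the same pass. This is a minimal sketch under assumptions: `step`, `extraFile`, and `runCombined` are hypothetical names, not the exporter's current structure, and the toy steps in `demo` only stand in for tasks like fetch-attachments. Untouched entries are copied with `zip.Writer.Copy` (Go 1.17+), which avoids recompressing large files such as the .mov attachments.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"path"
	"strings"
)

// extraFile is a file that a step wants added to the output archive.
type extraFile struct {
	Name string
	Data []byte
}

// step is a hypothetical per-entry hook standing in for tasks such as
// fetch-emails or fetch-attachments; it may return extra files to add.
type step func(f *zip.File) ([]extraFile, error)

// runCombined reads the input archive once: each entry is copied without
// recompression, every step inspects it, and extra files are written in the same pass.
func runCombined(in *zip.Reader, out io.Writer, steps ...step) error {
	zw := zip.NewWriter(out)
	for _, f := range in.File {
		if err := zw.Copy(f); err != nil {
			return fmt.Errorf("copy %s: %w", f.Name, err)
		}
		for _, s := range steps {
			extras, err := s(f)
			if err != nil {
				return fmt.Errorf("step on %s: %w", f.Name, err)
			}
			for _, e := range extras {
				w, err := zw.Create(e.Name)
				if err != nil {
					return err
				}
				if _, err := w.Write(e.Data); err != nil {
					return err
				}
			}
		}
	}
	return zw.Close()
}

// demo runs two toy steps over a two-entry archive and returns the number
// of output entries and how often the counting step was called.
func demo() (int, int, error) {
	var in bytes.Buffer
	zw := zip.NewWriter(&in)
	for _, name := range []string{"general/2023-05-17.json", "users.json"} {
		w, err := zw.Create(name)
		if err != nil {
			return 0, 0, err
		}
		if _, err := w.Write([]byte("[]")); err != nil {
			return 0, 0, err
		}
	}
	if err := zw.Close(); err != nil {
		return 0, 0, err
	}
	zr, err := zip.NewReader(bytes.NewReader(in.Bytes()), int64(in.Len()))
	if err != nil {
		return 0, 0, err
	}
	calls := 0
	count := func(f *zip.File) ([]extraFile, error) { calls++; return nil, nil }
	attach := func(f *zip.File) ([]extraFile, error) {
		if strings.HasPrefix(f.Name, "general/") {
			return []extraFile{{Name: "attachments/" + path.Base(f.Name) + ".bin", Data: []byte("x")}}, nil
		}
		return nil, nil
	}
	var out bytes.Buffer
	if err := runCombined(zr, &out, count, attach); err != nil {
		return 0, 0, err
	}
	outR, err := zip.NewReader(bytes.NewReader(out.Bytes()), int64(out.Len()))
	if err != nil {
		return 0, 0, err
	}
	return len(outR.File), calls, nil
}

func main() {
	n, calls, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Printf("output entries: %d, count step called %d times\n", n, calls)
}
```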
Parallel tasks
The code seems to be strictly single-threaded. Multithreading with rate control and exponential back-off could improve performance further.
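Here is a minimal sketch of that idea using only the standard library: a fixed pool of goroutines, a shared ticker as a simple rate limiter, and per-item retries with doubling delays. `fetchAll`, `withBackoff`, the retry count, and the delays are all hypothetical choices, and the fake fetch in `demo` replaces real downloads of profile pictures or attachments.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// withBackoff calls fn up to maxAttempts times, sleeping base, 2*base, 4*base, ...
// between failed attempts (exponential back-off). It wraps the last error.
func withBackoff(fn func() error, maxAttempts int, base time.Duration) error {
	var err error
	delay := base
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

// fetchAll processes items with a fixed number of worker goroutines. A shared
// ticker lets at most one new item start per interval across all workers
// (retries inside withBackoff are not additionally rate-limited).
// Failures are collected per item instead of aborting the run.
func fetchAll(items []string, workers int, interval time.Duration, fetch func(string) error) map[string]error {
	jobs := make(chan string)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		failed = make(map[string]error)
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for item := range jobs {
				<-ticker.C // rate control: wait for the next tick
				if err := withBackoff(func() error { return fetch(item) }, 4, 20*time.Millisecond); err != nil {
					mu.Lock()
					failed[item] = err
					mu.Unlock()
				}
			}
		}()
	}
	for _, item := range items {
		jobs <- item
	}
	close(jobs)
	wg.Wait()
	return failed
}

// demo uses a fake fetch: "flaky" succeeds on its third attempt, "bad" never succeeds.
func demo() map[string]error {
	var mu sync.Mutex
	attempts := make(map[string]int)
	fake := func(item string) error {
		mu.Lock()
		defer mu.Unlock()
		attempts[item]++
		if item == "bad" || (item == "flaky" && attempts[item] < 3) {
			return errors.New("temporary failure")
		}
		return nil
	}
	return fetchAll([]string{"a", "flaky", "bad", "b"}, 2, 10*time.Millisecond, fake)
}

func main() {
	for item, err := range demo() {
		fmt.Println(item, err)
	}
}
```

Slack's Web API signals rate limiting with HTTP 429 and a Retry-After header, so a real client would likely prefer that value over the computed delay.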
Log output
The approach to logging, plain fmt.Printf calls, is trivial but still acceptable for a small CLI tool; it could and should report more detail, though.
Resilience
There is no real approach to error handling and stability. Terminating an hour-long job on the first file/OS error shows the immaturity of this project, which is certainly not ready for production use.