ArthurHeitmann / arctic_shift

Making Reddit data accessible to researchers, moderators and everyone else. Interact with the data through large dumps, an API or web interface.
https://arctic-shift.photon-reddit.com

safely deal with corrupt files in a folder #17

Closed. stas00 closed this issue 3 months ago

stas00 commented 3 months ago

Currently, if a directory contains a corrupt file, the whole processing run crashes with:

Processing file   4 ./economicCollapse_comments.zst
Error reading file: ./economicCollapse_comments.zst
Traceback (most recent call last):
  File "arctic_shift/scripts/fileStreams.py", line 33, in getZstFileJsonStream
    chunk = reader.read(chunk_size)
zstd.ZstdError: zstd decompress error: Unknown frame descriptor

Traceback (most recent call last):
  File "arctic_shift/scripts/processFiles.py", line 64, in <module>
    main()
  File "arctic_shift/scripts/processFiles.py", line 57, in main
    processFolder(fileOrFolderPath)
  File "arctic_shift/scripts/processFiles.py", line 53, in processFolder
    processFile(file)
  File "arctic_shift/scripts/processFiles.py", line 37, in processFile
    print(f"\rRow {i+1}")
UnboundLocalError: local variable 'i' referenced before assignment

It fails at this line:

print(f"\rRow {i+1}")

because the generator yielded nothing, so the loop body never ran and the loop variable i was never assigned.
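For anyone who wants to reproduce this outside the repo, here is a minimal sketch of the same failure mode. The names `empty_stream` and `count_rows_buggy` are made up for illustration and are not taken from processFiles.py; the only thing carried over is the pattern of using the loop variable after the loop.

```python
def empty_stream():
    """Hypothetical stand-in for a row generator that ends before yielding anything."""
    return
    yield  # unreachable; its presence makes this function a generator


def count_rows_buggy(stream):
    for i, _row in enumerate(stream):
        pass  # per-row work would go here
    # `i` is only bound inside the loop, so an empty stream leaves it unassigned.
    return i + 1


if __name__ == "__main__":
    try:
        count_rows_buggy(empty_stream())
    except UnboundLocalError as exc:
        print(f"Reproduced: {exc}")
```

A non-empty stream works as expected; only a stream with zero rows triggers the error.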

I tried to solve it cleanly on the generator side, but that is far from trivial given how the generator is designed. Setting a default of i=0 before the loop looks like the simplest fix: not the cleanest solution, but it works.
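Roughly, the workaround looks like the sketch below. `count_rows_default` mirrors the i=0 suggestion; `count_rows` is a variant I am adding only for comparison. Both function names are hypothetical and the real loop in processFiles.py may be structured differently.

```python
def count_rows_default(stream):
    i = 0  # default from the suggestion above; keeps `i` bound for an empty stream
    for i, _row in enumerate(stream):
        pass  # per-row work would go here
    # Caveat: an empty stream is reported as 1 row, same as a one-row stream.
    return i + 1


def count_rows(stream):
    count = 0  # variant: count from 1 so that "no rows" stays 0
    for count, _row in enumerate(stream, start=1):
        pass  # per-row work would go here
    return count
```

The first version avoids the crash but cannot tell an empty file from a single-row file, which is part of why it is not the cleanest option.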

There are probably other ways to fix it, so this is just a suggestion.
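One such alternative, as a rough sketch only: isolate failures per file in the folder loop so that one bad file cannot abort the rest. `process_folder` and its `process_file` callback are hypothetical names for illustration, not the repo's actual functions, and this assumes the per-file code raises on corrupt input instead of swallowing the error.

```python
import os


def process_folder(folder, process_file):
    """Run process_file on every file in folder; collect failures instead of aborting."""
    failed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        try:
            process_file(path)
        except Exception as exc:  # broad on purpose: any per-file error means "skip it"
            print(f"Skipping {path}: {exc}")
            failed.append(path)
    return failed
```

The returned list makes it easy to report or retry the skipped files at the end of the run.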

ArthurHeitmann commented 3 months ago

Good find