Closed lauraedelson closed 6 years ago
Good catch! Thanks, @lauraedelson! However, why not call close() in a finally block?
except ArchiveLoadFailed as exception:
    ...
finally:
    stream.close()
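For context, here is a minimal sketch of the suggested pattern, assuming warcio's ArchiveIterator and ArchiveLoadFailed; the function name and record handling are illustrative, not the project's actual code:

```python
from warcio.archiveiterator import ArchiveIterator
from warcio.exceptions import ArchiveLoadFailed


def count_records(uri, stream):
    """Iterate over the WARC records in an already opened stream."""
    num_records = 0
    try:
        for _record in ArchiveIterator(stream):
            num_records += 1
    except ArchiveLoadFailed as exception:
        print('Invalid WARC: {} - {}'.format(uri, exception))
    finally:
        # close the stream whether iteration succeeded, failed, or was
        # aborted, so the file handle is always released
        stream.close()
    return num_records
```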
Also note that the 'file:' prefix is stripped from uri, so the if-condition will never be true.
Hi Sebastian, I moved the check to a finally block, as requested! Actually, the 'file:' prefix isn't stripped. I'm using this code now on my branch, and the prefix is there and it works correctly. I did it that way because the code was already recognizing file streams by the prefix, as on line 185.
> Actually, the 'file:' prefix isn't stripped.
Ok, I see: `base_dir` is added as a prefix of `uri`. But then it depends on how the program is called whether the condition becomes true or not. I've tried it, calling `$SPARK_HOME/bin/spark-submit ./word_count.py ...`, and the input `uri` is not prefixed with `file:`. But this may also depend on the Spark version used.
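A rough illustration of why the condition is sensitive to how the job is invoked; the variable values and the check itself are hypothetical, mirroring the discussion rather than the project's actual line 185:

```python
import os

# hypothetical values: base_dir comes from the job configuration,
# uri from the input listing Spark hands to the task
base_dir = 'file:/home/user/cc-data'
uri = 'warc/CC-MAIN-example.warc.gz'

path = os.path.join(base_dir, uri)

if path.startswith('file:'):
    # local input: recognized by the 'file:' prefix
    print('treated as local file:', path)
else:
    # plain paths or s3:// input: this branch is skipped, which is why
    # close() must not live only inside the prefix check
    print('not recognized as local file:', path)
```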
Anyway, it should do no harm to also close the temporary files for s3:// input. But I'll test ed9835e next week to verify that there are no problems. Thanks, @lauraedelson!
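For reference, a minimal sketch of why closing also matters for s3:// input, assuming the object is first downloaded into a temporary local file via boto3 (the helper name is illustrative): the temporary file is only removed from disk once its handle is closed.

```python
import boto3
from tempfile import TemporaryFile


def open_s3_stream(bucket, key):
    # Download the S3 object into an unnamed temporary file; the file
    # is deleted only when the returned handle is closed, so a missing
    # close() leaks both the descriptor and the on-disk copy.
    s3_client = boto3.client('s3')
    temp_file = TemporaryFile(mode='w+b')
    s3_client.download_fileobj(bucket, key, temp_file)
    temp_file.seek(0)
    return temp_file
```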
I've tested the general `finally: stream.close()` in local mode and on a Hadoop cluster. No issues. Thanks, @lauraedelson!
My circumstances dictated that I had to download CC files from AWS and then parse them on my local cluster. When I did this I noticed that file handles weren't being closed, which quickly eats through available memory. Simple fix, though!
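As a rough way to observe the leak described above (not part of the original report), assuming psutil is available, one can watch the number of open file handles grow while streams are left unclosed:

```python
import os
import tempfile

import psutil

proc = psutil.Process(os.getpid())

# create a few placeholder input files so the example is self-contained
paths = []
for _ in range(3):
    fd, path = tempfile.mkstemp(suffix='.warc.gz')
    os.close(fd)
    paths.append(path)

leaked = []
for path in paths:
    leaked.append(open(path, 'rb'))  # opened but never closed
    print('open file handles:', len(proc.open_files()))

# cleanup for the example
for stream in leaked:
    stream.close()
for path in paths:
    os.remove(path)
```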