bellingcat / auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).
https://pypi.org/project/auto-archiver/
MIT License

os.path.abspath fails because os.getcwd() fails #97

Closed (djhmateer closed 4 months ago)

djhmateer commented 9 months ago

Related to https://github.com/bellingcat/auto-archiver/issues/96 which fixed another issue

# Unknown why this fails the second time.
# The call below fails because os.path.abspath() calls os.getcwd(), which raises:
# https://stackoverflow.com/questions/3210902/python-why-does-os-getcwd-sometimes-crash-with-oserror
# bar = os.path.abspath(foo)
# Working around it the psutil way leads to errors further down, when shutil copies files.

# browsertrix_home_host = os.environ.get('BROWSERTRIX_HOME_HOST') or os.path.abspath(ArchivingContext.get_tmp_dir())
# Hard-coding the path fixes it for now.
browsertrix_home_host = '/mnt/c/dev/v6-auto-archiver' + ArchivingContext.get_tmp_dir()[1:]

browsertrix_home_container = os.environ.get('BROWSERTRIX_HOME_CONTAINER') or browsertrix_home_host

The error is:

  File "/mnt/c/dev/v6-auto-archiver/src/auto_archiver/enrichers/wacz_enricher.py", line 54, in enrich
    bar = os.path.abspath(foo)
  File "/usr/lib/python3.10/posixpath.py", line 384, in abspath
    cwd = os.getcwd()
FileNotFoundError: [Errno 2] No such file or directory

This could be my platform (WSL2) so I'll try on another server.

I have been looking for a way to turn off deletion of the tmp directory to see if that helps, but haven't found one yet.
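
For reference, the FileNotFoundError in the traceback is what os.getcwd() raises on Linux when the process's current working directory has been deleted. The sketch below only reproduces that general behaviour with a throwaway temp directory. It does not show that this is what happens inside auto-archiver, and the function name is hypothetical.

```python
import os
import tempfile


def getcwd_after_cwd_removed():
    """Return the exception os.getcwd() raises once the cwd is deleted, or None."""
    original_cwd = os.getcwd()
    try:
        with tempfile.TemporaryDirectory() as tmp_dir:
            os.chdir(tmp_dir)  # the process now sits inside the temp dir
        # Leaving the with-block deleted tmp_dir, so the cwd no longer exists.
        try:
            os.getcwd()
        except FileNotFoundError as exc:
            return exc
        return None
    finally:
        os.chdir(original_cwd)  # always restore a valid working directory


error = getcwd_after_cwd_removed()
print(type(error).__name__ if error else "os.getcwd() still worked")
```

If something similar happens here, any later call that resolves a relative path (os.path.abspath included) would fail the same way until the process changes into a directory that exists.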

msramalho commented 9 months ago

Can you confirm that everything was working at this commit and that it still does not work after this fix?

For the tmp dir: since we use the with statement, it will always be destroyed once execution leaves that block, and I can't think of a way around that. You could modify the ArchivingContext to use a stationary folder.
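
A minimal sketch of what a tmp dir that can survive the with block might look like, assuming plain tempfile.mkdtemp is acceptable. The helper name tmp_dir_context and its keep flag are hypothetical and not part of ArchivingContext.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def tmp_dir_context(keep=False):
    """Yield a fresh temp dir; delete it on exit unless keep is True."""
    path = tempfile.mkdtemp()  # unlike TemporaryDirectory, never auto-deleted
    try:
        yield path
    finally:
        if not keep:
            shutil.rmtree(path, ignore_errors=True)


# With keep=True the directory is still there after the block, for inspection.
with tmp_dir_context(keep=True) as kept_dir:
    pass
print(kept_dir, os.path.isdir(kept_dir))
shutil.rmtree(kept_dir)  # manual cleanup once debugging is done
```

This keeps the default behaviour (delete on exit) and makes retention an explicit opt-in, which seems safer than removing cleanup altogether.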

djhmateer commented 9 months ago

Thanks Miguel - yes, I can confirm both.

My Ubuntu 20.04 server is working fine, so I suspect it's something strange on my WSL2 setup with Ubuntu 22.04.

I've added a simple workaround for now. Will keep an eye on things and maybe the answer will appear!

hard_code_directory_for_wsl2 = '/mnt/c/dev/v6-auto-archiver'
try:
    browsertrix_home_host = os.environ.get('BROWSERTRIX_HOME_HOST') or os.path.abspath(ArchivingContext.get_tmp_dir())
except FileNotFoundError:
    logger.warning('Dev environment found using ' + hard_code_directory_for_wsl2)
    browsertrix_home_host = hard_code_directory_for_wsl2 + ArchivingContext.get_tmp_dir()[1:]
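
A possible variation that avoids the machine-specific path: remember the working directory once at startup and resolve against it when os.getcwd() fails. This is only a sketch. STARTUP_CWD and abspath_with_fallback are hypothetical names, and it assumes the working directory is still valid when the module is first imported.

```python
import os

# Captured once at import time, while the working directory presumably still exists.
STARTUP_CWD = os.getcwd()


def abspath_with_fallback(path, base=STARTUP_CWD):
    """Like os.path.abspath, but resolve against base if os.getcwd() fails."""
    try:
        return os.path.abspath(path)
    except FileNotFoundError:
        # os.getcwd() failed inside abspath; join onto a known-good base instead.
        return os.path.normpath(os.path.join(base, path))


print(abspath_with_fallback("./tmp_example"))
```

This only covers the abspath call. Later steps that depend on the working directory (such as the shutil copying mentioned earlier) could still fail, so it is a stopgap rather than a fix for the underlying problem.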

msramalho commented 9 months ago

Given that this so far seems to be a WSL-specific issue, we won't work on it, but we welcome external contributions and hope you can find a solution.