bellingcat / auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).
https://pypi.org/project/auto-archiver/
MIT License

os.path.abspath failing on second time through #96

Closed djhmateer closed 9 months ago

djhmateer commented 9 months ago

In the latest release (today's), in wacz_enricher.py: https://github.com/bellingcat/auto-archiver/blob/main/src/auto_archiver/enrichers/wacz_enricher.py#L50

On the second pass (i.e. when the spreadsheet has multiple rows), the os.path.abspath call fails even though the directory exists on the filesystem. I've tried adding a sleep. It's as if the underlying handle to the filesystem hasn't been updated or flushed. I'm on WSL2, so it could be that. I have hardcoded it for now and will report back if I learn more. Just thought I'd put this here in case anyone has suggestions.

url = to_enrich.get_url()

collection = str(uuid.uuid4())[0:8]

# unknown why it fails on second time
# browsertrix_home_host = os.environ.get('BROWSERTRIX_HOME_HOST') or os.path.abspath(ArchivingContext.get_tmp_dir())
foo = ArchivingContext.get_tmp_dir()
logger.warning(f'{foo=}')
# will fail below on second url even though the path exists on filesystem
bar = os.path.abspath(foo)

Here is the error:

File "/mnt/c/dev/v6-auto-archiver/src/auto_archiver/core/orchestrator.py", line 88, in archive
    result.merge(a.download(result))
File "/mnt/c/dev/v6-auto-archiver/src/auto_archiver/enrichers/wacz_enricher.py", line 40, in download
    if self.enrich(result):
File "/mnt/c/dev/v6-auto-archiver/src/auto_archiver/enrichers/wacz_enricher.py", line 58, in enrich
    bar = os.path.abspath(foo)
File "/usr/lib/python3.10/posixpath.py", line 384, in abspath
    cwd = os.getcwd()
FileNotFoundError: [Errno 2] No such file or directory

msramalho commented 9 months ago

I believe the issue with the release was in the assignment of self.docker_commands here, which carried the previous iteration's tmp_folder over into the following iteration.

I'm not sure whether you are still seeing some other error (?)

Fix to come shortly.
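
To illustrate the kind of state leak described above, here is a minimal sketch. It assumes, without confirmation from the thread, that the problem came from reassigning an instance attribute that persists between URLs. The class SketchEnricher, its methods, and the docker arguments are hypothetical stand-ins, not the real WaczEnricher code.

```python
class SketchEnricher:
    """Hypothetical stand-in for an enricher that builds a docker command."""

    def __init__(self):
        # Base command shared by every URL (illustrative values only).
        self.docker_commands = ["docker", "run"]

    def enrich_leaky(self, tmp_dir):
        # Assumed bug pattern: the instance attribute is overwritten, so the
        # next call starts from a list that already mentions the old tmp_dir.
        self.docker_commands = self.docker_commands + ["-v", f"{tmp_dir}:/crawls/"]
        return self.docker_commands

    def enrich_isolated(self, tmp_dir):
        # Building a local list leaves the shared attribute untouched.
        cmd = self.docker_commands + ["-v", f"{tmp_dir}:/crawls/"]
        return cmd


leaky = SketchEnricher()
leaky.enrich_leaky("tmp/first")
leaky_second = leaky.enrich_leaky("tmp/second")

isolated = SketchEnricher()
isolated.enrich_isolated("tmp/first")
isolated_second = isolated.enrich_isolated("tmp/second")
```

In the leaky variant the second command still contains the first temporary folder, while the isolated variant only sees the folder for the current URL.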

djhmateer commented 9 months ago

Thank you, this did fix one issue. I have opened another issue for the getcwd problem.
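
A note on the getcwd error in the traceback above: os.path.abspath only calls os.getcwd() when it is given a relative path, and on POSIX systems os.getcwd() raises FileNotFoundError if the process's current working directory has been deleted. The sketch below reproduces that behaviour in isolation. Whether this is what happened in auto-archiver is an assumption, since the thread does not confirm it; the function name and paths are made up for illustration.

```python
import os
import tempfile


def abspath_after_cwd_removed():
    """Return (outcome for a relative path, result for an absolute path)
    when os.path.abspath is called from a deleted working directory."""
    original_cwd = os.getcwd()
    doomed = tempfile.mkdtemp()
    try:
        os.chdir(doomed)
        os.rmdir(doomed)  # POSIX allows removing the directory we are in
        try:
            os.path.abspath("relative/tmp")  # needs os.getcwd() internally
            relative_outcome = "no error"
        except FileNotFoundError as exc:
            relative_outcome = type(exc).__name__
        # An absolute path never consults the current working directory.
        absolute_result = os.path.abspath("/tmp/example")
        return relative_outcome, absolute_result
    finally:
        # Restore a valid working directory for the rest of the program.
        os.chdir(original_cwd)


relative_outcome, absolute_result = abspath_after_cwd_removed()
```

If this is the cause, a sleep would not help, because the relative path itself is fine and only the working directory it is resolved against has gone missing.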