airbytehq / PyAirbyte

PyAirbyte brings the power of Airbyte to every Python developer.
https://docs.airbyte.com/pyairbyte

OSError: Cannot call rmtree on a symbolic link #224

Open ThierryDeruyttere opened 4 months ago

ThierryDeruyttere commented 4 months ago

I sometimes receive the following error. I have no idea what causes it or when it's triggered, but it is pretty annoying to hit suddenly after a job has run for 1h.

  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/entrypoint.py", line 234, in launch
    for message in source_entrypoint.run(parsed_args):
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/entrypoint.py", line 99, in run
    with tempfile.TemporaryDirectory() as temp_dir:
  File "/usr/local/lib/python3.10/tempfile.py", line 869, in __exit__
    self.cleanup()
  File "/usr/local/lib/python3.10/tempfile.py", line 873, in cleanup
    self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
  File "/usr/local/lib/python3.10/tempfile.py", line 855, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "/usr/local/lib/python3.10/shutil.py", line 736, in rmtree
    onerror(os.path.islink, path, sys.exc_info())
  File "/usr/local/lib/python3.10/shutil.py", line 734, in rmtree
    raise OSError("Cannot call rmtree on a symbolic link")
OSError: Cannot call rmtree on a symbolic link

aaronsteers commented 4 months ago

@ThierryDeruyttere - Thanks very much for logging, and I'm sorry you have been running into this.

Can you tell me which OS you are using (Windows or Mac)? And do you have any information on which directory it is trying to delete?

Any information about the runtime you are using would also be helpful. This error implies that the path being cleaned up is a symlink, which the deletion step refuses to remove.
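
For reference, the error message itself can be reproduced in isolation. The following is a minimal sketch, assuming a POSIX system where creating symlinks is permitted; the helper name `rmtree_symlink_error` is made up for illustration and is not part of the CDK or PyAirbyte. It only shows that `shutil.rmtree` rejects a path that is a symlink to a directory, not how such a path would arise inside `tempfile.TemporaryDirectory()`.

```python
import os
import shutil
import tempfile


def rmtree_symlink_error() -> str:
    """Return the message shutil.rmtree raises when given a symlink to a directory.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as base:
        real_dir = os.path.join(base, "real")
        link = os.path.join(base, "link")
        os.mkdir(real_dir)
        os.symlink(real_dir, link)  # the link, not the real directory, is what we try to delete
        try:
            shutil.rmtree(link)
        except OSError as exc:
            return str(exc)
    return ""  # reached only if rmtree unexpectedly succeeded


if __name__ == "__main__":
    print(rmtree_symlink_error())
```

If the message printed here matches the one in the traceback, that would at least be consistent with the cleanup path being seen as a symlink in the failing environment.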

ThierryDeruyttere commented 4 months ago

@aaronsteers Sorry for my late reply!

This is on a Linux machine. It's trying to delete a temporary directory.

INFO:airbyte.SourceFacebookMarketing:SourceFacebookMarketing runtimes:
Syncing stream custom_ads_insights_image_asset 0:00:32.198297
INFO:airbyte.SourceFacebookMarketing:Finished syncing SourceFacebookMarketing
CRITICAL:airbyte:Cannot call rmtree on a symbolic link
Traceback (most recent call last):
  File "/root/source_facebook_marketing/executable", line 8, in <module>
    sys.exit(run())
  File "/root/source_facebook_marketing/run.py", line 32, in run
    launch(source, sys.argv[1:])
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/entrypoint.py", line 235, in launch
    for message in source_entrypoint.run(parsed_args):
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/entrypoint.py", line 99, in run
    with tempfile.TemporaryDirectory() as temp_dir:
  File "/usr/local/lib/python3.10/tempfile.py", line 869, in __exit__
    self.cleanup()
  File "/usr/local/lib/python3.10/tempfile.py", line 873, in cleanup
    self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
  File "/usr/local/lib/python3.10/tempfile.py", line 855, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "/usr/local/lib/python3.10/shutil.py", line 736, in rmtree
    onerror(os.path.islink, path, sys.exc_info())
  File "/usr/local/lib/python3.10/shutil.py", line 734, in rmtree
    raise OSError("Cannot call rmtree on a symbolic link")
OSError: Cannot call rmtree on a symbolic link

I don't think this is 100% related to Airbyte, because it comes from tempfile.py, but I can't provide my own tempfile through Airbyte, so I can't fix this problem myself...

aaronsteers commented 4 months ago

@ThierryDeruyttere - Thanks for providing the more complete stack trace. I see here this is stemming from inside the source-facebook-marketing connector, and specifically within a piece of code managed by the CDK.

As a next step, we can check if that connector is using the latest version of the CDK, and if so, we can try to reproduce the issue either in the CDK itself or with that connector running in another Linux environment.

If you see the same error with any other source connectors, please let us know. That would help us determine if it is more strongly correlated with the connector itself or with the runtime environment.

aaronsteers commented 4 months ago

Confirmed the Facebook Marketing source is using CDK version 0.81.6, published approx. April 15, and the call to tempfile.TemporaryDirectory() is unchanged in the latest CDK version.

So, still no obvious explanation comes to mind for why this would cause an issue... 🤔

ThierryDeruyttere commented 4 months ago

I think this might be because of the serverless compute provider that I'm using (Modal), but I'm not 100% sure. I just copied the CDK version, slapped a try/except around the temp dir context manager, and got on with my life. Even giving it custom paths didn't seem to help, so I'm not sure what's going on there. Locally everything works.
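
Roughly, the workaround looks like the sketch below. This is not the exact patch, just the idea: the name `tolerant_temp_dir` is made up here, and matching on the error message text is an assumption about how to tell this failure apart from other cleanup errors. Python 3.10 also added an `ignore_cleanup_errors` parameter to `tempfile.TemporaryDirectory`, which might be a simpler option, but that has not been tried against this failure.

```python
import contextlib
import os
import tempfile
from typing import Iterator


@contextlib.contextmanager
def tolerant_temp_dir() -> Iterator[str]:
    """Yield a temp directory path and tolerate the symlink error on cleanup.

    Hypothetical wrapper for illustration; not part of airbyte_cdk.
    """
    tmp = tempfile.TemporaryDirectory()
    try:
        yield tmp.name
    finally:
        try:
            tmp.cleanup()
        except OSError as exc:
            # Only swallow the specific failure from the traceback; re-raise anything else.
            if "symbolic link" not in str(exc):
                raise


if __name__ == "__main__":
    with tolerant_temp_dir() as temp_dir:
        print(os.path.isdir(temp_dir))
```

The downside is that the directory may be left behind when cleanup fails, so this trades a crash at the end of a long job for a possible leftover temp directory.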

aaronsteers commented 4 months ago

@ThierryDeruyttere - Thanks for the additional context...

I think your theory is correct regarding this being related to the Modal runtime. A PR would be welcome if you have a fix we can apply broadly.

Temp directory context managers are particularly difficult to implement in a cross-platform and cross-runtime manner. We've previously found implementation differences between Windows and Linux, but I don't think we've specifically experienced this symlink-related issue before. It seems likely that a 'create temp directory' request in Modal results in a symlinked directory, and that the default Python cleanup method then fails because it refuses to delete a symlink.
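
One way to test that theory would be to run a small diagnostic inside the affected runtime. The sketch below is only an assumption about what would be useful to check: the function name `describe_temp_dir` and the reported keys are made up for illustration. It creates a temp directory with `tempfile.mkdtemp()` and reports whether the path is a symlink and whether `os.lstat` and `os.stat` agree on it.

```python
import os
import shutil
import tempfile


def describe_temp_dir() -> dict:
    """Create a temp directory and report whether its path looks like a symlink.

    Hypothetical diagnostic for illustration only.
    """
    path = tempfile.mkdtemp()
    try:
        return {
            "path": path,
            "is_symlink": os.path.islink(path),
            "realpath": os.path.realpath(path),
            # False would mean the path resolves to a different file than the entry itself.
            "lstat_matches_stat": os.path.samestat(os.lstat(path), os.stat(path)),
        }
    finally:
        # Best-effort cleanup so the diagnostic itself cannot fail the same way.
        shutil.rmtree(path, ignore_errors=True)


if __name__ == "__main__":
    for key, value in describe_temp_dir().items():
        print(f"{key}: {value}")
```

If `is_symlink` is true or `lstat_matches_stat` is false in Modal but not locally, that would support the explanation above.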