ML4GW / aframev2

Detecting binary black hole mergers in LIGO with neural networks
MIT License

Handle transient S3 errors in tune #161

Closed EthanMarx closed 2 months ago

EthanMarx commented 2 months ago

Remote tuning was plagued by transient permission errors when fetching from or pushing to S3. After chatting with the Nautilus admins, we found that the internal URL used to access S3 from within the cluster has an issue. The external URL works fine, although it is slower, so we use it for the time being.

This PR also adds retries at the various points where we interact with S3, to guard against this kind of transient failure in the future.

wbenoit26 commented 2 months ago

Is the conda-lock accurate? Did you re-lock after removing pycbc?

EthanMarx commented 2 months ago

Yeah, I re-locked it. I can double check that pycbc isn't in there.

EthanMarx commented 2 months ago

I think the diff is probably just version updates to existing packages, but I can double check before merging.