choderalab / fah-xchem

Tools and infrastructure for automated compound discovery using Folding@home
MIT License

Asynchronously handle fragalysis upload? #125

Open · jchodera opened this issue 3 years ago

jchodera commented 3 years ago

Currently, uploading all the data to Fragalysis takes roughly an hour when protein snapshots are included. We should find a way to either avoid this cost or offload the upload to a separate thread or process.
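One way to offload the upload is to hand it to a background worker so the rest of the run is not blocked while it finishes. This is a minimal sketch using Python's standard concurrent.futures; analyze, upload, and report are hypothetical placeholders for pipeline stages, not fah-xchem functions. Swapping ThreadPoolExecutor for ProcessPoolExecutor would move the work into a separate process, at the cost of requiring picklable callables and arguments.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_with_background_upload(
    analyze: Callable[[], Any],
    upload: Callable[[Any], None],
    report: Callable[[Any], None],
) -> Any:
    """Hypothetical pipeline: run the analysis, start the slow upload on a
    background thread, and keep working (e.g. generating reports) meanwhile."""
    with ThreadPoolExecutor(max_workers=1) as background:
        results = analyze()
        # submit() returns immediately; the upload proceeds on the worker thread.
        upload_future = background.submit(upload, results)
        # This overlaps with the upload instead of waiting for it to finish.
        report(results)
        # Block only at the end, re-raising any exception from the upload.
        upload_future.result()
    return results
```

Since an S3 upload is dominated by network I/O, a thread is usually enough here; the GIL is released while waiting on the network.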

dotsdl commented 3 years ago

Have we tried setting max_concurrent_requests in ~/.aws/config for the uploading user? Adding the following under that user's profile section (here, the default profile):

[default]
s3 =
    max_concurrent_requests = 100

would raise the number of concurrent requests that aws s3 {cp,sync,mv,rm} can use from the default of 10 to 100.

dotsdl commented 3 years ago

Alternatively, if we'd prefer the upload to be part of the CLI, we can add an entry point such as upload_artifacts that uses boto3 internally and fans these calls out over a thread or process pool in Python. This may be the more integrated approach for an end-to-end workflow.
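A minimal sketch of what such an entry point could look like, assuming the artifacts are described as a mapping from local file path to S3 key. The name upload_artifacts comes from the comment above, but its signature, the files/bucket/max_workers parameters, and the injectable upload_fn are hypothetical, not part of fah-xchem. It uses boto3's S3.Client.upload_file(Filename, Bucket, Key) and shares one client across worker threads, since low-level boto3 clients are documented as thread-safe.

```python
from concurrent.futures import Future, ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Mapping, Optional

# Same positional order as boto3's S3.Client.upload_file(Filename, Bucket, Key)
UploadFn = Callable[[str, str, str], None]


def upload_artifacts(
    files: Mapping[str, str],
    bucket: str,
    upload_fn: Optional[UploadFn] = None,
    max_workers: int = 16,
) -> List[str]:
    """Hypothetical entry point: upload {local_path: s3_key} concurrently.

    Returns the sorted list of uploaded keys; re-raises the first upload
    error observed.
    """
    if upload_fn is None:
        # Imported lazily so the pooling logic can be tested without boto3/AWS.
        import boto3

        # Low-level boto3 clients are thread-safe, so all workers share one client.
        upload_fn = boto3.client("s3").upload_file

    uploaded: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures: Dict[Future, str] = {
            pool.submit(upload_fn, local_path, bucket, key): key
            for local_path, key in files.items()
        }
        for future in as_completed(futures):
            future.result()  # surfaces any exception raised in a worker thread
            uploaded.append(futures[future])
    return sorted(uploaded)
```

For example, upload_artifacts({"snapshots/x.pdb": "run1/x.pdb"}, "my-bucket") would upload one file using a 16-worker pool (bucket and paths are illustrative). Threads fit here because the work is network-bound; a process pool would add pickling overhead without much benefit.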

jchodera commented 3 years ago

Great idea! I'll add this to the local ~/.aws/config for now, but it would be great to integrate this upload into the codebase long-term.

dotsdl commented 3 years ago

Understood; we'll pursue making the upload part of the fah-xchem CLI. Thanks, John!