galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Importing files from user-defined AWS S3 bucket not working #18750

Open Slushy-seg opened 2 months ago

Slushy-seg commented 2 months ago

I have set up a Galaxy to allow users to add their own remote storage locations (private and public AWS S3 buckets). For that I used the file_source_templates_config_file configuration and added the template for private & public AWS S3 buckets (from https://docs.galaxyproject.org/en/latest/admin/data.html#file-source-templates).
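For reference, the wiring described above looks roughly like this. The file path and the included template filenames are illustrative assumptions; the actual template bodies should be copied from the documentation page linked above:

```yaml
# config/file_source_templates.yml -- referenced from galaxy.yml via the
# file_source_templates_config_file setting. The include paths below are
# illustrative; use the stock AWS S3 templates from the linked docs.
- include: ./lib/galaxy/files/templates/examples/production_aws_private_bucket.yml
- include: ./lib/galaxy/files/templates/examples/production_aws_public_bucket.yml
```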

On Galaxy, the user preference to set up a (private or public) AWS bucket is shown correctly and the setup works as intended. The user can then select files from the bucket through "Upload" => "Choose remote files". When selecting a file and triggering the import, the import fails.

Galaxy Version: 24.1 (same behavior on usegalaxy.eu)

To Reproduce: Steps to reproduce the behavior:

  1. Go to user preferences and define a public S3 bucket under "Manage Your Remote File Sources" (screenshot attached).
  2. Go to "Upload" and select "Choose remote files".
  3. Select the previously defined remote storage location; the files on the bucket are shown (screenshot attached).
  4. Select a file from the remote S3 bucket and import it into the current history.
  5. The import fails (screenshot attached).
  6. A further log message is shown (screenshot attached).

Expected behavior: Files imported from a user-defined private S3 bucket are stored in the user's history.


sanjaysrikakulam commented 2 months ago

I briefly looked at the upload of data from a public S3 bucket, and it works fine.

I added the same bucket, 1000genomes, to user preferences through Manage Your Remote File Sources.

(screenshots of the configured file source and the bucket listing attached)

The bucket name should be plain, without the s3:// prefix. (I am not entirely sure whether this matters, since you are able to browse the data; maybe it only matters when Galaxy tries to fetch the file. I did not look into the implementation details.)
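As a toy illustration of the input difference, a helper like the following would make both spellings equivalent (this function is illustrative only, not part of Galaxy):

```python
def normalize_bucket_name(raw: str) -> str:
    """Strip an s3:// scheme prefix and trailing slashes so that
    's3://1000genomes/' and '1000genomes' refer to the same bucket.
    Illustrative helper, not Galaxy code."""
    name = raw.strip()
    if name.startswith("s3://"):
        name = name[len("s3://"):]
    return name.rstrip("/")

print(normalize_bucket_name("s3://1000genomes/"))  # -> 1000genomes
```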

Can you give this a try?

--EDIT-- Please ensure that these are set in your galaxy.yml:

  1. object_store_cache_path (a path Galaxy uses for caching; optional, defaults to a directory under the mutable data dir, I think)
  2. object_store_cache_size (in GB; mandatory, the default is -1)
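In galaxy.yml that would look roughly like this (the path and size below are example values, not recommendations):

```yaml
# galaxy.yml -- object store cache settings (example values)
galaxy:
  object_store_cache_path: /srv/galaxy/var/object_store_cache
  object_store_cache_size: 10  # in GB
```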

Tracebacks from the handler logs would help debug your problems.

Slushy-seg commented 2 months ago

Thanks a lot. Just specifying the bucket without s3:// does indeed work! I think I was confused because Galaxy validates the bucket name against the AWS naming syntax (an error I hit in earlier tries).

This is quite a nice feature for my use case. In my current setup, changes made to the S3 bucket (i.e. file deletions or creations) are not reflected in the Galaxy UI. Is there a way to force Galaxy to re-read the file listing each time a user browses the bucket?

bgruening commented 2 months ago

@Slushy-seg the files should be visible as soon as they are created. Can you look with a different viewer into your S3 and see if the files are really there?

mvdbeek commented 2 months ago

You probably want to set a listings_expiry_time if you're using s3fs; that setting is passed on to the underlying library, whose listing cache by default never expires (https://github.com/fsspec/s3fs/issues/851). A value of 60 seems to work well.
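To make the effect concrete, here is a stdlib-only toy of what a listing-cache expiry does. It mimics the behaviour that listings_expiry_time controls in s3fs; it is not Galaxy's or s3fs's actual code:

```python
import time

class ListingCache:
    """Toy time-based directory-listing cache. Entries older than
    `expiry_seconds` are treated as stale and must be re-fetched;
    without an expiry, a cached listing would be served forever
    (the s3fs default behaviour discussed above)."""

    def __init__(self, expiry_seconds: float):
        self.expiry = expiry_seconds
        self._cache = {}  # path -> (timestamp, listing)

    def put(self, path, listing):
        self._cache[path] = (time.monotonic(), listing)

    def get(self, path):
        entry = self._cache.get(path)
        if entry is None:
            return None
        ts, listing = entry
        if time.monotonic() - ts > self.expiry:
            del self._cache[path]  # stale: caller must list the bucket again
            return None
        return listing

cache = ListingCache(expiry_seconds=60)
cache.put("my-bucket/", ["a.fastq", "b.fastq"])
print(cache.get("my-bucket/"))  # fresh entry: served from the cache
```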

Slushy-seg commented 2 months ago

listings_expiry_time is exactly what I need. Where can I set this parameter?

Slushy-seg commented 1 month ago

Any idea where to set this parameter, @mvdbeek?

sanjaysrikakulam commented 1 month ago

This would require a change in the s3fs file-source plugin, likely in this class: the setting could be exposed as a property and passed to the open_fs function, or set as a default value for everyone globally.
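A rough sketch of the kind of change meant here: expose the setting on the plugin's configuration and forward it to the filesystem constructor. All names below (S3FsPluginConfig, extra_fs_kwargs) are hypothetical, not Galaxy's actual classes:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class S3FsPluginConfig:
    """Hypothetical config object for an s3fs file-source plugin;
    not Galaxy's real class."""
    bucket: str
    anon: bool = True
    listings_expiry_time: Optional[float] = None  # seconds

    def extra_fs_kwargs(self) -> dict:
        # Forward the expiry only when it is configured, so the
        # library default (listings never expire) applies otherwise.
        kwargs = {}
        if self.listings_expiry_time is not None:
            kwargs["listings_expiry_time"] = self.listings_expiry_time
        return kwargs

cfg = S3FsPluginConfig(bucket="1000genomes", listings_expiry_time=60)
print(cfg.extra_fs_kwargs())  # -> {'listings_expiry_time': 60}
```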