galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Output dataset from FTP import could not be read, import fails #14032

Open ajs6f opened 2 years ago

ajs6f commented 2 years ago

Describe the bug

Output dataset from FTP import could not be read; the import fails.

Galaxy Version and/or server at which you observed the bug

{
    "version_major": "21.09",
    "version_minor": "1.dev0"
}

Commit: (run `git rev-parse HEAD` if you run this Galaxy server)

f40274f6b9f6a15eb4022aab21286d4c96cd8475
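Incidentally, the version JSON above is the payload returned by Galaxy's public `/api/version` endpoint. A small sketch of fetching it, assuming a reachable server; the helper name is an illustration, not Galaxy code:

```python
# Fetch the version JSON that Galaxy serves at /api/version.
# (Hypothetical helper; only the endpoint path comes from Galaxy's API.)
import json
from urllib.request import urlopen

def get_galaxy_version(base_url: str) -> dict:
    """Return the server's version info, e.g. {"version_major": "21.09", ...}."""
    with urlopen(f"{base_url}/api/version") as resp:
        return json.load(resp)
```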

To Reproduce

Attempt to use the FTP import function on any file.

Expected behavior

A successful import.

Additional context

FTP functionality has never worked successfully for us, and this is only one of the problems we've seen (see https://github.com/galaxyproject/galaxy/issues/10721).

The specific error message is:

Job 174's output dataset(s) could not be read

Of course the job number varies. There is no Tool Standard Output or Tool Standard Error displayed. In the log I find:

galaxy.tool_util.provided_metadata DEBUG 2022-06-07 19:32:10,263 [pN:main.web.1,p:95810,w:1,m:0,tN:UnivaJobRunner.work_thread-3] unnamed outputs [{'__unnamed_outputs': [{'destination': {'type': 'hdas'}, 'name': '', 'elements': [{'error_message': 'Failed to fetch url gxftp://xx139569.fasta. <urlopen error unknown url type: gxftp>', 'object_id': 255}]}]}]

which seems to indicate something related to https://github.com/galaxyproject/galaxy/issues/10721. Perhaps it is the same problem, masked differently?
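For what it's worth, `<urlopen error unknown url type: gxftp>` is the stock error Python's urllib raises when it is handed a URL whose scheme has no registered handler, which suggests the `gxftp://` URI reached a plain urllib opener without being resolved to a real path first. A minimal reproduction (the filename is illustrative):

```python
# Minimal reproduction of the error in the log: urllib has no handler
# registered for the "gxftp" scheme, so urlopen() fails before any
# network I/O happens. The filename below is illustrative.
from urllib.error import URLError
from urllib.request import urlopen

try:
    urlopen("gxftp://example.fasta")
except URLError as err:
    # err.reason is the string "unknown url type: gxftp"
    print(err.reason)
```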

ajs6f commented 2 years ago

The relevant section from our galaxy.yml.

  # Enable Galaxy's "Upload via FTP" interface.  You'll need to install
  # and configure an FTP server (we've used ProFTPd since it can use
  # Galaxy's database for authentication) and set the following two
  # options. This will be provided to users in the help text as 'log in
  # to the FTP server at '. Thus, it should be the hostname of your FTP
  # server.
  ftp_upload_site: hydra-5.si.edu

  # This should point to a directory containing subdirectories matching
  # users' identifier (defaults to e-mail), where Galaxy will look for
  # files.
  ftp_upload_dir: /pool/spare/galaxy/galaxy/database/ftp

  # User attribute to use as subdirectory in calculating default
  # ftp_upload_dir pattern. By default this will be email so a user's
  # FTP upload directory will be ${ftp_upload_dir}/${user.email}. Can
  # set this to other attributes such as id or username though.
  ftp_upload_dir_identifier: username

  # Python string template used to determine an FTP upload directory for
  # a particular user.
  # Defaults to '${ftp_upload_dir}/${ftp_upload_dir_identifier}'.
  #ftp_upload_dir_template: null

  # Set to false to prevent Galaxy from deleting uploaded FTP files as
  # it imports them.
  ftp_upload_purge: false

mvdbeek commented 2 years ago

You'll need to tell us a bit more about your setup. Are you using separate job handlers? Have you configured file_sources or file_sources_config_file?

ajs6f commented 2 years ago

We have not added any config at all to the handlers section of job_conf.xml; it is empty. Our setup sends all work to a pre-existing Univa-managed cluster, so the Galaxy server itself should be doing nothing other than answering requests from users. To my (limited) understanding, this should obviate much of the need for complex job handling on the Galaxy side, although I welcome correction. As for the parameters you mention, we have not configured them, and they are not explicitly set in our galaxy.yml. I have been working from the instructions here, and they are not mentioned. We also do not have a file_sources_conf.yml file in play, if that is important.

ajs6f commented 2 years ago

@mvdbeek Is there any other information that might be helpful here? Would you recommend that I set file_sources or create a file_sources_conf.yml file? Do I need to somehow configure the connection between the gxftp "protocol" and the FTP-related code in Galaxy? Thank you for any ideas or advice.
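For reference, Galaxy ships a file_sources_conf.yml.sample in its config directory, and the entry that backs the `gxftp://` scheme looks roughly like the sketch below. The `id`, `label`, and `doc` values are assumptions based on that sample, not verified against this server's version:

```yaml
# Sketch of a user-FTP file source entry (values are assumptions taken
# from file_sources_conf.yml.sample, not from this deployment):
- type: gxftp
  id: _ftp
  label: FTP Directory
  doc: Galaxy User's FTP Directory
```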

mvdbeek commented 2 years ago

It does look a bit like something isn't configured right: for example, a client that is newer than the server code, or a tool script that is older than the server code. gxftp is a valid URI.

ajs6f commented 2 years ago

All of the code in play is from a single GitHub checkout, so I don't think we did anything obvious locally that would create that situation. Is there some way I can check or control the client or tool script versions? Thanks!