cybergis / cybergis-compute-core

Globus Transfers Throwing Errors #107

Open alexandermichels opened 5 months ago

alexandermichels commented 5 months ago

It appears that the GlobusTaskListManager is not always properly cleared, which results in users being unable to download two subdirectories (or the result folder plus a subdirectory such as slurm_logs) from a single Job.
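A minimal sketch of the failing sequence, assuming a job handle has already been obtained from the cybergis_compute_client SDK (the endpoint UUID and paths below are placeholders, not values from the actual report):

# Hypothetical reproduction: two back-to-back Globus downloads from the same job.
# 'job' is a cybergis_compute_client Job object; endpoint UUID and paths are placeholders.

# First transfer registers a globus_task_* entry for the job's result folder
# and completes normally.
job.download_result_folder_by_globus(
    remotePath='slurm_logs',
    localEndpoint='<local-globus-endpoint-uuid>',
    localPath='globus_download/slurm_logs')

# Second transfer on the same folder is rejected with
# "a globus job is currently running on folder with id ..."
# because the first entry was never cleared from the GlobusTaskListManager.
job.download_result_folder_by_globus(
    remotePath='<another-subdirectory>',
    localEndpoint='<local-globus-endpoint-uuid>',
    localPath='globus_download/results')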

Example error:

Exception                                 Traceback (most recent call last)

File /cvmfs/cybergis.illinois.edu/software/conda/cybergisx/python3-0.9.4/lib/python3.8/site-packages/cybergis_compute_client/UI.py:671, in UI.onDownloadButtonClick.<locals>.on_click(change)
    669     filename = self.globus_filename
    670 localPath = os.path.join(self.jupyter_globus['root_path'], filename)
--> 671 self.compute.job.download_result_folder_by_globus(remotePath=self.download['dropdown'].value, localEndpoint=localEndpoint, localPath=localPath)
    672 print('please check your data at your root folder under "' + filename + '"')
    673 self.compute.recentDownloadPath = os.path.join(self.jupyter_globus['container_home_path'], filename)

File /cvmfs/cybergis.illinois.edu/software/conda/cybergisx/python3-0.9.4/lib/python3.8/site-packages/cybergis_compute_client/Job.py:294, in Job.download_result_folder_by_globus(self, localPath, localEndpoint, remotePath, raw)
    291 folderId = jobStatus['remoteResultFolder']['id']
    293 # init globus transfer
--> 294 self.client.request('POST', '/folder/' + folderId + '/download/globus-init', {
    295     "jobId": self.id,
    296     "jupyterhubApiToken": self.jupyterhubApiToken,
    297     "fromPath": remotePath,
    298     "toPath": localPath,
    299     "toEndpoint": localEndpoint
    300 })
    302 status = None
    303 while status not in ['SUCCEEDED', 'FAILED']:

File /cvmfs/cybergis.illinois.edu/software/conda/cybergisx/python3-0.9.4/lib/python3.8/site-packages/cybergis_compute_client/Client.py:67, in Client.request(self, method, uri, body)
     65     if 'messages' in data:
     66         msg = str(data['messages'])
---> 67     raise Exception('server ' + self.url + uri + ' responded with error "' + data['error'] + msg + '"')
     69 return data

Exception: server cgjobsup.cigi.illinois.edu:443/folder/1706310009J1WZs/download/globus-init responded with error "a globus job is currently running on folder with id 1706310009J1WZs"

A few things here:

alexandermichels commented 5 months ago

A workaround for now: if we delete the keys with the pattern globus_task_* from redis, it seems to allow us to re-download the folder. This can be done within the redis container using the following line:

redis-cli KEYS "globus_task_*" | xargs redis-cli DEL

This is obviously only a workaround, though.
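If redis-cli is not convenient, the same cleanup can be done from Python with the redis-py client (a sketch; the host, port, and db here are assumptions and should be adjusted to match the deployment's redis container):

import redis

# Connect to the Redis instance used by cybergis-compute-core
# (host/port/db are assumptions; adjust to your deployment).
r = redis.Redis(host='localhost', port=6379, db=0)

# Collect and delete every leftover globus_task_* key so that
# new Globus transfers can be initialized on the affected folders.
stale_keys = list(r.scan_iter(match='globus_task_*'))
if stale_keys:
    r.delete(*stale_keys)
print('removed %d stale globus_task_* keys' % len(stale_keys))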