setup_data_libraries.py never completes

Slugger70 commented 6 years ago

I have an issue with setup_data_libraries.py. If any job on the target Galaxy server is not in the ok state, the script never finishes: it is stuck in a loop waiting for ALL jobs on the Galaxy server to reach that state.

        no_break = True
        while True:
            no_break = False
            for job in jc.get_jobs():
                if job['state'] != 'ok':
                    no_break = True
            if not no_break:
                break
            time.sleep(3)

        time.sleep(20)
        log.info("Finished importing test data.")

If the target Galaxy server has been around for some time, there will more than likely be some jobs in a new or error state, let alone other jobs still running that have nothing to do with library creation.

I think it would be much better to capture the job ID of each upload in a list and just wait for those jobs to complete.
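
A minimal sketch of that idea, assuming the upload job IDs have already been collected somewhere (how to collect them is not shown here). `jc` is expected to behave like bioblend's `JobsClient`, which offers `get_state(job_id)`. The function name `wait_for_jobs`, the `interval` and `timeout` parameters, and the set of states treated as final are assumptions made for illustration, not part of the script.

```python
import time

# Assumption: states after which a Galaxy job will not change any more.
TERMINAL_STATES = {"ok", "error", "deleted"}


def wait_for_jobs(jc, job_ids, interval=3, timeout=600):
    """Wait only for the given jobs, not for every job on the server.

    jc must offer get_state(job_id), as bioblend's JobsClient does.
    Returns a dict mapping each job ID to its final state.
    Raises TimeoutError if some jobs are still unfinished after `timeout` seconds.
    """
    pending = set(job_ids)
    final = {}
    deadline = time.monotonic() + timeout
    while pending:
        for job_id in sorted(pending):
            state = jc.get_state(job_id)
            if state in TERMINAL_STATES:
                final[job_id] = state
        # Drop every job that has reached a terminal state.
        pending.difference_update(final)
        if pending:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"Jobs still not finished: {sorted(pending)}")
            time.sleep(interval)
    return final
```

Because only the listed jobs are polled, old jobs in a new or error state elsewhere on the server no longer matter, and a failed upload ends the wait instead of blocking it forever.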

rhpvorderman commented 6 years ago

I have taken a quick look. This problem is solvable, but it requires setup_data_libraries.py to make full use of the BioBlend API. The libraries module has a wait_for_dataset method that seems to be sufficient for this use case, and it includes a timeout as well. I will try a quick fix, but if that does not work I am afraid the whole script needs to be refactored, which may take some time.

rhpvorderman commented 6 years ago

It was as I feared. The entire script should be refactored to properly make use of the BioBlend API. That would allow it to keep track of the library and dataset IDs throughout the script, so that wait_for_dataset can be called on them. I propose following a more object-oriented approach, as is done for get-tool-list. Right now it is a sequence of functions that is hard to keep track of.

Does anyone with spare time on their hands want to take this on?
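
Roughly, keeping track of the IDs could look like the sketch below. It assumes `lib_client` behaves like bioblend's `LibraryClient`, using its `upload_file_from_url` and `wait_for_dataset` methods. The helper name `upload_and_wait` and its parameters are made up for illustration, and URL upload is just one of the upload routes the script might use.

```python
def upload_and_wait(lib_client, library_id, urls, folder_id=None, maxwait=600):
    """Upload each URL into the library, then wait for exactly those datasets.

    lib_client must offer upload_file_from_url() and wait_for_dataset(),
    as bioblend's LibraryClient does. Returns the finished dataset dicts.
    """
    dataset_ids = []
    for url in urls:
        # The upload call returns a list of dicts describing the new datasets.
        uploaded = lib_client.upload_file_from_url(library_id, url, folder_id=folder_id)
        dataset_ids.extend(item["id"] for item in uploaded)

    # Wait per dataset, with a timeout, instead of polling all jobs on the server.
    return [
        lib_client.wait_for_dataset(library_id, dataset_id, maxwait=maxwait)
        for dataset_id in dataset_ids
    ]
```

With a real server, `lib_client` would be something like `GalaxyInstance(url, key).libraries`. The wait is then bounded by `maxwait` for each dataset rather than depending on unrelated jobs.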

Slugger70 commented 6 years ago

I'm currently working on it a bit, to make sure it doesn't re-create libraries or re-upload datasets that already exist.
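
For the library part, one possible shape is a get-or-create helper like the sketch below. It assumes `lib_client` behaves like bioblend's `LibraryClient`, using `get_libraries(name=...)` and `create_library(...)`. The name `get_or_create_library` is hypothetical, and matching on the library name alone is an assumption. This is not the actual change being worked on.

```python
def get_or_create_library(lib_client, name, description=None):
    """Return (library, created) without creating a duplicate library.

    lib_client must offer get_libraries(name=...) and create_library(...),
    as bioblend's LibraryClient does.
    """
    existing = lib_client.get_libraries(name=name)
    if existing:
        # Reuse the first library that already has this name.
        return existing[0], False
    return lib_client.create_library(name, description=description), True
```

The same check could be repeated for datasets by comparing names against the library's current contents before uploading.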

rhpvorderman commented 6 years ago

:+1: feel free to make a pull request.

galaxyproject / ephemeris

setup_data_libraries.py never completes #102