Closed rhpvorderman closed 6 years ago
This is strange. First test job finishes fine. Other jobs crash?
If the test is using a setup with separate handlers, the data table reload won't work (and so you won't have a valid dbkey). You'll have to activate watch_tool_data_dir in galaxy.ini: https://github.com/galaxyproject/galaxy/blob/dev/config/galaxy.ini.sample#L326
At least that's my guess and that's also the reason I implemented this.
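For reference, the option mentioned above lives in galaxy.ini; a minimal fragment might look like the sketch below (option name taken from the linked sample config, surrounding section and comment wording assumed):

```ini
[app:main]
# Watch the tool data directory and automatically reload tool data tables
# when their loc files change. Requires the watchdog Python package.
watch_tool_data_dir = True
```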
Thanks @mvdbeek . Let's see if it works.
Hmm, does docker-galaxy-stable include watchdog? That may be required for this functionality.
It does, because ansible-galaxy-extras installs it by default.
The strange thing is that the run-data-managers command can reload the data tables itself:
DEBUG:urllib3.connectionpool:http://localhost:8080 "GET /api/tool_data/fasta_indexes/reload?key=admin HTTP/1.1" 200 None
However, this does not have a 100% success rate in the docker container. Some test jobs fail while others succeed; it seems to be random.
When I used a Galaxy VM in the cloud I had no problems whatsoever. Maybe it is caused by a combination of Travis and the docker container running on limited resources?
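The request in the log line above can be reproduced against any Galaxy instance. A minimal sketch (the endpoint path is copied from the log; the helper name is hypothetical — ephemeris itself goes through bioblend rather than building URLs by hand):

```python
def reload_url(galaxy_url, table_name, api_key):
    """Build the data table reload request seen in the log above:
    GET /api/tool_data/<table>/reload?key=<key>
    """
    return "{}/api/tool_data/{}/reload?key={}".format(galaxy_url, table_name, api_key)

print(reload_url("http://localhost:8080", "fasta_indexes", "admin"))
# → http://localhost:8080/api/tool_data/fasta_indexes/reload?key=admin
```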
Last time I checked, this only works for the current web request, which in turn can't communicate the change to the job handlers or other web processes. So it isn't strange, just unfortunate that Galaxy's IPC doesn't work out of the box.
Is there a way to fix this with settings so the test can be run?
I'm checking this out now.
So indeed watchdog is properly installed and working, but when we install a new data manager we add a new data table that is not being watched by watchdog. I think restarting Galaxy after the data manager installation would fix this test (watchdog then triggers automatic table reloads, so there's no need to do this in ephemeris anymore, which is shaky at best), while I think of a way to fix this in Galaxy.
I think https://github.com/rhpvorderman/ephemeris/pull/1 should work, but I haven't tested it.
@mvdbeek Looks good! Thanks! Like I said, ephemeris seems to work perfectly fine outside of Travis. Due to limited resources the data table reloads may be too slow. Another way to fix it would be to introduce custom wait times for the reloading, configurable via a command line flag. But that is a bit of a hackish solution for only one use case.
Darn! Still some failures, but these seem not to be related at all to the run-data-managers program. It is toxenv.sh failing for Python 3.5, @mvdbeek. It seems that your fix has resolved the issue with the run-data-managers test. Thanks a lot!
Let me take a bit of the guesswork out here: the data table reload takes milliseconds, that is not a problem. The issue (AFAICT) is threefold: we use watchdog to monitor the individual tables in each process, which means watchdog must be installed and watch_tool_data_dir must be set. This works fine (I've been using this in production ever since I added this option), but it only listens to tables that are known during server startup. So if you install a new data manager but don't restart Galaxy, watchdog will not listen to the new table. For both of the latter options it appears as if hitting the data table endpoint works if you just do it often enough, because you stochastically hit the different processes/threads, but the handlers will never be updated, so when you run a job that relies on a dbkey and a fasta item, the job will simply fail, which is what we've been observing before https://github.com/galaxyproject/ephemeris/pull/53/commits/ed6a30883d89cde7c06cad7a763d9d1dbc29f111.
To not give people reading the code the wrong idea, I'd be strongly in favor of removing tool_data_client.reload_data_table(str(data_table))
we use watchdog to monitor the individual tables in each process, which means watchdog must be installed and watch_tool_data_dir must be set. This works fine (I've been using this in production ever since I added this option), but it only listens to tables that are known during server startup. So if you install a new data manager, but you don't restart galaxy watchdog will not listen to the new table.
That makes sense. That will solve a weird issue I had in my ansible provisioning script. Thanks. Will remove it from run-data-managers and shall add this bit of information to the tests.
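The failure mode described above can be sketched with a toy model (an assumed simplification for illustration, not Galaxy internals): each process keeps its own in-memory copy of the tables, and a reload only refreshes the process that happened to serve the request.

```python
import copy

# Toy model, not Galaxy code: the loc files on disk are shared, but each
# process caches its own parsed copy, loaded at startup.
on_disk = {"fasta_indexes": []}
web_process = copy.deepcopy(on_disk)   # serves the reload API request
job_handler = copy.deepcopy(on_disk)   # actually runs the jobs

# A data manager adds a new genome and the API reload endpoint is hit:
on_disk["fasta_indexes"].append("hg19")
web_process = copy.deepcopy(on_disk)   # only this process reloads

print("hg19" in web_process["fasta_indexes"])   # True
print("hg19" in job_handler["fasta_indexes"])   # False: a job needing hg19 fails
```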
The failing tests seem to be related to python 3.5 not being in Travis' environment. Can you add `- "3.5"` after line 6 in the .travis.yml file?
Can you try this and remove the separate python and env sections?

```yaml
matrix:
  include:
    - python: 2.7
      env: TOX_ENV=py27-lint
    - python: 2.7
      env: TOX_ENV=py27-lint-readme
    - python: 2.7
      env: TOX_ENV=py27
    - python: 3.5
      env: TOX_ENV=py35-lint
    - python: 3.5
      env: TOX_ENV=py35
env:
  global:
    # do not load /etc/boto.cfg with Python 3 incompatible plugin
    # https://github.com/travis-ci/travis-ci/issues/5246#issuecomment-166460882
    - BOTO_CONFIG=/doesnotexist
```
Now it is shed-install choking on the python3 tests...
Cool, at least we're getting somewhere. This looks like a bioblend issue.
This piece in shed_install.py is causing the error:

```python
tsc = tool_shed_client(gi)
installed_revisions_list = []
itl = tsc.get_repositories()
```
If gi is defined using bioblend 0.9.0, and so is tsc, then the problem is in 0.9.0.
Try another version of bioblend maybe?
This is the latest version, I can fix this in bioblend.
Alright, the fix is here: https://github.com/galaxyproject/bioblend/pull/251. In the meantime, is there a chance we can use an API key instead of username/password? That should work.
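To illustrate the suggestion: bioblend's GalaxyInstance accepts either an API key or an email/password pair, so the switch could look like the sketch below (the helper name and call site are hypothetical, not ephemeris code):

```python
# Hypothetical helper: build the keyword arguments for
# bioblend.galaxy.GalaxyInstance, preferring an API key when one is given
# and falling back to email/password otherwise.
def galaxy_auth(url, key=None, email=None, password=None):
    if key is not None:
        return {"url": url, "key": key}
    return {"url": url, "email": email, "password": password}

print(galaxy_auth("http://localhost:8080", key="admin"))
# → {'url': 'http://localhost:8080', 'key': 'admin'}
```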
Green! But running the full test for 5 environments is a little wasteful of our Travis capacity; I'll rework this to not run the test / download the docker image in the lint environments.
@rhpvorderman have a look over the last changes, with your 👍 I'm happy to merge this (even though restarting galaxy is a little brittle it seems). Ultimately https://github.com/galaxyproject/galaxy/pull/4617 should do away with the need for restarting.
Wow! Great! Thanks for all the contributions. The pull request on Galaxy main also looks really nice. Let's merge this! Maybe I will get to #54, and as soon as that is fixed I think it warrants a new PyPI release.
Oh wait. There are some things...
I have a suggestion. This is an entirely different issue. It has nothing to do with run-data-managers and its functionality. Let's move the docker port exposure issue to another pull request and not try to fix it here.
Why not fix it here? It is only needed for this test and it works now. The security implications are not that severe, given that docker doesn't run as privileged user.
I incorrectly assumed that this would take a long time to fix. My apologies for my impatience. But yes, since everything works now, why not fix it here? Shall we merge this request?
Alright, this is some very nice functionality, thanks a lot.
And thank you for all the contributions to the testing of it. I would not have been able to do this myself.
Combination of #51 and #52 with tests.