Closed rhpvorderman closed 6 years ago
This is strange. First test job finishes fine. Other jobs crash?
If the test is using a setup with separate handlers, the data table reload won't work (and so you won't have a valid dbkey). You'll have to activate watch_tool_data_dir in galaxy.ini: https://github.com/galaxyproject/galaxy/blob/dev/config/galaxy.ini.sample#L326
At least that's my guess and that's also the reason I implemented this.
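For reference, the option mentioned above lives in galaxy.ini; a minimal fragment might look like the sketch below (option name taken from the linked sample config, surrounding section and comment wording assumed):

```ini
[app:main]
# Watch the tool data directory and automatically reload tool data tables
# when their loc files change. Requires the watchdog Python package.
watch_tool_data_dir = True
```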
Thanks @mvdbeek . Let's see if it works.
Hmm, does docker-galaxy-stable include watchdog? That may be required for this functionality.
It does, because ansible-galaxy-extras installs it by default.
The strange thing is that the run-data-managers command can reload the data tables itself:
DEBUG:urllib3.connectionpool:http://localhost:8080 "GET /api/tool_data/fasta_indexes/reload?key=admin HTTP/1.1" 200 None
However, this does not have a 100% success rate in the docker container. Some test jobs fail while others succeed; it seems to be random.
When I used a Galaxy VM in the cloud I had no problems whatsoever. Maybe it is caused by a combination of Travis and the docker container running on limited resources?
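The request in the log line above can be reproduced against any Galaxy instance. A minimal sketch (the endpoint path is copied from the log; the helper name is hypothetical — ephemeris itself goes through bioblend rather than building URLs by hand):

```python
def reload_url(galaxy_url, table_name, api_key):
    """Build the data table reload request seen in the log above:
    GET /api/tool_data/<table>/reload?key=<key>
    """
    return "{}/api/tool_data/{}/reload?key={}".format(galaxy_url, table_name, api_key)

print(reload_url("http://localhost:8080", "fasta_indexes", "admin"))
# → http://localhost:8080/api/tool_data/fasta_indexes/reload?key=admin
```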
Last time I checked, this only works for the current web request, which in turn can't communicate the change to the job handlers or other web processes. So it isn't strange, just unfortunate that Galaxy's IPC doesn't work out of the box.
Is there a way to fix this with settings so the test can be run?
I'm checking this out now.
So indeed watchdog is properly installed and working, but when we install a new data manager we add a new data table that is not being watched by watchdog. I think restarting Galaxy after the data manager installation would fix this test (watchdog then triggers automatic table reloads, so there's no need to do this in ephemeris anymore, which is shaky at best), while I think of a way to fix this in Galaxy.
I think https://github.com/rhpvorderman/ephemeris/pull/1 should work, but I haven't tested it.
@mvdbeek Looks good! Thanks! Like I said, ephemeris seems to work perfectly fine outside of Travis. Due to limited resources the data table reloads may be too slow. Another way to fix it would be to introduce custom wait times for the reloading, configurable via a command line flag. But that is a bit of a hackish solution for only one use case.
Darn! Still some failures, but these seem not to be related at all to the run-data-managers program. It is toxenv.sh failing for Python 3.5, @mvdbeek. It seems that your fix has resolved the issue with the run-data-managers test. Thanks a lot!
Let me take a bit of the guesswork out here: the data table reload takes milliseconds, that is not a problem. The issue (AFAICT) is threefold: we use watchdog to monitor the individual tables in each process, which means watchdog must be installed and watch_tool_data_dir must be set. This works fine (I've been using this in production ever since I added this option), but it only listens to tables that are known during server startup. So if you install a new data manager but don't restart Galaxy, watchdog will not listen to the new table. For both of the latter options it appears as if hitting the data table endpoint works if you just do it often enough, because you stochastically hit the different processes/threads, but the handlers will never be updated, so when you run a job that relies on a dbkey and a fasta item, the job will simply fail, which is what we've been observing before https://github.com/galaxyproject/ephemeris/pull/53/commits/ed6a30883d89cde7c06cad7a763d9d1dbc29f111.
To not give people reading the code the wrong idea, I'd be strongly in favor of removing tool_data_client.reload_data_table(str(data_table))
we use watchdog to monitor the individual tables in each process, which means watchdog must be installed and watch_tool_data_dir must be set. This works fine (I've been using this in production ever since I added this option), but it only listens to tables that are known during server startup. So if you install a new data manager, but you don't restart galaxy watchdog will not listen to the new table.
That makes sense. That will solve a weird issue I had in my ansible provisioning script. Thanks. Will remove it from run-data-managers and shall add this bit of information to the tests.
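The failure mode described above can be sketched with a toy model (an assumed simplification for illustration, not Galaxy internals): each process keeps its own in-memory copy of the tables, and a reload only refreshes the process that happened to serve the request.

```python
import copy

# Toy model, not Galaxy code: the loc files on disk are shared, but each
# process caches its own parsed copy, loaded at startup.
on_disk = {"fasta_indexes": []}
web_process = copy.deepcopy(on_disk)   # serves the reload API request
job_handler = copy.deepcopy(on_disk)   # actually runs the jobs

# A data manager adds a new genome and the API reload endpoint is hit:
on_disk["fasta_indexes"].append("hg19")
web_process = copy.deepcopy(on_disk)   # only this process reloads

print("hg19" in web_process["fasta_indexes"])   # True
print("hg19" in job_handler["fasta_indexes"])   # False: a job needing hg19 fails
```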
The failing tests seem to be related to python 3.5 not being in Travis' environment. Can you add `- "3.5"` after line 6 in the .travis.yml file?
Can you try this and remove the separate python and env sections?

```yaml
matrix:
  include:
    - python: 2.7
      env: TOX_ENV=py27-lint
    - python: 2.7
      env: TOX_ENV=py27-lint-readme
    - python: 2.7
      env: TOX_ENV=py27
    - python: 3.5
      env: TOX_ENV=py35-lint
    - python: 3.5
      env: TOX_ENV=py35
env:
  global:
    # do not load /etc/boto.cfg with Python 3 incompatible plugin
    # https://github.com/travis-ci/travis-ci/issues/5246#issuecomment-166460882
    - BOTO_CONFIG=/doesnotexist
```
Now it is shed-install choking on the python3 tests...
Cool, at least we're getting somewhere. This looks like a bioblend issue.
This piece in shed_install.py is causing the error:

```python
tsc = tool_shed_client(gi)
installed_revisions_list = []
itl = tsc.get_repositories()
```
If gi is defined using bioblend 0.9.0, and so is tsc, then the problem is in 0.9.0.
Try another version of bioblend maybe?
This is the latest version, I can fix this in bioblend.
Alright, the fix is here: https://github.com/galaxyproject/bioblend/pull/251. In the meantime, is there a chance we can use an API key instead of username/password? That should work.
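To illustrate the suggestion: bioblend's GalaxyInstance accepts either an API key or an email/password pair, so the switch could look like the sketch below (the helper name and call site are hypothetical, not ephemeris code):

```python
# Hypothetical helper: build the keyword arguments for
# bioblend.galaxy.GalaxyInstance, preferring an API key when one is given
# and falling back to email/password otherwise.
def galaxy_auth(url, key=None, email=None, password=None):
    if key is not None:
        return {"url": url, "key": key}
    return {"url": url, "email": email, "password": password}

print(galaxy_auth("http://localhost:8080", key="admin"))
# → {'url': 'http://localhost:8080', 'key': 'admin'}
```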
Green! But running the full test for 5 environments is a little wasteful of our Travis capacity; I'll rework this to not run the test / download the docker image in the lint environments.
@rhpvorderman have a look over the last changes, with your 👍 I'm happy to merge this (even though restarting galaxy is a little brittle it seems). Ultimately https://github.com/galaxyproject/galaxy/pull/4617 should do away with the need for restarting.
Wow! Great! Thanks for all the contributions. The pull request on Galaxy main also looks really nice. Let's merge this! Maybe I will get to #54, and as soon as that is fixed I think it warrants a new PyPI release.
Oh wait. There are some things...
I have a suggestion. This is an entirely different issue. It has nothing to do with run-data-managers and its functionality. Let's move the docker port exposure issue to another pull request and not try to fix it here.
Why not fix it here? It is only needed for this test and it works now. The security implications are not that severe, given that docker doesn't run as privileged user.
I incorrectly assumed that this would take a long time to fix. My apologies for my impatience. But yes, since everything works now, why not fix it here? Shall we merge this request?
Alright, this is some very nice functionality, thanks a lot.
And thank you for all the contributions to the testing of it. I would not have been able to do this myself.
Combination of #51 and #52 with tests.