galaxyproject / training-material

A collection of Galaxy-related training material
https://training.galaxyproject.org
MIT License

Questions Metagenomics (general-tutorial) #530

Open bernt-matthias opened 6 years ago

bernt-matthias commented 6 years ago

I'm just working on the general tutorial and have some questions. I already have a few minor points where I think that I can improve this nice tutorial slightly. I would be happy to incorporate any answers to my questions in the tutorial.

shiltemann commented 6 years ago

Hi @bernt-matthias, thanks for this :) the things I can answer off the top of my head:

Feel free to clarify any of these points in the tutorial as I agree some of these things may be confusing/unclear.

bernt-matthias commented 6 years ago

The download link used in the metaphlan data manager is currently broken (see https://bitbucket.org/biobakery/metaphlan2/issues/43/download-link-broken#comment-39094922). Could someone be so nice and share the data with me?

shiltemann commented 6 years ago

hmm, that's unfortunate :/ I don't have a copy of this data, but perhaps @bebatut does?

bebatut commented 6 years ago

I updated the data manager last month on the ToolShed to solve this issue. Which version are you using?

bernt-matthias commented 6 years ago

The toolshed has only version 0:9c4ad82be5bd of the data_manager_metaphlan2_database_downloader which I have installed.

Thanks for the fast answers.

bebatut commented 6 years ago

When did you install it? It was not the data manager that I updated, but the conda package, at the beginning of the month.

bernt-matthias commented 6 years ago

According to "Manage Tool Dependencies" I have metaphlan2 version 2.6.0 installed.

bebatut commented 6 years ago

That does not tell us the build number of the conda package :worried:

bernt-matthias commented 6 years ago

It also tells me the full path: ...../galaxy-dev/database/dependencies/_conda/envs/mulled-v1-c5867e29ea0fba532ec8dc4a557d8798445dccb6ecf21f67e09143751a79b65d

Is this the build number? How can I find out which build I'm using? I wanted to learn how to interact with Galaxy's conda anyway. Is there some place where I could start reading?

shiltemann commented 6 years ago

I believe conda list gives both the version and build number of all installed packages in an environment.

As for documentation, this may be a place to start: https://docs.galaxyproject.org/en/master/admin/conda_faq.html

bernt-matthias commented 6 years ago

Thanks for the link.

conda list -r gives me 2017-08-24 12:45:36 (rev 0) .. is this what you are asking for?

It's confusing that our Galaxy has two metaphlan2 environments: mulled-v1-c5867e29ea0fba532ec8dc4a557d8798445dccb6ecf21f67e09143751a79b65d and __metaphlan2@2.6.0. Both give me the same information.

How can I update the revision? Is there some configuration that allows galaxy to check and update revisions automatically?

shiltemann commented 6 years ago

try conda list -v, you'll get something like:

blast-legacy              2.2.22                        1    bioconda
boost                     1.60.0                   py36_0  
icu                       54.1                          0  

where the third column is the build of each package in the environment. I don't think there is an easy Galaxy way to update yet, but you could activate the environment and run conda update methaphlan2, or maybe just removing and reinstalling the tool/dependency is easiest. And yes, it's normal to get a mulled environment for free: https://docs.galaxyproject.org/en/master/admin/mulled_containers.html

but let's also ask our conda oracle @bgruening for more details on this
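The column layout described above (name, version, build, optional channel) can be read programmatically. The following is a minimal sketch, not part of Galaxy or conda: `parse_conda_list` and `build_number` are hypothetical helper names, and the assumption that a build string is either a bare number or ends in `_<number>` reflects conda's default naming rather than a guarantee.

```python
import re
from typing import Dict, NamedTuple, Optional


class CondaPackage(NamedTuple):
    name: str
    version: str
    build: str
    channel: Optional[str]


def parse_conda_list(text: str) -> Dict[str, CondaPackage]:
    """Parse whitespace-separated `conda list` rows into records keyed by name."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comment/header lines, and rows with too few columns.
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        channel = fields[3] if len(fields) > 3 else None
        packages[fields[0]] = CondaPackage(fields[0], fields[1], fields[2], channel)
    return packages


def build_number(build: str) -> Optional[int]:
    """Return the trailing build number of a build string, or None.

    Assumption: the build string is a bare number ("1") or ends in
    "_<number>" ("py27_1"); this is conda's default but is not guaranteed.
    """
    match = re.search(r"(?:^|_)(\d+)$", build)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Rows taken from the output quoted in this thread.
    sample = """\
blast-legacy              2.2.22                        1    bioconda
boost                     1.60.0                   py36_0
metaphlan2                2.6.0                    py27_1    bioconda
"""
    for pkg in parse_conda_list(sample).values():
        print(pkg.name, pkg.version, pkg.build, build_number(pkg.build))
```

Feeding it the text printed by conda list for an activated environment gives one record per package, so the build of metaphlan2 can be compared against the build that contains the fix.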

bernt-matthias commented 6 years ago

After activating the environment with GALAXY/database/dependencies/_conda/bin/activate GALAXY/database/dependencies/_conda/envs/__metaphlan2@2.6.0/, the output of conda list -v includes

metaphlan2 2.6.0 py27_1 bioconda

Oddly conda update methaphlan2 returns

PackageNotFoundError: Package not found: 'methaphlan2' Package 'methaphlan2' is not installed in GALAXY/database/dependencies/_conda/envs/__metaphlan2@2.6.0

Will now try to uninstall the tool.

shiltemann commented 6 years ago

OK, so it looks like you are on build 1, and Bérénice's update was build 2. Not sure about the update failure; maybe try conda update --all next time, but removing and reinstalling should definitely work.

shiltemann commented 6 years ago

Oh, could your error be due to a typo (methaphlan2 instead of metaphlan2)?

bgruening commented 6 years ago

Removing the environment and letting Galaxy recreate it is the easiest way to go. I'm not sure conda update will pick up new builds. @bernt-matthias simply rm this folder and the mulled one. You can also temporarily activate this setting: https://github.com/galaxyproject/galaxy/blob/dev/config/galaxy.ini.sample#L215
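To see which environment folders would need removing, one option is to scan the envs directory for every environment that contains the package. This is only a sketch: `find_envs_with_package` is a hypothetical helper, and it assumes the standard conda layout in which each environment holds a conda-meta directory with one JSON record per installed package (with `name`, `version`, and `build` keys). It only reports; it deletes nothing.

```python
import json
from pathlib import Path
from typing import List, Optional, Tuple


def find_envs_with_package(
    envs_dir: str, package: str
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return sorted (env_name, version, build) for each env containing `package`.

    Assumption: every environment under `envs_dir` stores its package
    records as JSON files in <env>/conda-meta/.
    """
    hits = []
    for meta in Path(envs_dir).glob("*/conda-meta/*.json"):
        try:
            record = json.loads(meta.read_text())
        except (OSError, ValueError):
            continue  # unreadable or malformed record: ignore it
        if isinstance(record, dict) and record.get("name") == package:
            env_name = meta.parent.parent.name
            hits.append((env_name, record.get("version"), record.get("build")))
    return sorted(hits, key=lambda hit: hit[0])


if __name__ == "__main__":
    # Placeholder path taken from this thread; adjust to the local Galaxy install.
    for env, version, build in find_envs_with_package(
        "GALAXY/database/dependencies/_conda/envs", "metaphlan2"
    ):
        print(env, version, build)
```

On a setup like the one discussed here, this would be expected to list both the `__metaphlan2@2.6.0` environment and the mulled one, together with the build each contains.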

bernt-matthias commented 6 years ago

Thanks, I will do it like this. I have enabled this variable anyway. Should there be a mechanism in galaxy to update revisions (automatically) anyway?

bgruening commented 6 years ago

Probably not automatically, and I would hope that these broken packages are very rare :pray: ... The nice thing about a separate conda manager is that we can now maintain the envs separately from Galaxy. Maybe a nice script that takes the package name and version, with a few options (remove, update, recreate), could be handy ... or simply creating the mulled name in version v1 and v2 ...
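A rough sketch of what such a script might plan, given a package name and version, is shown below. Everything here is hypothetical: `plan_commands` is an invented name, the `__<package>@<version>` folder naming is taken from the environment seen earlier in this thread, and the sketch only builds the conda command lines without running them. It does not handle the mulled environment, assumes the required channels are already configured for the `recreate` case, and, as noted above, `update` may not pick up a new build.

```python
from pathlib import Path
from typing import List


def plan_commands(action: str, envs_dir: str, package: str, version: str) -> List[List[str]]:
    """Build (but do not run) conda commands for a Galaxy-style environment.

    Assumption: the environment lives at <envs_dir>/__<package>@<version>.
    """
    env = str(Path(envs_dir) / f"__{package}@{version}")
    remove = ["conda", "remove", "--yes", "--prefix", env, "--all"]
    if action == "update":
        return [["conda", "update", "--yes", "--prefix", env, package]]
    if action == "remove":
        return [remove]
    if action == "recreate":
        # Remove the whole environment, then create it again with the pinned version.
        create = ["conda", "create", "--yes", "--prefix", env, f"{package}={version}"]
        return [remove, create]
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    for command in plan_commands("recreate", "_conda/envs", "metaphlan2", "2.6.0"):
        print(" ".join(command))
```

Returning the argument lists instead of executing them keeps the sketch safe to try; a real tool would pass each list to something like `subprocess.run` after review.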

bernt-matthias commented 6 years ago

Got it working. Thanks.

One more question: In the Humann2 part, the tool Regroup a HUMAnN2 should be set to "UniRef50 gene families into GO". This option seems to be unavailable. I have checked the tool definition, and this option is commented out. The comments were introduced by @bebatut in https://github.com/galaxyproject/tools-iuc/commit/8618e8a96c9063f350da7fc7f1d76ca9a3c361d3#diff-877ac9a895dd12fca74911e6cb7e7aab

bebatut commented 6 years ago

Yes. I need to update the tutorial. This option was disabled in the humann2 tool in the latest version (I do not know why they did that...)

bernt-matthias commented 6 years ago

It appears to me that the option is still there:

humann2_regroup_table -h
usage: humann2_regroup_table [-h] [-i INPUT]
                             [-g {uniref90_go,uniref50_pfam,uniref50_infogo1000,uniref90_ko,uniref90_eggnog,uniref90_pfam,uniref90_level4ec,uniref50_eggnog,uniref50_go,uniref50_level4ec,uniref90_infogo1000,uniref50_ko,uniref90_rxn,uniref50_rxn}]
                             [-c CUSTOM] [-r] [-f {sum,mean}] [-e PRECISION]
                             [-u {Y,N}] [-p {Y,N}] [-o OUTPUT]

I used the version in the latest installed conda env.

Also the download of the utility tables with humann2_databases seems to work.

See also https://bitbucket.org/biobakery/humann2/wiki/Home#markdown-header-humann2_regroup_table
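The check made by eye above can also be done mechanically by extracting the choices listed for `-g` from the usage text. This is a minimal sketch under stated assumptions: `grouping_choices` is a hypothetical helper, and it relies on the usage message printing choices in the `{a,b,c}` form shown in this thread.

```python
import re
from typing import List

# Usage text as quoted in this thread.
USAGE = """\
usage: humann2_regroup_table [-h] [-i INPUT]
                             [-g {uniref90_go,uniref50_pfam,uniref50_infogo1000,uniref90_ko,uniref90_eggnog,uniref90_pfam,uniref90_level4ec,uniref50_eggnog,uniref50_go,uniref50_level4ec,uniref90_infogo1000,uniref50_ko,uniref90_rxn,uniref50_rxn}]
                             [-c CUSTOM] [-r] [-f {sum,mean}] [-e PRECISION]
                             [-u {Y,N}] [-p {Y,N}] [-o OUTPUT]
"""


def grouping_choices(usage: str, flag: str = "-g") -> List[str]:
    """Return the choices printed as `{a,b,c}` right after `flag`, or []."""
    match = re.search(re.escape(flag) + r"\s+\{([^}]*)\}", usage)
    if match is None:
        return []
    return [choice.strip() for choice in match.group(1).split(",") if choice.strip()]


if __name__ == "__main__":
    choices = grouping_choices(USAGE)
    print("uniref50_go available:", "uniref50_go" in choices)
```

The same function could be applied to the help text of whichever humann2 version is installed in the conda environment, to confirm an option exists before re-enabling it in the tool wrapper.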

bernt-matthias commented 6 years ago

I will try to update the tool and the data manager. Maybe it just works.

bernt-matthias commented 6 years ago

https://github.com/galaxyproject/tools-iuc/pull/1480 .. still untested, but a start :)

hexylena commented 4 years ago

@bebatut @shiltemann @subinamehta are working on a more general metatranscriptomics tutorial that may address this issue.