bgruening / galaxytools

:microscope::books: Galaxy Tool wrappers
MIT License

Useful imaging tools for Brain data #142

Open bgruening opened 9 years ago

bgruening commented 9 years ago

Brain cell finder is a tool for fully automated localization of soma in 3D mouse brain images acquired by confocal light sheet microscopy.

https://github.com/paolo-f/bcfind

Started here: https://github.com/bgruening/galaxytools/blob/master/tools/image_processing/bcfind/bcfind.xml

gregvonkuster commented 8 years ago

@bgruening The installation instructions for https://github.com/paolo-f/bcfind install the head of the git repo. For the tool_dependencies.xml recipe, should I use a "git clone..." followed by a "git checkout..." as the mechanism for providing a specific feature set for reproducibility? This would be similar to what is being done for the freebase package recipe and others. If so, should I just use the current tip for the recipe?

nsoranzo commented 8 years ago

@gregvonkuster I use git reset --hard <commit_hash> after the git clone.

gregvonkuster commented 8 years ago

@nsoranzo Thanks, I'll go that route and use the current repo head as the commit hash.

gregvonkuster commented 8 years ago

@bgruening @nsoranzo This package is going to require a fairly complex installation recipe with many dependencies (e.g., PIL, tables, pandas, scikit-learn, progressbar-latest, numpy, scipy, mahotas, ujson), all installed into a version of Continuum Analytics Anaconda Python (http://continuum.io/downloads).

My understanding of how package dependencies like this work may be flawed, so I'm hoping you can clarify or confirm things for me.

Like other package recipes that install Python as a dependency, Galaxy job execution for tools that use this package will run with the Continuum Analytics Anaconda Python (because the dependency sets the PYTHONPATH variable prior to job execution) instead of the Galaxy Python environment, which consists of a "vanilla" Python along with the eggs/wheels components that provide Galaxy's dependencies.

This means that the recipe for the Continuum Analytics Anaconda Python will need to install all dependencies Galaxy needs for job execution (similar to how this is done with the other "package_python..." packages). This list of dependencies currently includes package_bzlib_1_0, package_sqlite_3_8_3 and (possibly) package_zlib_1_2_8, among others.

The weakness here is that recipes like this will need to be "fixed" whenever the Galaxy job framework adds an additional dependency.

Is my understanding of this correct? I appreciate your feedback on this, as I have other recipes I'm planning to submit to open source that use this same approach and I want to make sure I'm going about this the right way.

bgruening commented 8 years ago

@gregvonkuster @nsoranzo you don't need to use git to get a specific commit from GitHub. Curl/wget works for every revision.

For example this one for the last revision: https://github.com/paolo-f/bcfind/archive/ed3ea2c29d22652996fe27ae509625c52b5b77c2.tar.gz.
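A minimal sketch of the archive-URL pattern in the example above, written in Python for illustration. The helper names github_archive_url and download_archive are hypothetical and not part of Galaxy or bcfind; the download step only uses the standard library and needs network access when it is actually called.

```python
import urllib.request


def github_archive_url(owner, repo, revision, ext="tar.gz"):
    """Build the GitHub archive URL for one revision (hypothetical helper)."""
    return "https://github.com/{}/{}/archive/{}.{}".format(owner, repo, revision, ext)


def download_archive(url, target_filename):
    """Fetch the archive to a local file; needs network access when called."""
    urllib.request.urlretrieve(url, target_filename)
    return target_filename


# The revision quoted in this thread.
BCFIND_REVISION = "ed3ea2c29d22652996fe27ae509625c52b5b77c2"
url = github_archive_url("paolo-f", "bcfind", BCFIND_REVISION)
print(url)
```

Calling download_archive(url, "bcfind.tar.gz") would then fetch exactly that revision, with no git client involved.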

bgruening commented 8 years ago

@gregvonkuster regarding your Python installation question, this is actually a topic I'm currently working on. Please have a look at the deeptools package; this should help you a lot without conda: https://github.com/galaxyproject/tools-iuc/pull/277

Please let me know if this fixes your problem.

gregvonkuster commented 8 years ago

@bgruening I've got the installation recipes with all dependencies basically done for bcfind. I noticed that your current tool outline at https://github.com/bgruening/galaxytools/blob/master/tools/image_processing/bcfind/bcfind.xml has set the bcfind dependency version to 1.3.18 via your requirement tag.

<requirement type="package" version="1.3.18">bcfind</requirement>

Is this the version you want me to set in the bcfind tool_dependencies.xml recipes I have? Currently I have this since it is the repo head.

<package name="bcfind" version="ed3ea2c29d">

My install recipe installs the package using this URL which is the repo head.

<action type="download_file" target_filename="bcfind.tar.gz">https://github.com/paolo-f/bcfind/archive/ed3ea2c29d22652996fe27ae509625c52b5b77c2.tar.gz</action>

Please let me know what you would like for the version string. Thanks!

bgruening commented 8 years ago

This was an old version, we can/should update the version :) Thanks!

gregvonkuster commented 8 years ago

@bgruening ok, just to confirm, should it be "ed3ea2c29d"? Thanks!

bgruening commented 8 years ago

I think this is ok!
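For illustration, the agreed version string is simply the first ten characters of the full commit hash. A small Python sketch of that convention follows; the names short_version and package_tag are hypothetical, and the ten-character length is taken from the string used in this thread rather than from any Galaxy rule.

```python
def short_version(commit_hash, length=10):
    """Return the leading characters of a commit hash for use as a version."""
    if len(commit_hash) < length:
        raise ValueError("commit hash is shorter than the requested length")
    return commit_hash[:length]


def package_tag(name, commit_hash):
    """Format the opening <package> tag as it appears in the recipe above."""
    return '<package name="{}" version="{}">'.format(name, short_version(commit_hash))


# Full hash of the repo head discussed in this thread.
BCFIND_COMMIT = "ed3ea2c29d22652996fe27ae509625c52b5b77c2"
print(short_version(BCFIND_COMMIT))
print(package_tag("bcfind", BCFIND_COMMIT))
```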

gregvonkuster commented 8 years ago

@bgruening Need some insight on the tool(s) you are looking for here. The bcfind manual at http://bcfind.dinfo.unifi.it/guide.html sort of implies a suite of tools as:

bcfind_make_substacks
bcfind_cell_finder
bcfind_measure_performance
bcfind_supervised_semantic_deconvolution
bcfind_merge_markers
bcfind_manifold_filter

Can you let me know if this baseline tool set is what you are looking for?

FYI: The current scripts are written in such a way that wrapping them in Galaxy will be a bit more complex than usual. There are a lot of assumptions on input and output directory names, specific file extensions, etc. So the Galaxy wrappers will likely have to include a lot of temporary manipulations of these items.
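A rough sketch of the kind of temporary manipulation meant here, assuming a script that insists on a particular file name or extension while Galaxy hands the wrapper an arbitrarily named dataset file. The function name stage_input and the example file names are hypothetical; nothing here is taken from the bcfind scripts.

```python
import os
import tempfile


def stage_input(dataset_path, expected_name, work_dir=None):
    """Symlink a dataset into a scratch directory under the name a script expects."""
    # Create a scratch directory if the caller did not supply one.
    if work_dir is None:
        work_dir = tempfile.mkdtemp(prefix="bcfind_stage_")
    staged_path = os.path.join(work_dir, expected_name)
    # Link rather than copy, so large image stacks are not duplicated.
    os.symlink(os.path.abspath(dataset_path), staged_path)
    return staged_path
```

A wrapper could call something like stage_input("dataset_42.dat", "stack_0001.tif") and pass the returned path's directory to the script, then remove the scratch directory once the job finishes.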

Also, some of the scripts (make_substacks.py at https://github.com/paolo-f/bcfind/blob/master/bcfind/scripts/make_substacks.py) seem to imply the Galaxy wrapper should be written to work with collections which I've not yet done. Any pointers to tools that have been optimally written to deal with collections will be helpful.

Thanks!

bgruening commented 8 years ago

I have forwarded you a mail from the author with more information. Here you can find an example of producing dataset collections: https://github.com/bgruening/galaxytools/blob/master/tools/text_processing/split_file_on_column/split_file_on_column.xml The planemo documentation also includes a really good tutorial: https://planemo.readthedocs.org/en/latest/writing_advanced.html#collections

And last but not least, the functional collection tests are awesome as training material: https://github.com/galaxyproject/galaxy/tree/dev/test/functional/tools This is where I'm looking for suggestions :)

bgruening commented 8 years ago

A new version was released a few days back: https://github.com/paolo-f/bcfind/releases

gregvonkuster commented 8 years ago

Thanks Bjorn,

I’ll keep this on my list, but it will be a while. I’m currently buried with Galaxy consulting for 1 lab and I have an additional lab I’ll be supporting starting next week. I’ll take a look at this as soon as I get a chance though.

Thanks Bjorn!

Greg

On Nov 6, 2015, at 4:39 AM, Björn Grüning notifications@github.com wrote:

A new version was released a few days back: https://github.com/paolo-f/bcfind/releases