cbg-ethz / shorah

Repo for the software suite ShoRAH (Short Reads Assembly into Haplotypes)
GNU General Public License v3.0

Conda package for 1.99 not working properly #72

Open heylf opened 4 years ago

heylf commented 4 years ago

I tried to install shorah via bioconda (conda install -c bioconda shorah), and executing shorah gives the following error:

`pkg_resources.DistributionNotFound: The 'shorah' distribution was not found and is required by the application

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".../miniconda3/envs/shorah/bin/shorah", line 11, in <module>
    from shorah.cli import main
  File ".../miniconda3/envs/shorah/lib/python3.6/site-packages/shorah/cli.py", line 53, in <module>
    with open(os.path.join(base_dir, '.version'), 'r') as version_file:
FileNotFoundError: [Errno 2] No such file or directory: '..../miniconda3/envs/shorah/.version' `

DrYak commented 4 years ago

Hello, and thank you for posting your concerns.

Well, that is really weird, as the bioconda package is what we are currently using in production in V-pipe with SARS-CoV-2 sequencing data.

Something must have changed in the latest miniconda version: I can't reproduce your error message on my production installation, but I do get the exact same message with a clean installation in an Ubuntu:stable Docker container.

I'll try to investigate it more in detail tomorrow.

DrYak commented 4 years ago

Hello Heylf,

sorry for the slow answer, we're currently having some major computation trouble here, so I had less time to devote to your issue.

Meanwhile, upstream conda has changed something again, because the package now works and I am unable to reproduce the problem in the Ubuntu:stable Docker container using the exact same sequence as last time. :-(

I'll try to investigate it as I get some free time aside from our other problems.

DrYak commented 4 years ago

Can you give it a try on your side and tell me if you're still affected?

And what platform are you using? A Linux installation running on bare metal? VM? Docker? WSL1/2 in Windows 10? And which distribution?

heylf commented 4 years ago

Hey DrYak, I tried it again, but it still fails. I am using Ubuntu 18.04.3, no VM or docker, with a new miniconda3 environment for shorah.

I have now tried installing it directly into the miniconda3 base environment, and it works. It seems to fail only if you create a separate env for shorah.

bgruening commented 4 years ago

Same here:

Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate shorah
#
# To deactivate an active environment, use
#
#     $ conda deactivate

bag@bag:~$ . activate shorah
(shorah) bag@bag:~$ shorah --version
Traceback (most recent call last):
  File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/shorah/cli.py", line 50, in <module>
    __version__ = get_distribution('shorah').version
  File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/pkg_resources/__init__.py", line 482, in get_distribution
    dist = get_provider(dist)
  File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/pkg_resources/__init__.py", line 358, in get_provider
    return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
  File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/pkg_resources/__init__.py", line 901, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/pkg_resources/__init__.py", line 787, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'shorah' distribution was not found and is required by the application

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bag/miniconda3/envs/shorah/bin/shorah", line 11, in <module>
    from shorah.cli import main
  File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/shorah/cli.py", line 53, in <module>
    with open(os.path.join(base_dir, '.version'), 'r') as version_file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/bag/miniconda3/envs/shorah/.version'

bgruening commented 4 years ago

It seems to work in the conda root dir, but this is not how it should work :)
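
Editorial note: the traceback shows cli.py first asking pkg_resources for the installed version and then falling back to a `.version` file under the environment prefix, with both lookups failing. Below is a minimal sketch of a more forgiving lookup. It is not ShoRAH's actual code and not necessarily what any fix does. The function name `resolve_version` and the `"unknown"` default are hypothetical, and it uses `importlib.metadata` (Python 3.8+) rather than the `pkg_resources` call from the traceback, which ran on Python 3.6.

```python
import os
from importlib import metadata  # Python 3.8+; swapped in for pkg_resources


def resolve_version(dist_name, base_dir, default="unknown"):
    """Return the installed version, else the '.version' file content, else default.

    Hypothetical helper for illustration only.
    """
    try:
        # Preferred source: metadata of the installed distribution.
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        pass
    try:
        # Fallback mirroring the open() call at cli.py line 53 in the traceback.
        with open(os.path.join(base_dir, ".version"), "r") as version_file:
            return version_file.read().strip()
    except OSError:
        # Neither source exists: degrade gracefully instead of crashing at import.
        return default
```

With a lookup like this, a missing distribution record and a missing `.version` file would produce a placeholder version string rather than an import-time crash.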

DrYak commented 4 years ago

Note: @pedrofale and @kpj are giving me a hand on this one.

kpj commented 4 years ago

This behavior is potentially fixed in https://github.com/cbg-ethz/shorah/pull/73.

bgruening commented 4 years ago

@bgruening cool, thanks. We can give it a new try as soon as there is a new release. Thanks.

DrYak commented 4 years ago

Update: current test package is passing CircleCI tests on both Linux and Mac OS X. Just need the colleagues to finish the code review and I can push the final package.

DrYak commented 4 years ago

@bgruening has merged the 1.99.1 bioconda package. It should be appearing on bioconda soon.

@heylf: could you give it another try?

bgruening commented 4 years ago

@DrYak you just need to bump https://github.com/galaxyproject/tools-iuc/blob/master/tools/shorah/shorah.xml#L5

Btw. do you have a list with new parameters or new inputs/outputs compared to the older version?

DrYak commented 4 years ago

For galaxyproject/tools-iuc: I will check the documented procedure for testing and submitting changes.

WRT parameters: calling ShoRAH has indeed changed somewhat since the older 1.x.x series. Among other things, there is now a single executable with multiple sub-commands instead of the older shotgun.py / amplian.py etc.

The most up-to-date list of parameters is ShoRAH's own help output:

# shorah -h
usage: shorah <subcommand> [options]

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit

sub-commands:
  {shotgun,amplicon,snv}
                        available sub-commands
    shotgun             run local analysis in shotgun mode
    amplicon            run local analysis in amplicon mode
    snv                 run single-nucleotide-variant calling

Run `shorah subcommand -h` for more help
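
Editorial note: the single-executable layout shown in the help output above can be sketched with Python's argparse sub-parsers. This is only an illustration of the structure, not ShoRAH's actual cli.py. The function name `build_parser` is hypothetical, and only the two required options from the help text are included.

```python
import argparse


def build_parser():
    """Hypothetical sketch of a CLI with shotgun/amplicon/snv sub-commands."""
    # Options shared by every sub-command (only the two required ones shown).
    common = argparse.ArgumentParser(add_help=False)
    required = common.add_argument_group("required arguments")
    required.add_argument("-b", "--bam", required=True, metavar="BAM",
                          help="sorted bam format alignment file")
    required.add_argument("-f", "--fasta", required=True, metavar="REF",
                          help="reference genome in fasta format")

    parser = argparse.ArgumentParser(prog="shorah")
    sub = parser.add_subparsers(dest="subcommand", help="available sub-commands")
    for name in ("shotgun", "amplicon", "snv"):
        # Each sub-command inherits the shared options through `parents`.
        sub.add_parser(name, parents=[common])
    return parser


# Example: parse a shotgun invocation (file names are placeholders).
args = build_parser().parse_args(["shotgun", "-b", "sample.bam", "-f", "ref.fasta"])
print(args.subcommand, args.bam, args.fasta)
```

Sharing the common options through a parent parser is one way to keep `-b` and `-f` identical across the three sub-commands.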

shotgun is the subcommand that you're most likely to want to implement (SNV calls on the whole genome and local haplotypes in every window):

# shorah shotgun -h
usage: shorah <subcommand> [options] shotgun [-h] [-v] -b BAM -f REF
                                             [-a FLOAT] [-r chrm:start-stop]
                                             [-R INT] [-x INT] [-S FLOAT] [-I]
                                             [-p FLOAT]
                                             [-of {csv,vcf} [{csv,vcf} ...]]
                                             [-c INT] [-w INT] [-s INT] [-k]
                                             [-t INT]

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -a FLOAT, --alpha FLOAT
                        alpha in dpm sampling (controls the probability of
                        creating new classes)
  -r chrm:start-stop, --region chrm:start-stop
                        region in format 'chr:start-stop', e.g.
                        'chrm:1000-3000'
  -R INT, --seed INT    set seed for reproducible results
  -x INT, --maxcov INT  approximate max coverage allowed
  -S FLOAT, --sigma FLOAT
                        sigma value to use when calling SNVs
  -I, --ignore_indels   ignore SNVs adjacent to insertions/deletions (legacy
                        behaviour of 'fil', ignore this option if you don't
                        understand)
  -p FLOAT, --threshold FLOAT
                        pos threshold when calling variants from support files
  -of {csv,vcf} [{csv,vcf} ...], --out_format {csv,vcf} [{csv,vcf} ...]
                        output format of called SNVs
  -c INT, --win_coverage INT
                        coverage threshold. Omit windows with low coverage
  -w INT, --windowsize INT
                        window size
  -s INT, --winshifts INT
                        number of window shifts
  -k, --keep_files      keep all intermediate files
  -t INT, --threads INT
                        limit maximum number of parallel sampler threads (0:
                        CPUs count-1, n: limit to n)

required arguments:
  -b BAM, --bam BAM     sorted bam format alignment file
  -f REF, --fasta REF   reference genome in fasta format
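
Editorial note: for a wrapper (such as a Galaxy tool), the shotgun call could be assembled from the flags listed above. The sketch below builds the argument list only. The helper name `shotgun_command` and the file names are hypothetical. Every flag comes from the help text, and optional flags are left out unless set, so ShoRAH's own defaults apply.

```python
import shlex


def shotgun_command(bam, ref, region=None, seed=None, out_formats=None, threads=None):
    """Build the argv list for `shorah shotgun` (hypothetical helper)."""
    cmd = ["shorah", "shotgun", "-b", bam, "-f", ref]
    if region is not None:
        cmd += ["-r", region]            # format 'chr:start-stop'
    if seed is not None:
        cmd += ["-R", str(seed)]         # seed for reproducible results
    if threads is not None:
        cmd += ["-t", str(threads)]      # limit on parallel sampler threads
    if out_formats:
        cmd += ["-of", *out_formats]     # one or more of: csv, vcf
    return cmd


# Placeholder inputs; pass the list to subprocess.run(cmd, check=True) to execute.
cmd = shotgun_command("sample.bam", "ref.fasta", region="chrm:1000-3000",
                      seed=42, out_formats=["csv", "vcf"])
print(shlex.join(cmd))
```

Building a list instead of a shell string avoids quoting problems when file paths contain spaces.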

the amplicon mode:

# shorah amplicon -h
usage: shorah <subcommand> [options] amplicon [-h] [-v] -b BAM -f REF
                                              [-a FLOAT] [-r chrm:start-stop]
                                              [-R INT] [-x INT] [-S FLOAT]
                                              [-I] [-p FLOAT]
                                              [-of {csv,vcf} [{csv,vcf} ...]]
                                              [-c INT] [-d] [-m FLOAT]

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -a FLOAT, --alpha FLOAT
                        alpha in dpm sampling (controls the probability of
                        creating new classes)
  -r chrm:start-stop, --region chrm:start-stop
                        region in format 'chr:start-stop', e.g.
                        'chrm:1000-3000'
  -R INT, --seed INT    set seed for reproducible results
  -x INT, --maxcov INT  approximate max coverage allowed
  -S FLOAT, --sigma FLOAT
                        sigma value to use when calling SNVs
  -I, --ignore_indels   ignore SNVs adjacent to insertions/deletions (legacy
                        behaviour of 'fil', ignore this option if you don't
                        understand)
  -p FLOAT, --threshold FLOAT
                        pos threshold when calling variants from support files
  -of {csv,vcf} [{csv,vcf} ...], --out_format {csv,vcf} [{csv,vcf} ...]
                        output format of called SNVs
  -c INT, --win_coverage INT
                        coverage threshold. Omit windows with low coverage
  -d, --diversity       detect the highest entropy region and run there
  -m FLOAT, --min_overlap FLOAT
                        fraction of read overlap to be included

required arguments:
  -b BAM, --bam BAM     sorted bam format alignment file
  -f REF, --fasta REF   reference genome in fasta format

To re-call SNVs from already computed local haplotypes, use snv (it is also called internally at the end of both shotgun and amplicon, although both of those can skip calls to dpm_sampler for windows where they find an already computed local haplotype):

# shorah snv -h
usage: shorah <subcommand> [options] snv [-h] [-v] -b BAM -f REF [-a FLOAT]
                                         [-r chrm:start-stop] [-R INT]
                                         [-x INT] [-S FLOAT] [-I] [-p FLOAT]
                                         [-of {csv,vcf} [{csv,vcf} ...]]
                                         [-i INT]

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -a FLOAT, --alpha FLOAT
                        alpha in dpm sampling (controls the probability of
                        creating new classes)
  -r chrm:start-stop, --region chrm:start-stop
                        region in format 'chr:start-stop', e.g.
                        'chrm:1000-3000'
  -R INT, --seed INT    set seed for reproducible results
  -x INT, --maxcov INT  approximate max coverage allowed
  -S FLOAT, --sigma FLOAT
                        sigma value to use when calling SNVs
  -I, --ignore_indels   ignore SNVs adjacent to insertions/deletions (legacy
                        behaviour of 'fil', ignore this option if you don't
                        understand)
  -p FLOAT, --threshold FLOAT
                        pos threshold when calling variants from support files
  -of {csv,vcf} [{csv,vcf} ...], --out_format {csv,vcf} [{csv,vcf} ...]
                        output format of called SNVs
  -i INT, --increment INT
                        value of increment to use when calling SNVs (1 used in
                        amplicon mode)

required arguments:
  -b BAM, --bam BAM     sorted bam format alignment file
  -f REF, --fasta REF   reference genome in fasta format