Open heylf opened 4 years ago
Hello, and thank you for posting your concerns.
Well, that is really weird, as the bioconda package is what we are currently using in production in V-pipe with SARS-CoV-2 sequencing data.
Something must have changed in the latest miniconda version: I can't reproduce your error message on my production installation, but I do get the exact same message when using a clean installation in a Docker container of Ubuntu:stable.
I'll try to investigate it more in detail tomorrow.
Hello Heylf,
sorry for the slow answer, we're currently having some major computation trouble here, so I had less time to devote to your issue.
Meanwhile, upstream conda has changed something again, because the package now works and I am unable to reproduce the problem in the Ubuntu:stable Docker container using the exact same sequence as last time. :-(
I'll try to investigate it as I get some free time aside from our other problems.
Can you give it a try on your side and tell me if you're still affected?
And what platform are you using? A Linux installation running on bare metal? VM? Docker? WSL1/2 in Windows 10? And which distribution?
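For anyone else answering the platform questions above, a small script can collect the relevant details in one go. This is only a sketch: the function name `environment_report` is made up for illustration, and the Docker and WSL checks are common heuristics (presence of `/.dockerenv`, "microsoft" in the kernel release string), not guaranteed detection.

```python
import os
import platform
import sys


def environment_report():
    """Collect basic facts about the running platform and conda environment."""
    return {
        # OS / kernel / distribution summary string
        "platform": platform.platform(),
        "python": platform.python_version(),
        # Root of the running interpreter, e.g. the conda env directory
        "prefix": sys.prefix,
        # Set by `conda activate`; absent outside an activated env
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(not set)"),
        # Heuristic: Docker usually creates this marker file
        "in_docker": os.path.exists("/.dockerenv"),
        # Heuristic: WSL kernels usually mention "microsoft" in the release
        "wsl": "microsoft" in platform.release().lower(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it inside the activated shorah env and pasting the output here should be enough to compare setups.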
Hey DrYak, I tried it again, but it still fails. I am using Ubuntu 18.04.3, no VM or docker, with a new miniconda3 environment for shorah.
I have now tried installing it directly into the base environment of miniconda3, and it works. It seems that it does not work if you create a separate env for shorah.
Same here:
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate shorah
#
# To deactivate an active environment, use
#
# $ conda deactivate
bag@bag:~$ . activate shorah
(shorah) bag@bag:~$ shorah --version
Traceback (most recent call last):
File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/shorah/cli.py", line 50, in <module>
__version__ = get_distribution('shorah').version
File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/pkg_resources/__init__.py", line 482, in get_distribution
dist = get_provider(dist)
File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/pkg_resources/__init__.py", line 358, in get_provider
return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/pkg_resources/__init__.py", line 901, in require
needed = self.resolve(parse_requirements(requirements))
File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/pkg_resources/__init__.py", line 787, in resolve
raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'shorah' distribution was not found and is required by the application
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/bag/miniconda3/envs/shorah/bin/shorah", line 11, in <module>
from shorah.cli import main
File "/home/bag/miniconda3/envs/shorah/lib/python3.6/site-packages/shorah/cli.py", line 53, in <module>
with open(os.path.join(base_dir, '.version'), 'r') as version_file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/bag/miniconda3/envs/shorah/.version'
It seems to work in the conda root dir, but this is not how it should work :)
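Reading the traceback: `cli.py` first asks `pkg_resources` for the version of the 'shorah' distribution, which raises `DistributionNotFound`, and then falls back to opening a `.version` file that does not exist in the env either, so the import itself crashes. A more forgiving lookup could look roughly like the sketch below. This is not ShoRAH's actual code: `resolve_version`, its parameters, and the "unknown" placeholder are hypothetical, and it assumes Python 3.8+ for `importlib.metadata` (the env above runs Python 3.6, where a backport would be needed).

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def resolve_version(dist_name, fallback_file=None, default="unknown"):
    """Return the installed version of dist_name, or a fallback value."""
    try:
        # Succeeds only if the distribution's metadata is installed
        # in the environment that is running this interpreter.
        return version(dist_name)
    except PackageNotFoundError:
        pass
    # Optional fallback: a plain-text version file (hypothetical location).
    if fallback_file is not None:
        path = Path(fallback_file)
        if path.is_file():
            return path.read_text().strip()
    # Last resort: return a placeholder instead of failing at import time.
    return default


if __name__ == "__main__":
    print(resolve_version("shorah"))
```

With something like this, a missing distribution or a missing version file would only degrade the `--version` output rather than break every invocation.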
Note: @pedrofale and @kpj are giving me a hand on this one.
This behavior is potentially fixed in https://github.com/cbg-ethz/shorah/pull/73.
@bgruening cool, thanks. We can give it a new try as soon as there is a new release. Thanks.
Update: current test package is passing CircleCI tests on both Linux and Mac OS X. Just need the colleagues to finish the code review and I can push the final package.
@bgruening has merged the 1.99.1 bioconda package. It should be appearing on bioconda soon.
@heylf : could you give it a try again ?
@DrYak you just need to bump https://github.com/galaxyproject/tools-iuc/blob/master/tools/shorah/shorah.xml#L5
Btw. do you have a list with new parameters or new inputs/outputs compared to the older version?
For Galaxyproject/tools-iuc : I will check the documented procedure for testing and submitting changes.
Regarding parameters: calling ShoRAH has indeed changed somewhat since the older 1.x.x series.
Among other things, there is now a single executable with multiple sub-commands instead of the older shotgun.py / amplian.py etc.
The most up-to-date list of parameters is ShoRAH's own help output:
# shorah -h
usage: shorah <subcommand> [options]
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
sub-commands:
{shotgun,amplicon,snv}
available sub-commands
shotgun run local analysis in shotgun mode
amplicon run local analysis in amplicon mode
snv run single-nucleotide-variant calling
Run `shorah subcommand -h` for more help
shotgun is the subcommand that you're most likely to want to implement (SNV calls on the whole genome and local haplotypes in every window):
# shorah shotgun -h
usage: shorah <subcommand> [options] shotgun [-h] [-v] -b BAM -f REF
[-a FLOAT] [-r chrm:start-stop]
[-R INT] [-x INT] [-S FLOAT] [-I]
[-p FLOAT]
[-of {csv,vcf} [{csv,vcf} ...]]
[-c INT] [-w INT] [-s INT] [-k]
[-t INT]
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-a FLOAT, --alpha FLOAT
alpha in dpm sampling (controls the probability of
creating new classes)
-r chrm:start-stop, --region chrm:start-stop
region in format 'chr:start-stop', e.g.
'chrm:1000-3000'
-R INT, --seed INT set seed for reproducible results
-x INT, --maxcov INT approximate max coverage allowed
-S FLOAT, --sigma FLOAT
sigma value to use when calling SNVs
-I, --ignore_indels ignore SNVs adjacent to insertions/deletions (legacy
behaviour of 'fil', ignore this option if you don't
understand)
-p FLOAT, --threshold FLOAT
pos threshold when calling variants from support files
-of {csv,vcf} [{csv,vcf} ...], --out_format {csv,vcf} [{csv,vcf} ...]
output format of called SNVs
-c INT, --win_coverage INT
coverage threshold. Omit windows with low coverage
-w INT, --windowsize INT
window size
-s INT, --winshifts INT
number of window shifts
-k, --keep_files keep all intermediate files
-t INT, --threads INT
limit maximum number of parallel sampler threads (0:
CPUs count-1, n: limit to n)
required arguments:
-b BAM, --bam BAM sorted bam format alignment file
-f REF, --fasta REF reference genome in fasta format
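For a wrapper, the shotgun call can be assembled from the flags listed in the help output above. The following is only a sketch: `build_shotgun_command` is a made-up helper, the file names are placeholders, and it covers just a handful of the options.

```python
import shlex


def build_shotgun_command(bam, fasta, region=None, seed=None, threads=None,
                          out_formats=None):
    """Assemble an argv list for `shorah shotgun` from the documented flags."""
    # -b and -f are the two required arguments
    cmd = ["shorah", "shotgun", "-b", str(bam), "-f", str(fasta)]
    if region is not None:
        cmd += ["-r", region]            # format: chrm:start-stop
    if seed is not None:
        cmd += ["-R", str(seed)]         # seed for reproducible results
    if threads is not None:
        cmd += ["-t", str(threads)]      # limit on parallel sampler threads
    if out_formats:
        cmd += ["-of", *out_formats]     # one or more of: csv, vcf
    return cmd


if __name__ == "__main__":
    # Placeholder inputs; replace with a real sorted BAM and reference FASTA.
    cmd = build_shotgun_command("aln.sorted.bam", "ref.fasta",
                                seed=42, out_formats=["csv", "vcf"])
    print(shlex.join(cmd))
    # To execute it (requires shorah on PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Building a list rather than a single string avoids shell-quoting problems when paths contain spaces.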
the amplicon mode:
# shorah amplicon -h
usage: shorah <subcommand> [options] amplicon [-h] [-v] -b BAM -f REF
[-a FLOAT] [-r chrm:start-stop]
[-R INT] [-x INT] [-S FLOAT]
[-I] [-p FLOAT]
[-of {csv,vcf} [{csv,vcf} ...]]
[-c INT] [-d] [-m FLOAT]
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-a FLOAT, --alpha FLOAT
alpha in dpm sampling (controls the probability of
creating new classes)
-r chrm:start-stop, --region chrm:start-stop
region in format 'chr:start-stop', e.g.
'chrm:1000-3000'
-R INT, --seed INT set seed for reproducible results
-x INT, --maxcov INT approximate max coverage allowed
-S FLOAT, --sigma FLOAT
sigma value to use when calling SNVs
-I, --ignore_indels ignore SNVs adjacent to insertions/deletions (legacy
behaviour of 'fil', ignore this option if you don't
understand)
-p FLOAT, --threshold FLOAT
pos threshold when calling variants from support files
-of {csv,vcf} [{csv,vcf} ...], --out_format {csv,vcf} [{csv,vcf} ...]
output format of called SNVs
-c INT, --win_coverage INT
coverage threshold. Omit windows with low coverage
-d, --diversity detect the highest entropy region and run there
-m FLOAT, --min_overlap FLOAT
fraction of read overlap to be included
required arguments:
-b BAM, --bam BAM sorted bam format alignment file
-f REF, --fasta REF reference genome in fasta format
snv re-calls SNVs from already computed local haplotypes (it is called internally at the end of either shotgun or amplicon, though both of those are capable of skipping calls to dpm_sampler for windows for which they find already computed local haplotypes):
# shorah snv -h
usage: shorah <subcommand> [options] snv [-h] [-v] -b BAM -f REF [-a FLOAT]
[-r chrm:start-stop] [-R INT]
[-x INT] [-S FLOAT] [-I] [-p FLOAT]
[-of {csv,vcf} [{csv,vcf} ...]]
[-i INT]
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-a FLOAT, --alpha FLOAT
alpha in dpm sampling (controls the probability of
creating new classes)
-r chrm:start-stop, --region chrm:start-stop
region in format 'chr:start-stop', e.g.
'chrm:1000-3000'
-R INT, --seed INT set seed for reproducible results
-x INT, --maxcov INT approximate max coverage allowed
-S FLOAT, --sigma FLOAT
sigma value to use when calling SNVs
-I, --ignore_indels ignore SNVs adjacent to insertions/deletions (legacy
behaviour of 'fil', ignore this option if you don't
understand)
-p FLOAT, --threshold FLOAT
pos threshold when calling variants from support files
-of {csv,vcf} [{csv,vcf} ...], --out_format {csv,vcf} [{csv,vcf} ...]
output format of called SNVs
-i INT, --increment INT
value of increment to use when calling SNVs (1 used in
amplicon mode)
required arguments:
-b BAM, --bam BAM sorted bam format alignment file
-f REF, --fasta REF reference genome in fasta format
I tried to install shorah via bioconda (conda install -c bioconda shorah), and executing shorah gives the following error:
`pkg_resources.DistributionNotFound: The 'shorah' distribution was not found and is required by the application
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".../miniconda3/envs/shorah/bin/shorah", line 11, in <module>
from shorah.cli import main
File ".../miniconda3/envs/shorah/lib/python3.6/site-packages/shorah/cli.py", line 53, in <module>
with open(os.path.join(base_dir, '.version'), 'r') as version_file:
FileNotFoundError: [Errno 2] No such file or directory: '..../miniconda3/envs/shorah/.version'
`