bioinfo-chru-strasbourg / howard

Highly Open Workflow for Annotation & Ranking toward genomic variant Discovery
GNU Affero General Public License v3.0

Not able to download databases #240

Closed zuber-bioinfo closed 2 months ago

zuber-bioinfo commented 3 months ago

I am running the command below to download the databases, as described:

$ howard databases --assembly=hg19 --download-genomes=~/howard/databases/genomes/current --genomes-folder=~/howard/databases/genomes/current --download-genomes-provider=UCSC --download-genomes-contig-regex='chr[0-9XYM]+$' --download-annovar=~/howard/databases/annovar/current --download-annovar-files='refGene,cosmic70,nci60' --download-snpeff=~/howard/databases/snpeff/current --download-refseq=~/howard/databases/refseq/current --download-refseq-format-file='ncbiRefSeq.txt' --download-dbnsfp=~/howard/databases/dbnsfp/current --download-dbnsfp-release='4.4a' --download-dbnsfp-subdatabases --download-alphamissense=~/howard/databases/alphamissense/current --download-exomiser=~/howard/databases/exomiser/current --download-dbsnp=~/howard/databases/dbsnp/current --download-dbsnp-vcf --threads=8

Error: howard databases: error: argument --genomes-folder: invalid Path value: '~/howard/databases/genomes/current'

In the howard GUI, two different paths are shown before downloading the databases, as seen in the screenshot:

Screenshot 2024-06-25 141013

How can I reset it and download the databases?

antonylebechec commented 3 months ago

Dear @zuber-bioinfo,

Thank you for your feedback.

The --genomes-folder argument is required for processing certain databases after they have been downloaded (e.g., dbNSFP, dbSNP). If the folder is not present (i.e., the genome has not already been downloaded), an error occurs.

The command you are using attempts to download the genome and other databases simultaneously, resulting in failure due to this dependency. To resolve this issue, please download the genome first using the following command:

howard databases --assembly=hg19 --download-genomes=~/howard/databases/genomes/current --threads=8

Afterwards, you can proceed with your command.

We will address this inconsistency in the next release to ensure smoother operation.

Once again, thank you for your feedback!

antonylebechec commented 3 months ago

Regarding the two different paths, this discrepancy is due to the resolution of the home (~) path.
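To illustrate the home-path resolution: the error message above still contains a literal `~`, which suggests the shell passed `--genomes-folder=~/...` through without expanding it. A minimal Python sketch of resolving such a path explicitly, using only the standard library; the helper name `resolve_home` is hypothetical and is not part of howard:

```python
import os.path


def resolve_home(path):
    """Expand a leading "~" and return an absolute path (hypothetical helper)."""
    return os.path.abspath(os.path.expanduser(path))


# Path taken from the command in this issue.
raw = "~/howard/databases/genomes/current"
resolved = resolve_home(raw)

# The raw string and the resolved string differ, which is one way
# two different paths can end up being displayed for the same folder.
print(raw)
print(resolved)
```

Passing an absolute path, or `$HOME/howard/...` instead of `~/howard/...`, avoids relying on tilde expansion at all.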

zuber-bioinfo commented 3 months ago

This didn't work:

howard databases --assembly=hg19 --download-genomes=~/howard/databases/genomes/current --threads=8

The problem was in the ~/howard/howard/tools/tools.py file, line 783: "..... PathType(exists=True, .....". I changed it to False, and then the above command worked. To download the other databases, however, I had to revert it back to True. I guess it should be None.

Another issue, regarding the download of snpEff:

#[2024-06-26 07:31:29] [INFO] Download snpEff databases ['hg19']
[2024-06-26 07:31:29] [ERROR] Download snpEff databases ['hg19'] - list of databases empty - check file '/home/ensembl/howard/databases/snpeff/current/snpeff_databases.list'
Traceback (most recent call last):
  File "/home/ensembl/miniconda3/envs/howard/bin/howard", line 33, in <module>
    sys.exit(load_entry_point('howard', 'console_scripts', 'howard')())
  File "/home/ensembl/howard/howard/main.py", line 273, in main
    eval(f"{command_function}(args)")
  File "<string>", line 1, in <module>
  File "/home/ensembl/howard/howard/tools/databases.py", line 230, in databases
    databases_download_snpeff(
  File "/home/ensembl/howard/howard/functions/databases.py", line 782, in databases_download_snpeff
    raise ValueError(
ValueError: Download snpEff databases ['hg19'] - list of databases empty - check file '/home/ensembl/howard/databases/snpeff/current/snpeff_databases.list'

antonylebechec commented 3 months ago

Dear @zuber-bioinfo,

Regarding the issue with paths, you're right: the exists parameter should be None. This will be fixed as soon as possible.
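As an illustration of why a tri-state `exists` setting helps here, the following is a minimal sketch of an argparse type callable. It is not howard's actual `PathType`; the class name `PathArg` and its behavior are assumptions made for the example. With `exists=True` a download target that has not been created yet is rejected, with `exists=False` a folder that is already present is rejected, and with `exists=None` no existence check is made, so the same option works both before and after the download.

```python
import argparse
import os


class PathArg:
    """Hypothetical argparse type with a tri-state existence check.

    exists=True  -> the path must already exist
    exists=False -> the path must not exist yet
    exists=None  -> no existence check at all
    """

    def __init__(self, exists=None):
        self.exists = exists

    def __call__(self, value):
        # Resolve "~" explicitly, since the shell may not have expanded it.
        path = os.path.abspath(os.path.expanduser(value))
        present = os.path.exists(path)
        if self.exists is True and not present:
            raise argparse.ArgumentTypeError(f"path does not exist: '{value}'")
        if self.exists is False and present:
            raise argparse.ArgumentTypeError(f"path already exists: '{value}'")
        return path


# Demo parser; only the option name is taken from this issue.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--genomes-folder", type=PathArg(exists=None))
args = parser.parse_args(["--genomes-folder=~/howard/databases/genomes/current"])
print(args.genomes_folder)
```

When a type callable raises `argparse.ArgumentTypeError`, argparse reports it as an error for that argument and exits, which is the same general shape as the message in the original report.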

We need more information to fix the errors on snpEff. Can you:

Best,

antonylebechec commented 3 months ago

Dear @zuber-bioinfo,

A hotfix was pushed to the devel branch. Can you check if it works on your system?

I suggest using a conda environment, such as:

conda create --name howard_devel python=3.10
conda activate howard_devel
python -m pip install -e .
howard --help

Best,

zuber-bioinfo commented 3 months ago

Thanks Antony. The devel branch has no issue with the genome folder/path, and I successfully downloaded hg19. But the snpEff issue is still there.

Regarding the installation of howard:

OS: Ubuntu 22.04.04
Environment: conda and pip

conda create --name howard python=3.10.13
conda activate howard
git clone https://github.com/bioinfo-chru-strasbourg/howard
cd howard
python -m pip install -e .

The installation succeeded, but pip's dependency resolver reported an error, as shown below:

Screenshot 2024-06-27 101755

So I ran these extra commands:

git clone https://github.com/Clinical-Genomics/chanjo.git
cd chanjo/
conda install --channel bioconda sambamba
pip install -r requirements-dev.txt --editable .
cd ../howard/
python -m pip install -e .

For the GUI installation using pip, building wxPython causes an error, so I installed its dependencies using:

  $ sudo apt-get install dpkg-dev build-essential freeglut3-dev libgl1-mesa-dev libglu1-mesa-dev libgstreamer-plugins-base1.0-dev libgtk-3-dev libjpeg-dev libnotify-dev libpng-dev libsdl2-dev libsm-dev libtiff-dev libwebkit2gtk-4.0-dev libxtst-dev
  $ pip install -U -f https://extras.wxpython.org/wxPython4/extras/linux/gtk3/ubuntu-22.04 wxPython
  $ python -m pip install -r requirements-gui.txt

Sometimes it throws errors regarding typing-extensions and polars, so I ran:

  $ pip install --upgrade typing-extensions
  $ pip install polars==0.20.20

db_tree.txt

antonylebechec commented 3 months ago

Dear @zuber-bioinfo,

Thanks for these additional details.

I’m happy to hear that there are no issues with the devel branch.

Regarding the errors during pip install, I'm not sure what is happening with these packages. They are not used within Howard and are not needed. For the Howard GUI, some specific system packages, such as wxpython, are sometimes required. The GUI has been tested only on macOS. Thanks for mentioning typing-extensions; I'll look into this.

Regarding snpEff, this tool needs to be installed and configured (refer to the documentation for configuration details). If this is already done, simply remove the file /home/ensembl/howard/databases/snpeff/current/snpeff_databases.list (I will try to fix this issue in the next release). The same applies to Annovar (but you already have the Annovar databases!).
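The removal step can be sketched in Python as follows. The helper name `remove_stale_list` is hypothetical and not part of howard, and the idea that the list file is rebuilt on the next download attempt is an assumption drawn from the advice above, not something verified here.

```python
from pathlib import Path


def remove_stale_list(list_file):
    """Delete the cached list file if it exists.

    Returns True if a file was removed, False if there was nothing to remove.
    """
    path = Path(list_file).expanduser()
    if path.is_file():
        path.unlink()
        return True
    return False
```

For the setup in this thread, the call would be `remove_stale_list("/home/ensembl/howard/databases/snpeff/current/snpeff_databases.list")`, followed by rerunning the snpEff download.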

I think the best way to install Howard for a full experience is to deploy it with Docker. This way, additional tools are installed and ready to use.

Let me know if you encounter any other issues!

Best,

antonylebechec commented 2 months ago

Dear @zuber-bioinfo,

A new release, v0.11.0, has been published. Some of its fixes should solve the issues you mentioned.

Thanks for your help!

Best,