HALFpipe / ImputationProtocol

The ENIGMA Imputation Protocol as a container for your local workstation or high-performance compute cluster

ERROR: Application or file imputationserver not found. #1

Status: Closed (SheaCheng2000 closed this issue 1 year ago)

SheaCheng2000 commented 2 years ago

Hi,

Thanks for your imputation tool, it really helps!!

I have completed all steps in Docker according to this protocol, but at the last step,

imputationserver --study-name cga56 --population EAS

an error occurred. The log was:

mkdir -p -v /data/output/cga56

cloudgene run imputationserver --conf /data/hadoop/config --refpanel 1000g-phase-3-v5 --population EAS --files /data/input/cga56 --output /data/output/cga56 --user --show-log --show-out

Cloudgene 2.4.1
http://www.cloudgene.io
(c) 2009-2019 Lukas Forer and Sebastian Schoenherr
Built by null on null
Built by null on null

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudgene/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

ERROR: Application or file imputationserver not found.

ERROR: command exited with nonzero status 1

I have checked that the input files are fine. When I ran this command on its own:

cloudgene run imputationserver

I got the same error:

Cloudgene 2.4.1
http://www.cloudgene.io
(c) 2009-2019 Lukas Forer and Sebastian Schoenherr
Built by null on null
Built by null on null

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudgene/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

ERROR: Application or file imputationserver not found.

Could you please tell me how to solve this?

BTW, I had previously tried imputing on the online Michigan Imputation Server, but it kept failing with the error "More than 100 obvious strand flips have been detected. Please check strand. Imputation cannot be started!" When I used the "check flip" command from this protocol, the results showed that 0 variants were flipped. So does that mean my original input was actually fine? I have no idea what is going on here.

Thanks!!

HippocampusGirl commented 2 years ago

Hi @SheaCheng2000,

Could you also send me the output from when you ran setup-hadoop and setup-imputationserver?

SheaCheng2000 commented 2 years ago

Hi @HippocampusGirl ,

Thanks for your quick reply!

I re-ran setup-hadoop; the log (which seems normal) is here: setup-hadoop.log

When I re-ran setup-imputationserver, I found that 1000genomes-phase3.zip had not been downloaded successfully, which may explain the earlier error. However, after downloading it, setup-imputationserver always gets stuck (I've tried many times). I copied the output here:


root@c0caea3eb299:/# setup-imputationserver
--------------------
mkdir -p -v /data/cloudgene /data/downloads
--------------------
--------------------
cp -rnv /opt/cloudgene/apps.yaml /opt/cloudgene/cloudgene /opt/cloudgene/cloudgene-daemon /opt/cloudgene/cloudgene.conf /opt/cloudgene/cloudgene.jar /opt/cloudgene/config /opt/cloudgene/lib /opt/cloudgene/sample /opt/cloudgene/tmp /opt/cloudgene/webapp /data/cloudgene/
--------------------
--------------------
cloudgene verify-cluster

Cloudgene 2.4.1
http://www.cloudgene.io
(c) 2009-2019 Lukas Forer and Sebastian Schoenherr
Built by null on null
Built by null on null

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/cloudgene/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Init HadoopUtil null

null

ERROR: command exited with nonzero status 1
--------------------
--------------------
wget --continue -O /data/downloads/imputationserver.zip https://github.com/genepi/imputationserver/releases/download/v1.6.8/imputationserver.zip
--2022-09-14 02:10:31--  https://github.com/genepi/imputationserver/releases/download/v1.6.8/imputationserver.zip
Resolving github.com (github.com)... 20.205.243.166
Connecting to github.com (github.com)|20.205.243.166|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/64924140/cb7fb9dc-947c-4ec2-accc-f3ab9e74496c?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20220914%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220914T021034Z&X-Amz-Expires=300&X-Amz-Signature=2eb29fb0309d9acdeb250fdc57fa4d8e2a12459b6a0fdf6bb065974da278bebc&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=64924140&response-content-disposition=attachment%3B%20filename%3Dimputationserver.zip&response-content-type=application%2Foctet-stream [following]
--2022-09-14 02:10:34--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/64924140/cb7fb9dc-947c-4ec2-accc-f3ab9e74496c?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20220914%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220914T021034Z&X-Amz-Expires=300&X-Amz-Signature=2eb29fb0309d9acdeb250fdc57fa4d8e2a12459b6a0fdf6bb065974da278bebc&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=64924140&response-content-disposition=attachment%3B%20filename%3Dimputationserver.zip&response-content-type=application%2Foctet-stream
Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.110.133|:443... connected.
Unable to establish SSL connection.
ERROR: command exited with nonzero status 4
--------------------
--------------------
wget --continue -O /data/downloads/1000genomes-phase3.zip https://imputationserver.sph.umich.edu/static/downloads/releases/1000genomes-phase3-3.0.0.zip
--2022-09-14 02:10:34--  https://imputationserver.sph.umich.edu/static/downloads/releases/1000genomes-phase3-3.0.0.zip
Resolving imputationserver.sph.umich.edu (imputationserver.sph.umich.edu)... 141.211.29.100
Connecting to imputationserver.sph.umich.edu (imputationserver.sph.umich.edu)|141.211.29.100|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 14248870960 (13G), 0 remaining [application/zip]
Saving to: ‘/data/downloads/1000genomes-phase3.zip’

100%[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++] 14,248,870,960 --.-K/s   in 0s

2022-09-14 02:10:36 (0.00 B/s) - ‘/data/downloads/1000genomes-phase3.zip’ saved [14248870960/14248870960]

--------------------
--------------------
cloudgene install /data/downloads/imputationserver.zip

Cloudgene 2.4.1
http://www.cloudgene.io
(c) 2009-2019 Lukas Forer and Sebastian Schoenherr
Built by null on null
Built by null on null

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/cloudgene/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Installing application /data/downloads/imputationserver.zip...
[ERROR] Application not installed:java.io.IOException: Application imputationserver@1.6.8 is already installed

ERROR: command exited with nonzero status 1
--------------------
--------------------
cloudgene install /data/downloads/1000genomes-phase3.zip

Cloudgene 2.4.1
http://www.cloudgene.io
(c) 2009-2019 Lukas Forer and Sebastian Schoenherr
Built by null on null
Built by null on null

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/cloudgene/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Installing application /data/downloads/1000genomes-phase3.zip...

It was stuck here for a long time, so I pressed Ctrl+C to quit.

^C
ERROR: command exited with nonzero status 130
--------------------
--------------------
sed -i "s/value: auto/value: password/g" /data/cloudgene/apps/imputationserver/*/*.yaml
--------------------
--------------------
find /data/cloudgene/apps -type d -exec chmod ugo+w {} +
--------------------
--------------------
find /data/cloudgene/apps -type f -exec chmod ugo-w {} +
--------------------
--------------------
mkdir -p -v /data/input /data/output
--------------------
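Because the first failure came down to an incomplete download of 1000genomes-phase3.zip, a quick sanity check is to compare the file's size on disk with the length the server reported (14248870960 bytes in the wget output above). The helper below is a minimal sketch; the function name size_matches is hypothetical and not part of the protocol's scripts. A matching size only rules out truncation; `unzip -tq <file>` runs a slower CRC test over the whole archive.

```shell
#!/bin/sh
# Hypothetical helper (not part of the ImputationProtocol scripts):
# succeed if a file's size in bytes equals the expected value.
# A size match rules out truncation only; it does not prove the archive is intact.
size_matches() {
  # $1: path to the downloaded file, $2: expected size in bytes
  [ -f "$1" ] || return 2                    # 2 = file does not exist
  actual=$(wc -c < "$1" | tr -d ' ')         # strip padding some wc versions add
  [ "$actual" = "$2" ]
}

# Usage with the length reported by the server in the log:
# size_matches /data/downloads/1000genomes-phase3.zip 14248870960 && echo "size OK"
```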

Then I tried to run the imputation server:

imputationserver --study-name cga56 --population EAS

It reported the error "Missing argument for option: user":

root@c0caea3eb299:/# imputationserver --study-name cga56 --population EAS
--------------------
mkdir -p -v /data/output/cga56
--------------------
--------------------
cloudgene run imputationserver --conf /data/hadoop/config --refpanel 1000g-phase-3-v5 --population EAS --files /data/input/cga56 --output /data/output/cga56 --user --show-log --show-out

Cloudgene 2.4.1
http://www.cloudgene.io
(c) 2009-2019 Lukas Forer and Sebastian Schoenherr
Built by null on null
Built by null on null

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/cloudgene/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Genotype Imputation (Minimac4) 1.6.8
https://imputationserver.readthedocs.io

ERROR: Missing argument for option: user

usage: input parameters:
    --aesEncryption <checkbox>   AES 256 encryption
                                 (default: no)
    --build <build>              Array Build
                                 hg38: GRCh38/hg38
                                 hg19: GRCh37/hg19
                                 (default: hg19)
    --conf <arg>                 Hadoop configuration folder
    --files <local_folder>       Input Files (<a
                                 href="http://www.1000genomes.org/wiki/Ana
                                 lysis/Variant%20Call%20Format/vcf-variant
                                 -call-format-version-41"
                                 target="_blank">VCF</a>)
    --force                      Force Cloudgene to reinstall application
                                 in HDFS even if it already installed.
    --meta <checkbox>            Generate Meta-imputation file
                                 (default: no)
    --mode <mode>                Mode
                                 qconly: Quality Control Only
                                 imputation: Quality Control & Imputation
                                 phasing: Quality Control & Phasing Only
                                 (default: imputation)
    --output <arg>               Output folder
    --phasing <phasing>          Phasing
                                 no_phasing: No phasing
                                 eagle: Eagle v2.4 (phased output)
                                 (default: eagle)
    --population <population>    Population
                                 bind: refpanel
                                 property: populations
                                 category: RefPanel
    --r2Filter <r2Filter>        rsq Filter
                                 0: off
                                 0.1: 0.1
                                 0.2: 0.2
                                 0.3: 0.3
                                 0.001: 0.001
                                 (default: 0)
    --refpanel <app_list>        Reference Panel (<a
                                 href="https://imputationserver.sph.umich.
                                 edu/start.html#!pages/refpanels"
                                 target="_blank">Details</a>)
    --show-log                   Stream logging messages to stdout
    --show-output                Stream output to stdout
    --user <arg>                 Hadoop username [default: cloudgene]

ERROR: command exited with nonzero status 1
--------------------

Then I ran the command again with a user added:

cloudgene run imputationserver --conf /data/hadoop/config --refpanel 1000g-phase-3-v5 --population EAS --files /data/input/cga56 --output /data/output/cga56 --user cloudgene --show-log --show-out

It still failed; the log is here: cloudgene_run_imputationserver.log

Thanks a lot!!

Shea

HippocampusGirl commented 2 years ago

The installation step of the imputationserver (where you used Ctrl-C to quit) can take up to half an hour, depending on how fast your hard drives are. Could you use top or htop (or similar) to see which processes are running and slowing down the installation?

The second error occurs because the command cannot connect to the Hadoop instance:

Call From c0caea3eb299/172.17.0.3 to localhost:8020 failed on connection exception: java.net.ConnectException

Could you double-check that the Hadoop processes were still running in the background when you started the cloudgene run?
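One way to do that check is with `jps`, the JDK tool that lists running Java processes by class name. The sketch below is an illustration under assumptions: it assumes the single-node setup runs the HDFS NameNode and DataNode as separate JVMs (localhost:8020 in the error above is the usual NameNode RPC address), that `jps` is available in the container, and the function name missing_hdfs_daemons is hypothetical. It prints whichever of the two daemons is absent.

```shell
#!/bin/sh
# Hypothetical helper: print the HDFS daemons missing from `jps` output.
# Assumption: NameNode and DataNode run as separate JVMs in this setup.
missing_hdfs_daemons() {
  # $1: text as printed by `jps` (one "PID ClassName" pair per line)
  for daemon in NameNode DataNode; do
    # compare whole class names so "SecondaryNameNode" does not count as "NameNode"
    if ! printf '%s\n' "$1" | awk '{ print $2 }' | grep -x "$daemon" >/dev/null; then
      echo "$daemon"
    fi
  done
}

# Usage inside the container, if the JDK's jps tool is installed:
# missing_hdfs_daemons "$(jps)"
```

Empty output means both daemons were listed; it does not by itself confirm that port 8020 accepts connections.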

SheaCheng2000 commented 2 years ago

Thanks! I checked the processes as you advised, but I think something is still wrong with my machine, and I am working on it...

Although the imputation server could not start, I was able to get the post-QC VCF by following your protocol, and I used that VCF as input for the online TOPMed Imputation Server.

I have another question: in the post-QC VCF, I noticed that all wild-type alleles in my data were removed (e.g. the .bim file contains "14 rs985931 19.46236 25105341 0 A"). Does this mean that wild-type alleles are not needed for imputation? If so, am I right in thinking that a larger sample size gives more accurate results?

Thanks again!

HippocampusGirl commented 2 years ago

That makes sense; Hadoop can be a bit finicky. I am on vacation until October 17th, but I will take a more detailed look when I'm back.

According to the documentation at https://imputationserver.readthedocs.io/en/latest/pipeline/, the Michigan Imputation Server only accepts A, C, G and T alleles, so that's not something we can change.
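For reference, the six columns of a PLINK .bim file are chromosome, variant ID, genetic position (cM), base-pair position and the two allele codes, and `0` is PLINK's missing-allele code. In the example line quoted above, the fifth column is `0`, which PLINK typically writes when only one allele was observed in the sample. The awk sketch below tallies such records separately from records whose alleles are not single A, C, G or T bases; the function name and the three categories are illustrative assumptions, not output of the protocol.

```shell
#!/bin/sh
# Hypothetical helper: summarise allele codes in a PLINK .bim file.
# Columns: chr, id, cM, bp, allele1, allele2; "0" is PLINK's missing-allele code.
bim_allele_summary() {
  awk '
    $5 == "0" || $6 == "0"                { missing++;  next }  # one allele code missing
    $5 !~ /^[ACGT]$/ || $6 !~ /^[ACGT]$/  { non_acgt++; next }  # indels, multi-base codes, etc.
                                          { ok++ }              # two single-base A/C/G/T alleles
    END { printf "biallelic_ACGT=%d missing_allele=%d other=%d\n", ok, missing, non_acgt }
  ' "$1"
}

# Usage (file name is a placeholder):
# bim_allele_summary mydata.bim
```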

SheaCheng2000 commented 2 years ago

Haha, thanks!! Have a nice vacation :)

HippocampusGirl commented 2 years ago

Hi @SheaCheng2000,

Would you be available for a short video call to show me the error?

HippocampusGirl commented 1 year ago

As there has not been any activity for a while, I'm assuming this has been resolved. Please re-open the issue if not :-)