Hi @SheaCheng2000,
Could you also send me the output from when you ran setup-hadoop and setup-imputationserver?
Hi @HippocampusGirl ,
Thanks for your quick reply!
I re-ran setup-hadoop; the log (which looks normal) is:
setup-hadoop.log
When I re-ran setup-imputationserver, I found that I had not successfully downloaded 1000genomes-phase3.zip, which may explain the earlier error. After downloading it, however, setup-imputationserver always got stuck (I've tried many times). I copied the output here:
root@c0caea3eb299:/# setup-imputationserver
--------------------
mkdir -p -v /data/cloudgene /data/downloads
--------------------
--------------------
cp -rnv /opt/cloudgene/apps.yaml /opt/cloudgene/cloudgene /opt/cloudgene/cloudgene-daemon /opt/cloudgene/cloudgene.conf /opt/cloudgene/cloudgene.jar /opt/cloudgene/config /opt/cloudgene/lib /opt/cloudgene/sample /opt/cloudgene/tmp /opt/cloudgene/webapp /data/cloudgene/
--------------------
--------------------
cloudgene verify-cluster
Cloudgene 2.4.1
http://www.cloudgene.io
(c) 2009-2019 Lukas Forer and Sebastian Schoenherr
Built by null on null
Built by null on null
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/cloudgene/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Init HadoopUtil null
null
ERROR: command exited with nonzero status 1
--------------------
--------------------
wget --continue -O /data/downloads/imputationserver.zip https://github.com/genepi/imputationserver/releases/download/v1.6.8/imputationserver.zip
--2022-09-14 02:10:31-- https://github.com/genepi/imputationserver/releases/download/v1.6.8/imputationserver.zip
Resolving github.com (github.com)... 20.205.243.166
Connecting to github.com (github.com)|20.205.243.166|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/64924140/cb7fb9dc-947c-4ec2-accc-f3ab9e74496c?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20220914%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220914T021034Z&X-Amz-Expires=300&X-Amz-Signature=2eb29fb0309d9acdeb250fdc57fa4d8e2a12459b6a0fdf6bb065974da278bebc&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=64924140&response-content-disposition=attachment%3B%20filename%3Dimputationserver.zip&response-content-type=application%2Foctet-stream [following]
--2022-09-14 02:10:34-- https://objects.githubusercontent.com/github-production-release-asset-2e65be/64924140/cb7fb9dc-947c-4ec2-accc-f3ab9e74496c?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20220914%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220914T021034Z&X-Amz-Expires=300&X-Amz-Signature=2eb29fb0309d9acdeb250fdc57fa4d8e2a12459b6a0fdf6bb065974da278bebc&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=64924140&response-content-disposition=attachment%3B%20filename%3Dimputationserver.zip&response-content-type=application%2Foctet-stream
Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.110.133|:443... connected.
Unable to establish SSL connection.
ERROR: command exited with nonzero status 4
--------------------
--------------------
wget --continue -O /data/downloads/1000genomes-phase3.zip https://imputationserver.sph.umich.edu/static/downloads/releases/1000genomes-phase3-3.0.0.zip
--2022-09-14 02:10:34-- https://imputationserver.sph.umich.edu/static/downloads/releases/1000genomes-phase3-3.0.0.zip
Resolving imputationserver.sph.umich.edu (imputationserver.sph.umich.edu)... 141.211.29.100
Connecting to imputationserver.sph.umich.edu (imputationserver.sph.umich.edu)|141.211.29.100|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 14248870960 (13G), 0 remaining [application/zip]
Saving to: ‘/data/downloads/1000genomes-phase3.zip’
100%[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++] 14,248,870,960 --.-K/s in 0s
2022-09-14 02:10:36 (0.00 B/s) - ‘/data/downloads/1000genomes-phase3.zip’ saved [14248870960/14248870960]
--------------------
--------------------
cloudgene install /data/downloads/imputationserver.zip
Cloudgene 2.4.1
http://www.cloudgene.io
(c) 2009-2019 Lukas Forer and Sebastian Schoenherr
Built by null on null
Built by null on null
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/cloudgene/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Installing application /data/downloads/imputationserver.zip...
[ERROR] Application not installed:java.io.IOException: Application imputationserver@1.6.8 is already installed
ERROR: command exited with nonzero status 1
--------------------
--------------------
cloudgene install /data/downloads/1000genomes-phase3.zip
Cloudgene 2.4.1
http://www.cloudgene.io
(c) 2009-2019 Lukas Forer and Sebastian Schoenherr
Built by null on null
Built by null on null
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/cloudgene/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Installing application /data/downloads/1000genomes-phase3.zip...
It was stuck here for a long time, so I pressed Ctrl+C to quit
^C
ERROR: command exited with nonzero status 130
--------------------
--------------------
sed -i "s/value: auto/value: password/g" /data/cloudgene/apps/imputationserver/*/*.yaml
--------------------
--------------------
find /data/cloudgene/apps -type d -exec chmod ugo+w {} +
--------------------
--------------------
find /data/cloudgene/apps -type f -exec chmod ugo-w {} +
--------------------
--------------------
mkdir -p -v /data/input /data/output
--------------------
Then I tried to run the imputation server: imputationserver --study-name cga56 --population EAS
It reported an error: Missing argument for option: user
root@c0caea3eb299:/# imputationserver --study-name cga56 --population EAS
--------------------
mkdir -p -v /data/output/cga56
--------------------
--------------------
cloudgene run imputationserver --conf /data/hadoop/config --refpanel 1000g-phase-3-v5 --population EAS --files /data/input/cga56 --output /data/output/cga56 --user --show-log --show-out
Cloudgene 2.4.1
http://www.cloudgene.io
(c) 2009-2019 Lukas Forer and Sebastian Schoenherr
Built by null on null
Built by null on null
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/cloudgene/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Genotype Imputation (Minimac4) 1.6.8
https://imputationserver.readthedocs.io
ERROR: Missing argument for option: user
usage: input parameters:
--aesEncryption <checkbox> AES 256 encryption
(default: no)
--build <build> Array Build
hg38: GRCh38/hg38
hg19: GRCh37/hg19
(default: hg19)
--conf <arg> Hadoop configuration folder
--files <local_folder> Input Files (<a
href="http://www.1000genomes.org/wiki/Ana
lysis/Variant%20Call%20Format/vcf-variant
-call-format-version-41"
target="_blank">VCF</a>)
--force Force Cloudgene to reinstall application
in HDFS even if it already installed.
--meta <checkbox> Generate Meta-imputation file
(default: no)
--mode <mode> Mode
qconly: Quality Control Only
imputation: Quality Control & Imputation
phasing: Quality Control & Phasing Only
(default: imputation)
--output <arg> Output folder
--phasing <phasing> Phasing
no_phasing: No phasing
eagle: Eagle v2.4 (phased output)
(default: eagle)
--population <population> Population
bind: refpanel
property: populations
category: RefPanel
--r2Filter <r2Filter> rsq Filter
0: off
0.1: 0.1
0.2: 0.2
0.3: 0.3
0.001: 0.001
(default: 0)
--refpanel <app_list> Reference Panel (<a
href="https://imputationserver.sph.umich.
edu/start.html#!pages/refpanels"
target="_blank">Details</a>)
--show-log Stream logging messages to stdout
--show-output Stream output to stdout
--user <arg> Hadoop username [default: cloudgene]
ERROR: command exited with nonzero status 1
--------------------
Then I ran this command with the user added:
cloudgene run imputationserver --conf /data/hadoop/config --refpanel 1000g-phase-3-v5 --population EAS --files /data/input/cga56 --output /data/output/cga56 --user cloudgene --show-log --show-out
It still failed; the log is: cloudgene_run_imputationserver.log
Thanks a lot!!
Shea
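Since one of the downloads above failed and the other was resumed with wget --continue, it may be worth ruling out a truncated or corrupted archive before re-running the install step. The following is only a minimal Python sketch using the standard zipfile module; the function name zip_is_intact is made up for illustration and is not part of the setup scripts.

```python
import zipfile


def zip_is_intact(path):
    """Return True if the archive at `path` opens and every member passes its CRC check.

    Hypothetical helper for illustration; not part of setup-imputationserver.
    """
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and returns the name of the
            # first corrupt one, or None if all checksums match.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        # Truncated downloads usually lack the end-of-archive record,
        # and a missing file raises an OSError subclass.
        return False
```

For example, zip_is_intact("/data/downloads/1000genomes-phase3.zip") would check the reference panel archive. Note that this reads the whole file, so on a 13G archive it can take several minutes.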
The installation step of the imputationserver (where you used Ctrl-C to quit) can take up to half an hour, depending on how fast your hard drives are. Could you use top or htop (or similar) to see which specific processes are running and making the installation so slow?
The second error is related to the command not being able to contact the Hadoop instance.
Call From c0caea3eb299/172.17.0.3 to localhost:8020 failed on connection exception: java.net.ConnectException
Could you double-check that the Hadoop processes were still running in the background when you started the cloudgene run?
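A quick way to see whether anything is listening at the address from the error message (localhost:8020) is to try opening a TCP connection from inside the container. This is just a sketch with the Python standard library; port_is_open is a hypothetical helper name, and a successful connection only shows that some process accepts connections on that port, not that Hadoop is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; not part of the Hadoop or Cloudgene tooling.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False
```

Calling port_is_open("localhost", 8020) right before cloudgene run should return True if the service is up; False would be consistent with the java.net.ConnectException above.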
Thanks! I have checked these processes as you advised, but I think something is still wrong with my machine, and I am working on it...
Although the imputation server could not start, I was able to get the post-QC VCF by following your protocol, and I used that VCF as input to the online TOPMed Imputation Server.
There is another question I'd like to ask: in the post-QC VCF, I noticed that all wild-type alleles in my data were removed (e.g. the .bim file line "14 rs985931 19.46236 25105341 0 A"). Does this mean that the wild-type alleles are not needed for the imputation process? If so, is it correct that a larger sample size gives more accurate results?
Thanks again!
That makes sense, Hadoop can be a bit finicky. I am currently on vacation until October 17th, but will have a more detailed look when I'm back.
According to the documentation at https://imputationserver.readthedocs.io/en/latest/pipeline/, the Michigan Imputation Server only accepts A, C, G and T alleles, so that's not something we can change.
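To see which variants are affected by this rule, one could scan the .bim file for rows where either allele is not A, C, G or T. The sketch below is only an illustration in Python: it assumes the standard six-column PLINK .bim layout (chromosome, variant ID, genetic distance, position, allele 1, allele 2), and the function name find_unsupported_variants is made up.

```python
VALID_ALLELES = {"A", "C", "G", "T"}


def find_unsupported_variants(bim_lines):
    """Yield (variant_id, allele1, allele2) for rows with an allele outside A/C/G/T.

    Hypothetical helper; assumes whitespace-separated six-column .bim rows.
    """
    for line in bim_lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed rows
        _chrom, variant_id, _cm, _pos, allele1, allele2 = fields
        if allele1 not in VALID_ALLELES or allele2 not in VALID_ALLELES:
            yield variant_id, allele1, allele2
```

Applied to the example row above, this yields ("rs985931", "0", "A"), because the first allele column holds 0 rather than a nucleotide. Passing an open file handle, as in find_unsupported_variants(open(path)), works as well, since the function only iterates over lines.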
Haha, thanks!! Have a nice vacation :)
Hi @SheaCheng2000,
Would you be available for a short video call to show me the error?
As there has not been any activity for a while, I'm assuming this has been resolved. Please re-open the issue if not :-)
Hi,
Thanks for your imputation tool; it really helps!!
I have completed all the steps in Docker according to this protocol, but at the last step
an error occurred, and the log was:
I have checked that the input files are okay. When I tried this command alone
the error was also:
Could you please tell me how to solve this?
BTW, I tried running the imputation on the online server before, but the Michigan Imputation Server keeps reporting the error "More than 100 obvious strand flips have been detected. Please check strand. Imputation cannot be started!" I used the "check flip" command in this protocol, and the results showed that 0 variants were flipped. So it seems my original input was actually fine? I have no idea what is going on.
Thanks!!
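For readers unfamiliar with the term, a strand flip is usually understood as a SNP whose alleles match the reference panel only after complementing them (A<->T, C<->G). The sketch below illustrates that common definition in Python; it is not the server's actual check, classify_strand is a hypothetical name, and it assumes biallelic SNPs with A/C/G/T alleles only.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify_strand(study_alleles, ref_alleles):
    """Classify a biallelic SNP as 'match', 'flip', 'ambiguous' or 'mismatch'.

    Hypothetical helper; raises KeyError for alleles other than A/C/G/T.
    """
    study = set(study_alleles)
    ref = set(ref_alleles)
    flipped = {COMPLEMENT[allele] for allele in study}
    if study == flipped:
        # A/T and C/G SNPs look the same on both strands,
        # so alleles alone cannot reveal a flip.
        return "ambiguous"
    if study == ref:
        return "match"
    if flipped == ref:
        return "flip"
    return "mismatch"
```

For instance, classify_strand(("A", "G"), ("T", "C")) gives "flip", while classify_strand(("A", "G"), ("A", "G")) gives "match". A large number of flips against one reference but none against another may point to a genome build or reference panel difference rather than a problem with the genotypes themselves.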