Hi @Toseph, please ignore the "Use of uninitialized value in hash element" message. VEP should still complete successfully. Let me know if you're also seeing a different error.
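If you want to rule VEP itself in or out, one option is to run the same release-85 script directly on a small slice of the VCF, outside of Hail. A minimal sketch, assuming Python 2.7: the script path is taken from the VEP.pm path in the log, but the input/output file names are placeholders and the flags are illustrative rather than the exact ones the pipeline passes.

# Run the release-85 VEP script directly on a small test VCF, outside Hail.
# "slice.vcf" / "slice.vep.txt" are placeholder file names.
import subprocess

vep_script = ("/usr/local/seqr/seqr/vep/ensembl-tools-release-85/scripts/"
              "variant_effect_predictor/variant_effect_predictor.pl")
subprocess.check_call([
    "perl", vep_script,
    "--format", "vcf",
    "-i", "slice.vcf",      # e.g. the first few thousand records of the full VCF
    "-o", "slice.vep.txt",
    "--offline",            # assumes a local VEP cache was installed with seqr
])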
How long should a single VCF upload typically take? My machine has 8 cores and 32 GB of memory, and my VCF file is only 226 MB.
Stage 15 seems to finish and the job moves on to stage 20, but then it goes back to stage 6 and then stage 9.
See below:
VARS> line 1.
[Stage 15:=====================================================>(981 + 6) / 987]Use of uninitialized value in hash element at /usr/local/seqr/seqr/vep/ensembl-tools-release-85/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm line 4255, <VARS> line 1.
[Stage 15:=====================================================>(982 + 5) / 987]Use of uninitialized value in hash element at /usr/local/seqr/seqr/vep/ensembl-tools-release-85/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm line 4255, <VARS> line 1.
[Stage 15:=====================================================>(983 + 4) / 987]Use of uninitialized value in hash element at /usr/local/seqr/seqr/vep/ensembl-tools-release-85/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm line 4255, <VARS> line 1.
[Stage 15:=====================================================>(985 + 2) / 987]Use of uninitialized value in hash element at /usr/local/seqr/seqr/vep/ensembl-tools-release-85/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm line 4255, <VARS> line 1.
[Stage 15:=====================================================>(986 + 1) / 987]2019-06-26 13:35:04 Hail: INFO: vep: annotated 4909438 variants
[Stage 20:> (0 + 8) / 987]SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
Jun 26, 2019 1:35:14 PM INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 3,148B for [annotations, vep, transcript_consequences, list, element, uniparc] BINARY: 11,685 values, 10,609B raw, 3,091B comp, 1 pages, encodings: [RLE, PLAIN_DICTIONARY], dic { ..., 130B comp}
[... several minutes of similar ColumnChunkPageWriteStore INFO lines for the other vep columns (variant_class, impact, lof, protein_end, strand, lof_flags, hgvs_offset, hgvsp, transcript_id, hgnc_id, ...), partially overwritten by the Stage 20 progress bar, elided ...]
[Stage 20:=====================================================>(986 + 1) / 987]
[Stage 3:=====================================================>(991 + 8) / 1000]2019-06-26 13:41:52 Hail: WARN: called redundant split on an already split VDS
[Stage 6:=====================================================>(999 + 1) / 1000]Struct{
[Stage 9:==> (49 + 8) / 1000]
My run errored out after running for over 24 hours, once it reached pipeline step 2 (the upload to Elasticsearch).
Should I have tabix available in the PATH to avoid the <VARS> warning? I'm not really sure where to look here, but I have a 56 MB log of the upload and the steps it tried to take.
Just checked, and tabix is in my PATH as a result of the seqr install steps.
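In case it's useful, here is a minimal sketch for confirming that the Python process launching the pipeline resolves the same tabix binary as the interactive shell (nothing here is seqr-specific; a login shell and a service environment can have different PATHs):

# Confirm `tabix` is resolvable on the PATH seen by this Python process.
import os
from distutils.spawn import find_executable

print("PATH = %s" % os.environ.get("PATH", ""))
tabix_path = find_executable("tabix")
print("tabix resolved to: %s" % (tabix_path or "NOT FOUND"))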
This is the current error I am getting after step 2 begins:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 166, in perform_request
response = self.pool.urlopen(method, url, body, retries=False, headers=request_headers, **kw)
File "/usr/lib/python2.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/lib/python2.7/site-packages/urllib3/util/retry.py", line 344, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/lib/python2.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/lib/python2.7/site-packages/urllib3/connectionpool.py", line 354, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib64/python2.7/httplib.py", line 1041, in request
self._send_request(method, url, body, headers)
File "/usr/lib64/python2.7/httplib.py", line 1075, in _send_request
self.endheaders(body)
File "/usr/lib64/python2.7/httplib.py", line 1037, in endheaders
self._send_output(message_body)
File "/usr/lib64/python2.7/httplib.py", line 881, in _send_output
self.send(msg)
File "/usr/lib64/python2.7/httplib.py", line 843, in send
self.connect()
File "/usr/lib/python2.7/site-packages/urllib3/connection.py", line 181, in connect
conn = self._new_conn()
File "/usr/lib/python2.7/site-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fe7c1174810>: Failed to establish a new connection: [Errno 111] Connection refused
2019-06-28 08:00:31,943 WARNING GET http://localhost:9200/ [status:N/A request:0.000s]
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 166, in perform_request
response = self.pool.urlopen(method, url, body, retries=False, headers=request_headers, **kw)
File "/usr/lib/python2.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/lib/python2.7/site-packages/urllib3/util/retry.py", line 344, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/lib/python2.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/lib/python2.7/site-packages/urllib3/connectionpool.py", line 354, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib64/python2.7/httplib.py", line 1041, in request
self._send_request(method, url, body, headers)
File "/usr/lib64/python2.7/httplib.py", line 1075, in _send_request
Hey, just wanted to say thanks for all the input on this ticket. We have seqr working on a new VM; it turned out to be a mix of Python dependency and Elasticsearch issues that had to be worked through (the ES version was too high, and certain Python packages needed whatever version was available rather than the pinned ones).
I ultimately think the issue here is that I tried to deploy seqr as root instead of as a regular user, on top of yum-installing ES v7; this caused issues with the Elasticsearch deployment that running it as a service did not resolve.
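For anyone untangling the same dependency mix, a quick sketch for dumping the installed versions of the packages that appear in the tracebacks above, to compare against whatever seqr's requirements file expects (the package list below is just the ones from my traceback):

# Print installed versions of the packages from the traceback so they can
# be compared against the versions the seqr docs/requirements expect.
import pkg_resources

for name in ("elasticsearch", "urllib3", "six"):
    try:
        print("%s %s" % (name, pkg_resources.get_distribution(name).version))
    except pkg_resources.DistributionNotFound:
        print("%s not installed" % name)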
During the upload phase for step 5, I am running into an error with VEP while it attempts to process the file. Below are the steps I took to run the command; the second blurb is where the error shows up.
The error comes during stage 15: