Knowledge-Graph-Hub / kg-covid-19

An instance of KG Hub to produce a knowledge graph for COVID-19 response.
https://github.com/Knowledge-Graph-Hub/kg-covid-19/wiki
BSD 3-Clause "New" or "Revised" License

Jenkins build failing in Blazegraph deploy stage #439

Closed by justaddcoffee 2 years ago

justaddcoffee commented 2 years ago

Describe the bug

Jenkins build is failing at the Blazegraph deploy stage, see here:

22:27:57  TASK [update-endpoint : Remove last journal download] **************************
22:27:57  ok: [pan.lbl.gov]
22:27:57  
22:27:57  TASK [update-endpoint : Remove last journal download unpacked] *****************
22:29:18  changed: [pan.lbl.gov]
22:29:18  
22:29:18  TASK [update-endpoint : Get remote blazegraph] *********************************
22:33:54  changed: [pan.lbl.gov]
22:33:54  
22:33:54  TASK [update-endpoint : Unpack the journal] ************************************
22:36:46  changed: [pan.lbl.gov]
22:36:46  
22:36:46  TASK [update-endpoint : supervisorctl] *****************************************
22:36:46  ok: [pan.lbl.gov]
22:36:46  
22:36:46  TASK [update-endpoint : Remove last old journal] *******************************
22:36:46  ok: [pan.lbl.gov]
22:36:46  
22:36:46  TASK [update-endpoint : Move old journal] **************************************
22:36:46  fatal: [pan.lbl.gov]: FAILED! => {"changed": true, "cmd": ["mv", "/tmp/blazegraph-kg-hub-internal_new.jnl", "/home/ubuntu/kg-hub-graphstore-server-internal/blazegraph.jnl"], "delta": "0:00:00.004741", "end": "2022-01-02 22:36:39.521431", "msg": "non-zero return code", "rc": 1, "start": "2022-01-02 22:36:39.516690", "stderr": "mv: cannot move '/tmp/blazegraph-kg-hub-internal_new.jnl' to '/home/ubuntu/kg-hub-graphstore-server-internal/blazegraph.jnl': No such file or directory", "stderr_lines": ["mv: cannot move '/tmp/blazegraph-kg-hub-internal_new.jnl' to '/home/ubuntu/kg-hub-graphstore-server-internal/blazegraph.jnl': No such file or directory"], "stdout": "", "stdout_lines": []}
22:36:46  
22:36:46  PLAY RECAP *********************************************************************
22:36:46  pan.lbl.gov                : ok=7    changed=3    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
22:36:46  
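A note on reading the `mv` error above: GNU `mv` typically reports "cannot stat" when the source file is missing, so the "cannot move ... to ..." wording suggests the destination directory is the path that does not exist. That is an inference from the message, not something the log confirms. A minimal sketch of a pre-flight check that reports which side is missing, using the two paths from the failing task (the function name `preflight_move` is hypothetical and not part of the playbook):

```python
from __future__ import annotations

from pathlib import Path


def preflight_move(src: str, dst: str) -> list[str]:
    """Return the reasons a plain `mv src dst` would fail on missing paths."""
    problems = []
    if not Path(src).exists():
        problems.append(f"source missing: {src}")
    parent = Path(dst).parent
    if not parent.is_dir():
        problems.append(f"destination directory missing: {parent}")
    return problems


if __name__ == "__main__":
    # Paths copied from the failing "Move old journal" task in the log.
    for problem in preflight_move(
        "/tmp/blazegraph-kg-hub-internal_new.jnl",
        "/home/ubuntu/kg-hub-graphstore-server-internal/blazegraph.jnl",
    ):
        print(problem)
```

Run on the target host, this prints one line per missing path and nothing if both the source file and the destination directory are present.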

To Reproduce

See here

Expected behavior

Jenkins build should complete

Version

https://github.com/Knowledge-Graph-Hub/kg-covid-19/commit/8779f52d7b2e6aff59a6ccc5df069ef1078e37c0

justaddcoffee commented 2 years ago

@kltm any idea what's happening here?

kltm commented 2 years ago

@justaddcoffee Check slack. The short of it is: https://github.com/geneontology/operations/issues/51

justaddcoffee commented 2 years ago

A relevant ticket, h/t Seth: https://github.com/geneontology/operations/issues/51

caufieldjh commented 2 years ago

Removing the ssh-keyscan in https://github.com/Knowledge-Graph-Hub/kg-covid-19/commit/d047fd7b8dd5593fc06fe3f315c84a753ac4946f looks like it still leaves pan as the endpoint, and it's definitely unreachable:

15:28:54  + HOME=/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/ansible
15:28:54  + ansible-playbook update-kg-hub-endpoint.yaml --inventory=hosts.local-rdf-endpoint --private-key=**** -e target_user=bbop --extra-vars=endpoint=internal
15:28:55  [DEPRECATION WARNING]: The TRANSFORM_INVALID_GROUP_CHARS settings is set to 
15:28:55  allow bad characters in group names by default, this will change, but still be 
15:28:55  user configurable on deprecation. This feature will be removed in version 2.10.
15:28:55   Deprecation warnings can be disabled by setting deprecation_warnings=False in 
15:28:55  ansible.cfg.
15:28:55  [WARNING]: Invalid characters were found in group names but not replaced, use
15:28:55  -vvvv to see details
15:28:55  
15:28:55  PLAY [pipeline-rdf] ************************************************************
15:28:55  
15:28:55  TASK [Gathering Facts] *********************************************************
15:28:55  [WARNING]: Unhandled error in Python interpreter discovery for host
15:28:55  pan.lbl.gov: Failed to connect to the host via ssh: Host key verification
15:28:55  failed.
15:28:55  fatal: [pan.lbl.gov]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"pan.lbl.gov\". Make sure this host can be reached over ssh: Host key verification failed.\r\n", "unreachable": true}
15:28:55  
15:28:55  PLAY RECAP *********************************************************************
15:28:55  pan.lbl.gov                : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
15:28:55  
[Pipeline] }
15:28:56  ERROR: script returned exit code 4

kltm commented 2 years ago

@caufieldjh I'd hazard a guess that you need to update the metadata that the playbook is running with: the pan instances are no longer running and haven't been for some time (https://github.com/geneontology/operations/issues/51).
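The two PLAY RECAP lines in this thread show different failure modes: the first has failed=1 (a task ran and failed), the second has unreachable=1 (Ansible never connected to the host). A minimal sketch for telling them apart from a Jenkins log, assuming recap lines look like the ones pasted above; the helper names `parse_recap` and `classify` are hypothetical:

```python
import re

# Matches e.g. "15:28:55  pan.lbl.gov   : ok=0  changed=0  unreachable=1 ..."
# The leading Jenkins timestamp is optional.
RECAP_LINE = re.compile(
    r"^(?:\d{2}:\d{2}:\d{2}\s+)?(?P<host>\S+)\s*:\s*(?P<counts>(?:\w+=\d+\s*)+)$"
)


def parse_recap(line):
    """Return (host, {counter: value}) for a recap line, or None otherwise."""
    match = RECAP_LINE.match(line.strip())
    if match is None:
        return None
    counts = {}
    for pair in match.group("counts").split():
        key, value = pair.split("=")
        counts[key] = int(value)
    return match.group("host"), counts


def classify(counts):
    """Summarise a host's recap counters as a single status string."""
    if counts.get("unreachable", 0) > 0:
        return "unreachable"
    if counts.get("failed", 0) > 0:
        return "task failed"
    return "ok"


if __name__ == "__main__":
    line = "15:28:55  pan.lbl.gov  : ok=0  changed=0  unreachable=1  failed=0"
    host, counts = parse_recap(line)
    print(host, classify(counts))
```

An "unreachable" result points at the inventory, SSH keys, or the host itself rather than at any playbook task, which matches the suggestion above to check what the playbook is pointed at.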

caufieldjh commented 2 years ago

Here: https://github.com/Knowledge-Graph-Hub/kg-covid-19/blob/522ebfcd9be4a3e3855151bbad207de942e74f41/Jenkinsfile#L233

Change hosts.local to hosts.remote and change the user to ubuntu.
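As a rough illustration of that change, the sketch below rebuilds the `ansible-playbook` invocation from the Jenkins log with the inventory and user swapped. The resulting inventory name `hosts.remote-rdf-endpoint` is an assumption derived by substitution from the logged `hosts.local-rdf-endpoint`, the key path is a placeholder, and `build_playbook_cmd` is a hypothetical helper, not something in the Jenkinsfile:

```python
import shlex


def build_playbook_cmd(inventory, target_user, private_key, endpoint="internal"):
    """Assemble the argv for the endpoint-update playbook seen in the log."""
    return [
        "ansible-playbook",
        "update-kg-hub-endpoint.yaml",
        f"--inventory={inventory}",
        f"--private-key={private_key}",
        "-e",
        f"target_user={target_user}",
        f"--extra-vars=endpoint={endpoint}",
    ]


if __name__ == "__main__":
    old_inventory = "hosts.local-rdf-endpoint"  # as shown in the Jenkins log
    # Assumption: a matching hosts.remote-* inventory file exists.
    new_inventory = old_inventory.replace("hosts.local", "hosts.remote", 1)
    cmd = build_playbook_cmd(
        inventory=new_inventory,
        target_user="ubuntu",  # was "bbop" in the failing run
        private_key="/path/to/key",  # placeholder; the log masks the real value
    )
    print(shlex.join(cmd))
```

This only prints the command for inspection; it does not run the playbook.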

caufieldjh commented 2 years ago

I believe this is resolved now - the Feb 28 build appears to have completed without issue.

justaddcoffee commented 2 years ago

Thanks @caufieldjh!