Knowledge-Graph-Hub / kg-obo

A package to transform all OBO ontologies into KGX TSV and OBO JSON formats and put the transformed graphs in KGhub
https://knowledge-graph-hub.github.io/kg-obo/getting_started.html
GNU General Public License v3.0

Update robot to newer version #207

Open caufieldjh opened 1 year ago

caufieldjh commented 1 year ago

It would be nice to use newer versions of robot, but...

In #206, I saw that versions of robot newer than v1.8.3 appeared to break the transformation process, at least with BFO in the tests.

The test failure:

_________________________________________________________________________________________ TestRunTransform.test_run_transform _________________________________________________________________________________________

self = <tests.test_transform.TestRunTransform testMethod=test_run_transform>, mock_clean_and_normalize_graph = <MagicMock name='clean_and_normalize_graph' id='139816367273056'>
mock_kgx_transform = <MagicMock name='transform' id='139816374101184'>, mock_get_owl_iri = <MagicMock name='get_owl_iri' id='139816367248288'>, mock_base_url = <MagicMock name='get_url' id='139816374532368'>
mock_retrieve_obofoundry_yaml = <MagicMock name='retrieve_obofoundry_yaml' id='139816374109808'>, mock_get = <MagicMock name='get' id='139815297222832'>

    @mock.patch('requests.get')
    @mock.patch('kg_obo.transform.retrieve_obofoundry_yaml')
    @mock.patch('kg_obo.obolibrary_utils.get_url')
    @mock.patch('kg_obo.transform.get_owl_iri', return_value=('http://purl.obolibrary.org/obo/bfo/2019-08-26/bfo.owl', '2019-08-26', 'versionIRI'))
    @mock.patch('kgx.cli.transform')
    @mock.patch('kg_obo.transform.clean_and_normalize_graph')
    def test_run_transform(self, mock_clean_and_normalize_graph, mock_kgx_transform,
                           mock_get_owl_iri, mock_base_url,
                           mock_retrieve_obofoundry_yaml, mock_get):
        mock_retrieve_obofoundry_yaml.return_value = [{'id': 'bfo'}]

        # Test with s3_test option on
        with tempfile.TemporaryDirectory() as td:
>           run_transform(log_dir=td,s3_test=True)

tests/test_transform.py:149: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
kg_obo/transform.py:1096: in run_transform
    if not convert_owl(
kg_obo/robot_utils.py:135: in convert_owl
    sed(['-i',
kg-obo-env/lib/python3.9/site-packages/sh.py:1524: in __call__
    rc = self.__class__.RunningCommandCls(cmd, call_args, stdin, stdout, stderr)
kg-obo-env/lib/python3.9/site-packages/sh.py:750: in __init__
    self.wait()
kg-obo-env/lib/python3.9/site-packages/sh.py:812: in wait
    self.handle_command_exit_code(exit_code)

and the output:

Setting up ROBOT...
ROBOT path: /home/harry/kg-obo/robot
ROBOT evironment variables: -Xmx12g -XX:+UseG1GC
Testing S3 only, so assuming lock is not set.
Testing S3 only - mock checking lock status.
Mock searching kg-obo/tracking.yaml in bucket
Will simulate uploading using the following paths:
* bucket: bucket
* remote path: kg-obo
* local path: data
bfo
<MagicMock name='get_url()' id='139815864793168'>
Current VersionIRI for bfo: http://purl.obolibrary.org/obo/bfo/2019-08-26/bfo.owl
Current version for bfo: 2019-08-26
In bfo, used this value for version: versionIRI
Don't have this version of bfo yet - will transform.
Could not parse OWL definitions enough to locate any imports.
No imports found for bfo.
Completed download from <MagicMock name='get_url()' id='139815864793168'> to /tmp/bfogiry5gew.
Moving from /tmp/bfogiry5gew to data/bfo/2019-08-26/bfo.owl.
ROBOT preprocessing: relax bfo
Relaxing /tmp/bfogiry5gew to /tmp/tmp7gnejkmp_bfo_relaxed.owl...
Complete.
Before relax: 0 lines. After relax: 15 lines.
ROBOT preprocessing: node ID normalization on bfo
Retrieving entity names in /tmp/tmp7gnejkmp_bfo_relaxed.owl...
Exported IDs to /tmp/tmp7gnejkmp_bfo_relaxed.owl.ids.csv.
All identifiers in /tmp/tmp7gnejkmp_bfo_relaxed.owl are as expected.
No identifiers in /tmp/tmp7gnejkmp_bfo_relaxed.owl will be normalized.
ROBOT preprocessing: convert bfo
Converting /tmp/tmp7gnejkmp_bfo_relaxed.owl to data/bfo/2019-08-26/bfo.json...
ROBOT encountered an error: 

  RAN: /home/harry/kg-obo/robot convert --input /tmp/tmp7gnejkmp_bfo_relaxed.owl --format json --output data/bfo/2019-08-26/bfo.json

  STDOUT:
OBO GRAPH ERROR Could not convert ontology to OBO Graph (see https://github.com/geneontology/obographs)
For details see: http://robot.obolibrary.org/errors#obo-graph-error
Use the -vvv option to show the stack trace.
Use the --help option to see usage information.

  STDERR:

Will try to repair...
ROBOT encountered another error: 

  RAN: /home/harry/kg-obo/robot remove --input /tmp/tmp7gnejkmp_bfo_relaxed.owl --select object-properties --output data/bfo/2019-08-26/bfo.json

  STDOUT:
OBO GRAPH ERROR Could not convert ontology to OBO Graph (see https://github.com/geneontology/obographs)
For details see: http://robot.obolibrary.org/errors#obo-graph-error
Use the -vvv option to show the stack trace.
Use the --help option to see usage information.

  STDERR:

ROBOT encountered yet another error: 

  RAN: /home/harry/kg-obo/robot remove --input /tmp/tmp7gnejkmp_bfo_relaxed.owl --term rdfs:comment --output data/bfo/2019-08-26/bfo.json

  STDOUT:
OBO GRAPH ERROR Could not convert ontology to OBO Graph (see https://github.com/geneontology/obographs)
For details see: http://robot.obolibrary.org/errors#obo-graph-error
Use the -vvv option to show the stack trace.
Use the --help option to see usage information.

  STDERR:

Replacing any invalid prefixes...
------------------------------------------------------------------------------------------------ Captured stderr call -------------------------------------------------------------------------------------------------
  0%|          | 0.00/1.00k [00:00<?, ?B/s] [00:00<?, ?it/s]
  0%|          | 0.00/1.00 [00:00<?, ?B/s]]
processing ontologies:   0%|          | 0/1 [00:03<?, ?it/s]
-------------------------------------------------------------------------------------------------- Captured log call --------------------------------------------------------------------------------------------------
INFO     kg-obo:transform.py:844 Will test S3 upload instead of actually uploading.
INFO     kg-obo:transform.py:849 Loading bfo
INFO     kg-obo:transform.py:895 Current VersionIRI for bfo: http://purl.obolibrary.org/obo/bfo/2019-08-26/bfo.owl
INFO     kg-obo:transform.py:897 Current version for bfo: 2019-08-26
INFO     kg-obo:transform.py:899 In bfo, used this value for version: versionIRI
INFO     kg-obo:transform.py:938 Don't have this version of bfo yet - will transform.
INFO     kg-obo:transform.py:956 No imports found for bfo.
INFO     kg-obo:transform.py:988 Completed download from <MagicMock name='get_url()' id='139815864793168'> to /tmp/bfogiry5gew.
INFO     kg-obo:transform.py:990 Moving from /tmp/bfogiry5gew to data/bfo/2019-08-26/bfo.owl.
INFO     kg-obo:transform.py:1002 ROBOT preprocessing: relax bfo
INFO     kg-obo:transform.py:1019 Before relax: 0 lines. After relax: 15 lines.

caufieldjh commented 1 year ago

Could be something to do with env vars being passed to robot?
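
One way to test the env var idea is to run the relax step on its own with an explicitly controlled environment and compare the result across robot versions. The following is a minimal sketch, not kg-obo's actual code: the helper names `build_robot_env` and `run_robot_relax` are hypothetical, and it assumes the robot wrapper script reads its JVM options from `ROBOT_JAVA_ARGS`. The default argument values are the ones printed in the log above.

```python
import os
import subprocess


def build_robot_env(java_args, base_env=None):
    """Return a copy of the environment with ROBOT_JAVA_ARGS set explicitly.

    Assumption: the robot wrapper script takes its JVM options from
    ROBOT_JAVA_ARGS. base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ROBOT_JAVA_ARGS"] = java_args
    return env


def run_robot_relax(robot_path, input_path, output_path,
                    java_args="-Xmx12g -XX:+UseG1GC"):
    """Run `robot relax` in isolation so the output can be inspected directly.

    Hypothetical helper: returns the CompletedProcess without raising,
    so the caller can look at returncode, stdout and stderr.
    """
    cmd = [robot_path, "relax", "--input", input_path, "--output", output_path]
    return subprocess.run(
        cmd,
        env=build_robot_env(java_args),
        capture_output=True,
        text=True,
        check=False,
    )
```

Running this against the same input with v1.8.3 and with a newer robot, with and without the extra JVM options, should show whether the environment makes any difference to the relaxed output.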

The product of relaxing BFO in the test is fully broken - it looks like this:

<?xml version="1.0"?>
<rdf:RDF xmlns="http://www.w3.org/2002/07/owl#"
     xml:base="http://www.w3.org/2002/07/owl"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:xml="http://www.w3.org/XML/1998/namespace"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
    <Ontology/>
</rdf:RDF>

<!-- Generated by the OWL API (version 4.5.6) https://github.com/owlcs/owlapi -->

Note the tag isn't even present. Could be some weird edge case?
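
The log above also reports "Before relax: 0 lines. After relax: 15 lines.", so the input to relax may already have been empty in this test. A guard that catches an empty ontology before the convert step could look roughly like the sketch below. It is only an illustration: `is_empty_ontology` is a hypothetical helper, not part of kg-obo or robot, and it only handles RDF/XML like the output shown above.

```python
import xml.etree.ElementTree as ET

OWL_NS = "http://www.w3.org/2002/07/owl#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def is_empty_ontology(rdfxml_text):
    """Return True if the RDF/XML has no ontology IRI and no other content.

    Hypothetical check: "empty" here means the owl:Ontology element (if any)
    carries no rdf:about attribute and the document has no other top-level
    elements such as classes or properties.
    """
    root = ET.fromstring(rdfxml_text)
    ontology_tag = f"{{{OWL_NS}}}Ontology"
    ontology = root.find(ontology_tag)
    has_iri = (
        ontology is not None
        and ontology.get(f"{{{RDF_NS}}}about") is not None
    )
    other_elements = [child for child in root if child.tag != ontology_tag]
    return not has_iri and not other_elements
```

Calling this on the relaxed file's contents and failing early would turn the opaque OBO GRAPH ERROR into a clearer message about the empty input.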