cancerDHC / ccdh-terminology-service

CCDH Terminology and Mapping Service

Bug: Importer: `TypeError: ...extended_str...not supported` #115

Closed joeflack4 closed 2 years ago

joeflack4 commented 2 years ago

Description

Running `python -m ccdh.importers.importer` fails with a `TypeError`.

Related issues

https://github.com/py2neo-org/interchange/issues/4
https://github.com/linkml/linkml-runtime/issues/64
https://github.com/cancerDHC/ccdh-terminology-service/issues/115

Error messages

Short err

TypeError: Values of type <class 'linkml_runtime.utils.yamlutils.extended_str'> are not supported

Long err

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/app/ccdh/importers/importer.py", line 221, in <module>
    Importer.import_all()
  File "/app/ccdh/importers/importer.py", line 215, in import_all
    Importer(neo4j_graph()).import_harmonized_attributes(CrdcHImporter.read_harmonized_attributes())
  File "/app/ccdh/importers/importer.py", line 65, in import_harmonized_attributes
    self.import_harmonized_attribute(harmonized_attribute)
  File "/app/ccdh/importers/importer.py", line 105, in import_harmonized_attribute
    tx.create(subgraph)
  File "/usr/local/lib/python3.8/site-packages/py2neo/database.py", line 1063, in create
    create(self)
  File "/usr/local/lib/python3.8/site-packages/py2neo/data.py", line 200, in __db_create__
    records = tx.run(*pq)
  File "/usr/local/lib/python3.8/site-packages/py2neo/database.py", line 987, in run
    result = self._connector.run(self.ref, cypher, parameters)
  File "/usr/local/lib/python3.8/site-packages/py2neo/client/__init__.py", line 1424, in run
    return cx.run(tx, cypher, parameters)
  File "/usr/local/lib/python3.8/site-packages/py2neo/client/bolt.py", line 571, in run
    return self._run(tx.graph_name, cypher, parameters or {})
  File "/usr/local/lib/python3.8/site-packages/py2neo/client/bolt.py", line 923, in _run
    response = self.append_message(0x10, cypher, parameters, extra or {})
  File "/usr/local/lib/python3.8/site-packages/py2neo/client/bolt.py", line 726, in append_message
    self.write_message(tag, fields)
  File "/usr/local/lib/python3.8/site-packages/py2neo/client/bolt.py", line 701, in write_message
    self._writer.write_message(tag, fields)
  File "/usr/local/lib/python3.8/site-packages/py2neo/client/bolt.py", line 240, in write_message
    packer.pack(field)
  File "/usr/local/lib/python3.8/site-packages/interchange/packstream.py", line 151, in pack
    self._pack_dict(value)
  File "/usr/local/lib/python3.8/site-packages/interchange/packstream.py", line 285, in _pack_dict
    self.pack(item)
  File "/usr/local/lib/python3.8/site-packages/interchange/packstream.py", line 120, in pack
    self._pack_list(value)
  File "/usr/local/lib/python3.8/site-packages/interchange/packstream.py", line 247, in _pack_list
    self.pack(item)
  File "/usr/local/lib/python3.8/site-packages/interchange/packstream.py", line 151, in pack
    self._pack_dict(value)
  File "/usr/local/lib/python3.8/site-packages/interchange/packstream.py", line 285, in _pack_dict
    self.pack(item)
  File "/usr/local/lib/python3.8/site-packages/interchange/packstream.py", line 200, in pack
    raise TypeError("Values of type %s are not supported" % type(value))

Source of issue

1. First thesis

_Edit: We were looking at the wrong error line. This isn't a %s issue; this is an extended_str and Neo4J issue_ ~The Neo4J lib is trying to import data, and it is trying to do some string interpolation using %s, but the strings that it is trying to do this on are actually of a special LinkML class called extended_str which has multiple inheritance w/ both str and another LinkML class called TypedNode.~

~If I had to guess, something broke during inheritance where the functionality that controls %s (would it be the __str__() method?) broke.~

2. Second thesis

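Neo4J seems to reject extended_str. Per the traceback, the error is raised in interchange's packstream.py (called via py2neo) while it packs the query parameters, when it reaches a value of type extended_str.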

Possible solutions

1. Roll back LinkML versions (temporary)

This is a temporary solution, because ideally we want to stay up to date with the latest LinkML versions. In theory we should be able to use the latest stable LinkML versions, but so far I have not had success with that.

I looked on the live server where terminology.ccdh.io is hosted, and checked versions used:

(venv) [docker@ip-172-31-44-92 ccdh-terminology-service]$ pip freeze | grep linkml
biolinkml==1.7.6
linkml==1.0.1
linkml-runtime==1.0.5

Then, in my local installation of the terminology service, I uninstalled linkml and linkml_runtime and modified requirements.txt to pin these versions. Surprisingly, pip reported a dependency conflict: it could not install the versions I requested, because the linkml versions I was trying to install require different linkml_runtime versions. I tried several combinations; here are the results from just two of the attempts:

ERROR: Cannot install linkml-runtime==1.0.6 and linkml==1.0.1 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested linkml-runtime==1.0.6
    linkml 1.0.1 depends on linkml-runtime>=1.0.10 and ~=1.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

ERROR: Cannot install -r requirements.txt (line 74), -r requirements.txt (line 75), -r requirements.txt (line 78) and linkml-runtime==1.0.5 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested linkml-runtime==1.0.5
    linkml-model 1.0.0 depends on linkml-runtime>=1.0.3 and ~=1.0
    linkml-runtime-api 0.0.4 depends on linkml-runtime
    linkml 1.0.0 depends on linkml-runtime>=1.0.8 and ~=1.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

This is pretty surprising. I'm not 100% sure what the source of the issue is, but my first guess is that the packages on PyPI for these versions are not the same ones that existed when this was working for us. That is, the versions we were using may have been deleted from PyPI and updated packages re-uploaded under the same version numbers. This would explain why we could install this combination of versions before but cannot now. But maybe there is some other explanation.

2. Update linkml_runtime

We could fix the issue at the source. First we have to identify the actual source of the issue, be it in extended_str, TypedNode, or elsewhere.

3. Wrap around linkml_runtime classes

We could make a custom class that inherits from yaml_loader or other classes, and then see if we can override whatever is breaking %s and fix it there. We would then use the custom class(es) instead.

4. Modify or create new object based on obj returned from yaml_loader

We can recurse through the object returned by the yaml_loader and make whatever fixes are necessary. This could potentially be as simple as calling str(extended_str_variable) on every nested instance of extended_str in the object.
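
A rough sketch of what such a recursive conversion could look like. This is only an illustration, not code from the service: `ExtendedStr` is a hypothetical stand-in for a `str` subclass like LinkML's, and `to_plain` is a made-up helper name. It assumes the object has already been reduced to nested dicts, lists, and scalars.

```python
from typing import Any


class ExtendedStr(str):
    """Hypothetical stand-in for a str subclass such as LinkML's extended_str."""


def to_plain(obj: Any) -> Any:
    """Return a copy of obj in which every str subclass is a built-in str."""
    if isinstance(obj, str):
        # str() on a str subclass instance yields an exact built-in str
        # (as long as the subclass does not override __str__).
        return str(obj)
    if isinstance(obj, dict):
        # Convert string keys as well as values; leave other key types alone.
        return {
            (str(k) if isinstance(k, str) else k): to_plain(v)
            for k, v in obj.items()
        }
    if isinstance(obj, (list, tuple)):
        # Tuples come back as lists, which is fine for query parameters.
        return [to_plain(item) for item in obj]
    # Numbers, booleans, None, and anything else pass through unchanged.
    return obj


nested = {ExtendedStr("name"): [ExtendedStr("a"), {"b": ExtendedStr("c")}], "n": 1}
plain = to_plain(nested)
```

Unlike a JSON round trip, this leaves non-string values (for example dates) untouched, at the cost of a little more code.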

5. Neo4J compatibility

Is there any way to get Neo4J to accept extended_str?

joeflack4 commented 2 years ago

@wdduncan For your reference.

joeflack4 commented 2 years ago

@wdduncan Should be fixed now. Give it a go!

As for what I did: we were a bit too hasty with our first conclusion. It wasn't that %s functionality was broken in the extended_str class. We were just looking at the wrong line in the stack trace. We should have looked at:

TypeError: Values of type <class 'linkml_runtime.utils.yamlutils.extended_str'> are not supported

instead of: raise TypeError("Values of type %s are not supported" % type(value))

Here's the fix. We had already been converting the YAMLRoot version of the model into a dict. I now convert all instances of extended_str in that dict to str by round-tripping it through JSON with json.loads(json.dumps(...)).

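import json
from typing import Dict
from linkml_runtime.loaders import yaml_loader
from linkml_runtime.utils.yamlutils import YAMLRoot

model: YAMLRoot = yaml_loader.loads(yaml, target_class=YAMLRoot)
native_class_dict: Dict = model.classes._as_dict
# Round-trip through JSON so that every extended_str becomes a plain str.
standard_class_dict: Dict = json.loads(json.dumps(native_class_dict))  # <------
class_values = standard_class_dict.values()
for cls in class_values:
    ...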
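
For anyone wondering why the round trip works: json.dumps serializes any str subclass as an ordinary JSON string, and json.loads only ever builds built-in str objects. Here is a small standalone sketch; `ExtendedStr` and the sample data are hypothetical stand-ins, not LinkML's class or our model.

```python
import json


class ExtendedStr(str):
    """Hypothetical stand-in for a str subclass such as LinkML's extended_str."""


# Sample data invented for illustration only.
native = {"classes": [{"name": ExtendedStr("Specimen")}]}

# Serialize to JSON text and parse it back into built-in types.
standard = json.loads(json.dumps(native))

print(type(native["classes"][0]["name"]).__name__)    # ExtendedStr
print(type(standard["classes"][0]["name"]).__name__)  # str
```

One caveat: this only works when everything in the dict is JSON-serializable, and it also changes some other types along the way (tuples become lists, non-string dict keys become strings).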