biolink / ontobio

python library for working with ontologies and ontology associations
https://ontobio.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Add missing chardet dependency for PyShEx module #578

Closed. dustine32 closed this 2 years ago

dustine32 commented 2 years ago

This issue addresses the following failure in the go-site GitHub Actions run:

+ validate.py rule --metadata metadata/ --ontology /tmp/test-2021-07-13T164315/go-ontology.json --out /tmp/test-2021-07-13T164315/out.json 
Traceback (most recent call last): 
 File "/usr/local/bin/validate.py", line 32, in <module> 
 from ontobio.rdfgen.gocamgen.gocam_builder import GoCamBuilder, AssocExtractor 
 File "/usr/local/lib/python3.6/dist-packages/ontobio/rdfgen/gocamgen/gocam_builder.py", line 1, in <module> 
 from ontobio.rdfgen.gocamgen.gocamgen import AssocGoCamModel 
 File "/usr/local/lib/python3.6/dist-packages/ontobio/rdfgen/gocamgen/gocamgen.py", line 16, in <module> 
 from ontobio.rdfgen.gocamgen.triple_pattern_finder import TriplePattern, TriplePatternFinder 
 File "/usr/local/lib/python3.6/dist-packages/ontobio/rdfgen/gocamgen/triple_pattern_finder.py", line 1, in <module> 
 from ontobio.rdfgen.gocamgen.utils import contract_uri_wrapper 
 File "/usr/local/lib/python3.6/dist-packages/ontobio/rdfgen/gocamgen/utils.py", line 11, in <module> 
 from pyshexc.parser_impl import generate_shexj 
 File "/usr/local/lib/python3.6/dist-packages/pyshexc/parser_impl/generate_shexj.py", line 9, in <module> 
 import chardet
ModuleNotFoundError: No module named 'chardet' 
+ validate_exit=1 
+ cat /tmp/test-2021-07-13T164315/out.json 
cat: /tmp/test-2021-07-13T164315/out.json: No such file or directory 
+ rm -rf /tmp/test-2021-07-13T164315 
+ exit 1 
Error: Process completed with exit code 1.

From what I can tell, this happens because chardet, which pyshexc imports, is not installed anywhere in PyShEx's dependency tree. Simply adding chardet to requirements.txt should fix this for us. As of v0.8.2, pyshexc doesn't support Python 3.6, so I'm not sure submitting a ticket to them would help matters.
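A quick way to confirm this kind of gap before the import chain blows up is to ask Python whether the module can be found at all. The sketch below is only an illustration: `missing_modules` is a hypothetical helper name, not part of ontobio or PyShEx, and it only handles top-level module names.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    # find_spec returns None when no installed package provides the module.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # chardet is the module named in the traceback above.
    # An empty list means the environment has it; otherwise it is listed.
    print(missing_modules(["chardet"]))
```

Running this in the CI image before validate.py would show the missing module directly instead of a deep traceback.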

Still pondering whether we should pin PyShEx to 0.7.11 or retain >=0.7.11. It was pinned until https://github.com/biolink/ontobio/commit/d6a6a0695a6f42ec908ae7e748dc5aacc3141bee so I'm a little uneasy about reverting it, especially if just adding chardet works for us now. @kltm Thoughts?

kltm commented 2 years ago

@dustine32 @sierra-moxon I'm somewhat neutral on this. In general, I'm pretty happy w/pinning, as variations on this kind of breakage will just keep happening without it. Moreover, I think adding chardet qualifies as a "hack" fix--using something that should not be necessary (and may be redundant in the future) because of a bug elsewhere. On the other hand, I'm also aware that other developers in our sphere do not want to pin, which leaves us with the opposite problem of "falling behind" and needing to fix issues caused by that. In the end, without very centralized control, I think Pick Your Doom is what we're dealing with. If I had to gamble, I'd say that adding chardet w/a link to this ticket and a comment explaining why would cause the least negative ripples. Of course, in both cases, we're literally talking about a line of import or config, so I don't think it's worth worrying about too too much.

dustine32 commented 2 years ago

Totally agree @kltm!

Though, in my head, I imagine pinning can still cause problems as long as the modules we depend on don't also pin their own dependencies. For example:

ontobio reqs.txt:
    PyShEx==0.7.11

PyShEx==0.7.11 reqs.txt:
    pyshexc>=0.5.4

Then pyshexc could release a new version like 0.7.0 at some later point with some bug that could break our code. For instance, the new pyshexc code could import chardet w/o including chardet in pyshexc's reqs.txt.

Does my confusing example make sense? I could be wrong in my assumption about how dependencies work.
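The scenario above can be sketched as a toy resolver. This is a simplified model under stated assumptions, not how pip actually resolves dependencies: it only understands `==` and `>=`, always takes the newest matching version, and the package index contents are made up for illustration (only PyShEx==0.7.11, pyshexc>=0.5.4, and the hypothetical pyshexc 0.7.0 come from the example).

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]
Requirements = Dict[str, str]  # package name -> specifier such as "==0.7.11"


def parse_version(text: str) -> Version:
    """Turn '0.7.11' into (0, 7, 11) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a single '==X' or '>=X' specifier."""
    if spec.startswith("=="):
        return parse_version(version) == parse_version(spec[2:])
    if spec.startswith(">="):
        return parse_version(version) >= parse_version(spec[2:])
    raise ValueError(f"unsupported specifier: {spec}")


def pick_latest(available: List[str], spec: str) -> str:
    """Pick the newest available version that satisfies the specifier."""
    candidates = [v for v in available if satisfies(v, spec)]
    if not candidates:
        raise LookupError(f"no version satisfies {spec}")
    return max(candidates, key=parse_version)


def resolve(
    index: Dict[str, List[str]],
    declared: Dict[Tuple[str, str], Requirements],
    root_reqs: Requirements,
) -> Dict[str, str]:
    """Resolve the root requirements plus everything they pull in."""
    resolved: Dict[str, str] = {}
    pending = list(root_reqs.items())
    while pending:
        name, spec = pending.pop()
        if name in resolved:
            continue
        version = pick_latest(index[name], spec)
        resolved[name] = version
        # Follow the requirements declared by the chosen release.
        pending.extend(declared.get((name, version), {}).items())
    return resolved


# ontobio pins PyShEx, but PyShEx 0.7.11 leaves pyshexc open-ended.
ontobio_reqs = {"PyShEx": "==0.7.11"}
declared = {("PyShEx", "0.7.11"): {"pyshexc": ">=0.5.4"}}

# Hypothetical index contents before and after a new pyshexc release.
index_before = {"PyShEx": ["0.7.11"], "pyshexc": ["0.5.4"]}
index_after = {"PyShEx": ["0.7.11"], "pyshexc": ["0.5.4", "0.7.0"]}

before = resolve(index_before, declared, ontobio_reqs)
after = resolve(index_after, declared, ontobio_reqs)
print(before)
print(after)
```

In this model the pinned PyShEx version is identical in both runs, yet the pyshexc version changes as soon as a newer release appears in the index, which is the breakage path described above.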

kltm commented 2 years ago

That's a good example of one way that us pinning without coordination could cause (and has in the past caused) problems. Part of this is down to package managers and development tools not really being up to the task of dealing with expansive and independently run code bases and library mixes.

dustine32 commented 2 years ago

Closing since we appear to have solved this issue.