WorldModelers / Ontologies

Ontologies for the World Modelers system
Creative Commons Attribution 4.0 International
6 stars 11 forks source link

rewrites strip_metadata script to be more robust to order #128

Closed zupon closed 3 years ago

zupon commented 3 years ago

This PR rewrites the strip_metadata.py script to make it simpler and more robust to variable order of metadata fields. It now does not rely on OntologyNode: being first nor on polarity (or semantic type more recently) being last.

kwalcock commented 3 years ago

If it gets the right answer, it's close enough for me. I'd think that python has a yaml library so that one wouldn't need to regex. It seems like there is the possibility of finding some of these patterns in strings that some evil person has planted within the file. If both these patterns need to match from the beginning of a line, perhaps it would be prudent to add a ^ to save some debugging later (if that's at all how the python regexes work).

zupon commented 3 years ago

Just looking at some less-curated output from the OIAD UI, this needs some more work. The problem is I was trying to use the - asdf: format to look for non-leaf nodes, but in the randomized order of the output, it now gets all sorts of things that shouldn't be gotten.

BeckySharp commented 3 years ago

we should make a test for this once it's solved.

On Wed, Jul 7, 2021 at 9:11 AM zupon @.***> wrote:

External Email

Just looking at some less-curated output from the OIAD UI, this needs some more work. The problem is I was trying to use the - asdf: format to look for non-leaf nodes, but in the randomized order of the output, it now gets all sorts of things that shouldn't be gotten.

— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/WorldModelers/Ontologies/pull/128#issuecomment-875735129, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJCPCKAQQXCMN4CU4PFVTTTWR4BVANCNFSM475AV7UQ .

zupon commented 3 years ago

I used the pyyaml library to do some fancy stuff with this. Now, through some sort of magic, pyyaml reads in the metadata file, sorts the fields the way we want, and the rest of my script does the rest the way it did before. I can't guarantee it will work if we decide to add new types of fields, but it does work with the ones we have now, no matter what order they are in.

BeckySharp commented 3 years ago

nice! any chance we can add a unit test or two?

zupon commented 3 years ago

@kwalcock Your testing framework for Python seems to work well here. I revised strip_metadata.py to make it play nicer with tests, then wrote test_strip_metadata.py. Right now it only includes one relatively simple test, but it can be expanded later.

One area that I'm really unsure about is from my last commit. The Travis build wasn't succeeding because it could not find the pyyaml module. I added a line to .travis.yml to install it, and now it succeeds, but I wanted to make sure that wasn't going to totally mess everything up somewhere down the line.

kwalcock commented 3 years ago

@zupon, that seems like a perfectly good solution for now. There are apparently other conventions to know about, like use of setup.py. https://packaging.python.org/overview/ explains a lot. I don't know how formal it needs to get. Testing has been addressed in any case.

zupon commented 3 years ago

@kwalcock Is it alright to merge this PR? Or should I wait until the PR for your branch into master is merged first, since my branch now contains all of your changes?