ease-crc / soma

The Socio-physical Model of Activities (SOMA) is a formal activity model for embodied agents that need to operate their body to generate motions that cause intentional effects in the physical and social world.
GNU Lesser General Public License v3.0

ELAN vocabulary generation script throws a SAXParseException #304

Open abdelker opened 1 year ago

abdelker commented 1 year ago

Hello,

I am trying to run the ./ecv.sh script (branch master, folder scripts). However, it fails to parse the OWL files.

Here is the error I get:

Traceback (most recent call last):
  File "/usr/lib/python3.8/xml/sax/expatreader.py", line 217, in feed
    self._parser.Parse(data, isFinal)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 6

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/abdelker/catkin_ws/src/soma/scripts/elan_cv.py", line 231, in <module>
    ontoP, namespaces = parseXMLOnto(onto)
  File "/home/abdelker/catkin_ws/src/soma/scripts/elan_cv.py", line 152, in parseXMLOnto
    ontoP = untangle.parse(onto)
  File "/home/abdelker/.local/lib/python3.8/site-packages/untangle.py", line 205, in parse
    parser.parse(filename)
  File "/usr/lib/python3.8/xml/sax/expatreader.py", line 111, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python3.8/xml/sax/xmlreader.py", line 125, in parse
    self.feed(buffer)
  File "/usr/lib/python3.8/xml/sax/expatreader.py", line 221, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python3.8/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: /home/abdelker/catkin_ws/src/soma/scripts/../owl/SOMA.owl:1:6: not well-formed (invalid token)

mpomarlan commented 1 year ago

This seems related to switching SOMA's format to OWL functional syntax. The script assumes RDF/XML and has not been updated in a while.

I can take care of this later in the week or, more likely, next week. As a quick workaround, Protege can save SOMA.owl as RDF/XML, and the script should then run with that file as input.
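
Until the script is updated, a small guard in front of the XML parser could turn the SAXParseException into a clearer message. The following is only a sketch: `looks_like_rdfxml` is a hypothetical helper that does not exist in elan_cv.py, and the check is a heuristic (an XML document starts with `<`, whereas a functional-syntax file typically starts with `Prefix(` or `Ontology(`), not a real format detector.

```python
import io


def looks_like_rdfxml(stream):
    """Heuristic: True if the first non-blank character of a text stream is '<'.

    Hypothetical helper, not part of elan_cv.py. It only distinguishes
    "probably XML" from "probably not XML"; it does not validate the file.
    """
    for line in stream:
        # Skip a possible byte-order mark and leading whitespace.
        stripped = line.lstrip("\ufeff \t\r\n")
        if stripped:
            return stripped.startswith("<")
    return False  # empty file


if __name__ == "__main__":
    # Inline samples stand in for real ontology files.
    rdfxml = io.StringIO('<?xml version="1.0"?>\n<rdf:RDF/>\n')
    functional = io.StringIO("Prefix(:=<http://example.org/onto#>)\nOntology()\n")
    for name, sample in (("RDF/XML sample", rdfxml), ("functional sample", functional)):
        if looks_like_rdfxml(sample):
            print(name + ": looks like XML, safe to hand to the XML parser")
        else:
            print(name + ": not XML; save it as RDF/XML first")
```

In the script, such a check would run on the opened SOMA.owl before the parse call shown in the traceback, so that a non-XML file is rejected with a hint instead of a parser error.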

mrnolte commented 1 year ago

Can you explain to me what the ELAN scripts are doing? We might want to move these to the Java CI as well.

mpomarlan commented 1 year ago

The ELAN script generates vocabulary files.

In more detail, it loops through the concepts in SOMA and checks whether a particular annotation property (ELANName, I think) is defined for each concept. If it is, an entry for that concept is written to a controlled vocabulary file.

Later, when someone uses ELAN to annotate, they rely on having such vocabulary files to provide labels for the annotations.
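
Roughly, the loop described above could look like the sketch below. This is not the code in elan_cv.py (which uses untangle); it uses the standard library's ElementTree on RDF/XML input. The `SOMA` namespace, the property name `ELANName`, and the function `collect_elan_entries` are placeholders chosen for illustration, and writing the actual vocabulary file is left out because its format is not described here.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"

# ASSUMPTIONS: placeholder namespace and property name for illustration only;
# the real SOMA namespace and annotation property may differ.
SOMA = "http://example.org/soma#"
ELAN_PROP = "{%s}ELANName" % SOMA


def collect_elan_entries(rdfxml_text):
    """Return (class IRI, ELAN label) pairs for classes carrying the annotation.

    Only handles annotations written as child elements of owl:Class;
    RDF/XML allows other serializations that this sketch ignores.
    """
    root = ET.fromstring(rdfxml_text)
    entries = []
    for cls in root.iter("{%s}Class" % OWL):
        iri = cls.get("{%s}about" % RDF)
        if iri is None:
            continue  # anonymous class expression, nothing to name
        for ann in cls.findall(ELAN_PROP):
            if ann.text and ann.text.strip():
                entries.append((iri, ann.text.strip()))
    return entries


if __name__ == "__main__":
    sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:soma="http://example.org/soma#">
  <owl:Class rdf:about="http://example.org/soma#Grasping">
    <soma:ELANName>Grasping</soma:ELANName>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/soma#Unlabeled"/>
</rdf:RDF>"""
    for iri, label in collect_elan_entries(sample):
        print(iri, "->", label)
```

Each returned pair would then become one entry in the controlled vocabulary file.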

Moving the functionality of the script to the Java CI is a good idea since Java has all the library support for OWL formats.

mrnolte commented 1 year ago

@ayden175 I know that you want to work on other stuff, but would you mind checking this problem out? If not, I can also have a look at changing the CI.