Switch over to current rdflib

GoogleCodeExporter commented 9 years ago

Switch FuXi to the latest version of rdflib, which includes changes to the 
module structure, etc.

Original issue reported on code.google.com by chime...@gmail.com on 24 Sep 2009 at 1:26

GoogleCodeExporter commented 9 years ago

I think it is about time to switch to rdflib v3+ and Python 2.7+.

How big is the effort to upgrade it? When will it be taken to that level?

Original comment by costezki...@gmail.com on 13 May 2011 at 1:31

GoogleCodeExporter commented 9 years ago

In case it's of interest, I propagated my rdflib3 refactoring of FuXi to the 
new 1.3 release. It's a relatively trivial refactoring that aims simply to 
maintain compatibility with layer cake rdflib 2.4.X. It doesn't address (and in 
fact is pretty much ignorant of) the module-level changes that Chimezie 
envisaged. Four or five tests are failing, indicative of further work required to 
get this up to production-level code.

http://code.google.com/r/gjhiggins-fuxi-rdflib3/source/list?name=rdflib3

FWIW.

Original comment by gjhigg...@gmail.com on 4 Oct 2011 at 5:04

GoogleCodeExporter commented 9 years ago

In response to Chimezie's post to the forum:

++ Excellent! I will take a look and try to get a sense of the effort 
++ it would take to take this to its conclusion: switching back to 
++ rdflib while maintaining the divergent module-changes and components
++ (such as the pure python parser, the Generic SPARQL Store, 
++ the MySQL/SPARQL implementation, etc.). Do you have any sense of this? 

I am able to report that all the above-referenced work has been restored, 
refactored and recently merged back into the default branch of my clone
of rdfextras, ready for pushing to the "official" repos:

http://code.google.com/r/gjhiggins-rdfextras/source/browse/#hg%2Frdfextras

There is a Hudson CI build which tracks commits:

http://bel-epa.com/hudson/job/rdfextras-test/

and which maintains reports of test runs, currently standing at 369 tests
with 4 failures and 13 skips (of known issues, mostly with SQL stores):

http://bel-epa.com/hudson/job/rdfextras-test/lastCompletedBuild/testReport/

and (fwiw) coverage reports:

http://bel-epa.com/hudson/job/rdfextras-test/Test_coverage_Report/

With respect to detail - the MySQL/SPARQL implementation is available as
rdfextras.sparql2sql and shows little or no difference in test results to 
the extant default rdfextras SPARQL implementation.

Most of the stores have been recovered and refactored but I'm unsure of
what you mean by "the Generic SPARQL Store" - the recovered stores are in:

http://code.google.com/r/gjhiggins-rdfextras/source/browse/#hg%2Frdfextras%2Fstore

Whilst the key-value stores required little change other than a mild
refactoring, the SQL stores are evincing problems when running tests that
involve contexts and Statements.

Many of the tests make assertions about the length of the graph but this
seems to be broken for contexts, as this Pdb interaction apparently
demonstrates (if the comment formatting screws up the layout, I'll 
attach a text file):

python run_tests.py --pdb-failure test/test_store/test_sqlite.py:SQLiteContextTestCase.testLenInMultipleContexts
Running nose with: --pdb-failure test/test_store/test_sqlite.py:SQLiteContextTestCase.testLenInMultipleContexts --attr=!performancetest --where=./ --with-doctest --doctest-extension=.doctest --doctest-tests
> /usr/lib/python2.7/unittest/case.py(496)_baseAssertEqual()
-> raise self.failureException(msg)
(Pdb) u
> /usr/lib/python2.7/unittest/case.py(503)assertEqual()
-> assertion_func(first, second, msg=msg)
(Pdb) u
> ~rdfextras/test/test_store/test_context.py(146)testLenInMultipleContexts()
-> self.assertEquals(len(self.graph), oldLen + 1)
(Pdb) oldLen
0
(Pdb) self.graph.serialize()
*** Exception: Can't split 'hates'
(Pdb) self.graph.serialize(format="n3")
'\n<pizza> <hates> <tarek> .\n\n'
(Pdb) len(self.graph)
3
(Pdb) self.assertEquals(len([y for y in self.graph.triples((None, None, None))]), oldLen + 1)
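
To make the mismatch above concrete: len(self.graph) reports 3, while iterating over the triples finds only one. One reading, which is an assumption and not something the session confirms, is that the length counts one row per context while the iteration yields distinct triples. The toy store below is hypothetical (it is not rdflib's Store API, and the context names are made up); it only shows how the two counts can diverge:

```python
class ToyContextStore(object):
    """Hypothetical stand-in for a context-aware store (not rdflib's API)."""

    def __init__(self):
        # Each entry is (triple, context), i.e. a "quad".
        self.quads = set()

    def add(self, triple, context):
        self.quads.add((triple, context))

    def quad_count(self):
        # Counts every (triple, context) pair, so a triple asserted in
        # several contexts is counted once per context.
        return len(self.quads)

    def triple_count(self):
        # Counts distinct triples regardless of context.
        return len(set(triple for triple, _context in self.quads))


store = ToyContextStore()
triple = ("pizza", "hates", "tarek")
for context in ("default", "c1", "c2"):  # made-up context names
    store.add(triple, context)

print(store.quad_count())    # 3
print(store.triple_count())  # 1
```

If the SQL stores compute their length along the lines of quad_count, that would account for the numbers seen, but this needs checking against the actual store code.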

The failure to serialize the test statements as XML is rather inconvenient
and perhaps even a bug.
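
On the "Can't split 'hates'" message: as far as I understand it, RDF/XML writes each predicate as an XML element name, so the serializer has to split the predicate URI into a namespace part and a local name. The test data uses bare relative names (<pizza> <hates> <tarek>), which leave nothing to use as a namespace. The function below is a much-simplified sketch of that splitting step, written for illustration only; split_predicate and its pattern are assumptions and not rdflib's implementation:

```python
import re

# Simplified local-name pattern: a letter or underscore followed by
# letters, digits, underscores, dots or hyphens, running to the end of the URI.
_LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*$")


def split_predicate(uri):
    """Split a predicate URI into (namespace, local name), or raise ValueError."""
    match = _LOCAL_NAME.search(uri)
    # No usable local name, or the whole string is the local name
    # (so there is no namespace part): nothing sensible to split on.
    if match is None or match.start() == 0:
        raise ValueError("Can't split %r" % uri)
    return uri[:match.start()], uri[match.start():]


print(split_predicate("http://example.org/ns#hates"))

try:
    split_predicate("hates")
except ValueError as exc:
    print(exc)
```

Under this reading the N3 serializer succeeds because it can write the relative URI as-is, which would make the failure a property of the test data as much as of the serializer.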

Still, even with the limitation of several significant test failures, 
it is possible to run FuXi's test suite with rdflib 3.2 dev and the 
"restoration" rdfextras clone.

Again, there is a Hudson CI build:

http://bel-epa.com/hudson/job/fuxi-rdflib3/

similarly tracking commits and maintaining reports of test runs, currently
standing at 87 tests and 31 failures:

http://bel-epa.com/hudson/job/fuxi-rdflib3/lastCompletedBuild/testReport/

and (again, fwiw) test coverage

http://bel-epa.com/hudson/job/fuxi-rdflib3/Test_coverage_Report/

The complete console output is captured here:

http://bel-epa.com/hudson/job/fuxi-rdflib3/17/console

For my own convenience, I adjusted matters so that I could run nose; its 
--pdb and --pdb-failure options are extremely useful. The existing
test/suite.py seems to find 469 doctests; I can't explain the difference
as yet.

I can't detect any significant difference between the result of suite.py run
with FuXi+layercake and the same test run with refactoredFuXi + 
rdflib3/restorationrdfextras; the numbers of tests, passes and fails were
pretty much the same (to a casual inspection).

The overwhelming majority of the test failures would seem to be due to simple
case mismatches and other format mismatches, e.g.

Expected:
    ( ex:Fire and ex:Water )
Got:
    ( ex:Fire AND ex:Water )

If this were any domain other than RDF, I would readily opine that a fix
would appear to be trivial - but I've learned to be circumspect, even
with what seems obvious.
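
As a way of gauging how many of the failures are case-only, one option is a doctest output checker that ignores case. This is a diagnostic sketch using the standard library's doctest.OutputChecker, not a proposed fix: CaseInsensitiveChecker is a name made up here, and masking case differences could hide a genuine change in the serializer's output.

```python
import doctest


class CaseInsensitiveChecker(doctest.OutputChecker):
    """Hypothetical checker that treats output differing only in case as equal."""

    def check_output(self, want, got, optionflags):
        # Lower-case both sides, then defer to the standard comparison.
        return doctest.OutputChecker.check_output(
            self, want.lower(), got.lower(), optionflags
        )


want = "( ex:Fire and ex:Water )\n"
got = "( ex:Fire AND ex:Water )\n"

print(doctest.OutputChecker().check_output(want, got, 0))    # False
print(CaseInsensitiveChecker().check_output(want, got, 0))   # True
```

A checker like this can be passed to doctest.DocTestRunner via its checker argument; any failures that remain are then not explained by case alone.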

I have recently removed the rdflib2/rdflib3 import switching because FuXi does 
not run with either rdflib-2.4.1 or rdflib-2.4.2, only with the "layercake"
fork. This is immediately apparent from the failed import of a non-existent
"parse" function in rdflib.sparql.parser, and then of an rdflib.OWL module which
is missing completely from the 2.4.1/2.4.2 package (that's as far as I got before
the realisation settled in).
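
For anyone unfamiliar with the term, "import switching" here means trying one module location and falling back to another. The helper below is a generic sketch of that pattern and not the code that was removed; import_first is a made-up name, and the rdflib module names in the comment are from memory and should be treated as assumptions.

```python
import importlib


def import_first(candidates):
    """Return the first module in `candidates` that imports cleanly."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # Remember why this candidate failed and try the next one.
            errors.append("%s: %s" % (name, exc))
    raise ImportError("none importable: " + "; ".join(errors))


# Intended use would be something like (module names are assumptions):
#     graph_module = import_first(["rdflib.graph", "rdflib.Graph"])
# Demonstrated here with standard-library names only:
module = import_first(["no_such_module_for_demo", "json"])
print(module.__name__)  # json
```

Switching of this kind only helps when every candidate offers the same API, which is exactly what stock rdflib 2.4.x does not do for FuXi.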

HTH,

Cheers,

Graham Higgins

Original comment by gjhigg...@gmail.com on 9 Oct 2011 at 12:24

GoogleCodeExporter commented 9 years ago

This is something I'm committed to doing. The next milestone will be about OWL 
2 EL reasoning and proof generation, and the milestone after that will be 
completely focused on switching FuXi over to rdflib3.

Original comment by chime...@gmail.com on 3 Nov 2012 at 9:44

Changed state: Started
Added labels: Priority-Low, Type-Defect
Removed labels: Priority-Medium, Type-Enhancement
