Closed GoogleCodeExporter closed 9 years ago
Can you create a test that will reproduce this for us? We can then seek to fix it.
See
http://code.google.com/p/simal/source/browse/trunk/uk.ac.osswatch.simal.core/src/test/java/uk/ac/osswatch/simal/integrationTest/service/TestCategoryService.java
If you are not sure how to create a test for this we'd be happy to help on the
list.
Original comment by ross.gardler
on 16 Mar 2011 at 2:42
Thanks - the prod to reproduce has demonstrated that the problem was in my
database. So something in the importing of lots of RDF/XML has (once again)
caused the problem to occur. Importing with a slightly cleaner subset of those
files worked ok. So I'm guessing it was something to do with a file that
triggered an exception, but still got saved (but failed to update the
categories list properly).
I can send you a zip of the corrupted database, but given the above, it's
probably not interesting. (I don't want to post it here as it contains
semi-confidential data).
Original comment by Stevage
on 17 Mar 2011 at 6:12
I feel your problems with invalid RDF are caused by two factors:
a) you're new to the project and thus uncovering issues by taking a different
path from us
and, more importantly because it's avoidable
b) your DOAP customisations are allowing bad data (from the Simal perspective)
to be generated
It's hard for us to prioritise genuine issues from cause a) when most of the
problems appear to be from cause b).
I don't think there is much need to share your DB with us for debugging. We
currently have > 1600 projects and > 1200 people without any of these kinds of
problems.
If you can narrow down the problems that fit into category a) we'd be glad to
either fix them or help you fix them (and by saying that I mean "thank you for
helping us uncover these issues, you've identified a few already - please let's
have more. No problem with the odd erroneous report like this one.").
Original comment by ross.gardler
on 17 Mar 2011 at 11:54
> and, more importantly because it's avoidable
>
> b) your DOAP customisations are allowing bad data (from the Simal
> perspective) to be generated
It's not so avoidable because the data quality rules are not well documented.
Various bits of the code make assumptions about what is in a project record. If
the assumption is incorrect, an exception is raised, and the user sees an
error. But those assumptions (from memory, a project must have a description,
for instance) aren't documented, nor is there any validation at the time
records are ingested. I guess this is an aspect of semantic web that I will
have to get used to: input data is simply saved directly to the database with
no checking. In a traditional SQL database, you would have constraints, foreign
keys and the like, so you can guarantee that the data will be in a consistent,
good state at the time you retrieve it.
Bottom line: it's actually surprisingly hard to avoid creating "bad data". I
wouldn't describe what I'm doing as "DOAP customisations" - most of what I'm
doing is simply generating data to import. Pretty standard use case, really.
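The missing ingest-time validation described above could be approximated outside Simal with a small pre-import check. What follows is only a sketch under stated assumptions: the list of required properties is hypothetical (the description requirement is recalled "from memory" above, and Simal's real rules are undocumented), `find_problems` is an illustrative name rather than part of Simal, and the check only understands files that use typed `doap:Project` elements, not `rdf:Description` nodes with an `rdf:type`.

```python
import xml.etree.ElementTree as ET

DOAP_NS = "http://usefulinc.com/ns/doap#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumption: a hypothetical rule set, not Simal's documented requirements.
REQUIRED_CHILDREN = ("name", "description")


def find_problems(rdf_xml):
    """Return a list of human-readable problems found in one DOAP RDF/XML document."""
    try:
        root = ET.fromstring(rdf_xml)
    except ET.ParseError as err:
        return ["not well-formed XML: %s" % err]

    # iter() also yields the root itself, so a bare doap:Project document works too.
    projects = list(root.iter("{%s}Project" % DOAP_NS))
    if not projects:
        return ["no doap:Project element found"]

    problems = []
    for index, project in enumerate(projects, start=1):
        # Prefer the project's URI as a label; fall back to its position in the file.
        label = project.get("{%s}about" % RDF_NS) or "project #%d" % index
        for child in REQUIRED_CHILDREN:
            element = project.find("{%s}%s" % (DOAP_NS, child))
            if element is None or not (element.text or "").strip():
                problems.append("%s: missing or empty doap:%s" % (label, child))
    return problems
```

Running this over each generated file before importing, and skipping any file that yields a non-empty list, keeps records that violate the assumed rules out of the database in the first place.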
> It's hard for us to prioritise genuine issues from cause a) when most of the
> problems appear to be from cause b).
>
> I don't think there is much need to share your DB with us for debugging. We
> currently have > 1600 projects and > 1200 people without any of these kinds
> of problems.
Indeed. Well, I just hit this problem again, and fortunately I was able to
resolve it. In this case, I was running identical code *and data* on my
development machine (XP) and production (Linux). The bug only showed up on the
production machine. I made a slight tweak to one of the 30 or so description
files I'm importing, cleaned the database, reimported...problem went away. Who
knows.
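One way to test the "identical data" assumption across the two machines is to fingerprint the import directory on each and compare the results. This is a sketch only: `fingerprint` is an illustrative helper, and the second hash (with Windows line endings normalised) is included on the assumption that an XP/Linux pair may differ only in line endings. It does not diagnose the bug itself; it just confirms or rules out a difference in the input files.

```python
import hashlib
from pathlib import Path


def fingerprint(directory):
    """Map each file name to (raw SHA-256, SHA-256 with CRLF normalised to LF)."""
    result = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        data = path.read_bytes()
        normalised = data.replace(b"\r\n", b"\n")
        result[path.name] = (
            hashlib.sha256(data).hexdigest(),
            hashlib.sha256(normalised).hexdigest(),
        )
    return result
```

If the raw hashes differ between machines but the normalised ones match, the files differ only in line endings; if both differ, the data is not actually identical.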
Original comment by Stevage
on 4 Apr 2011 at 1:26
I think I have discovered the following amusing workaround:
1) Start from an empty/non-existent database.
2) Import all the projects.
3) Import all the projects.
4) Start Simal
Original comment by Stevage
on 4 Apr 2011 at 4:57
By "avoidable" I meant you can work around it. I certainly didn't mean it
shouldn't be better documented and/or handled in the code - this is alpha code,
remember.
The DOAP creator form in SVN creates records that will display correctly and
our live instance of Simal has >1500 projects with no such problem.
This is clearly a bug; it's marked as invalid because we can't hope to
reproduce it. We have never seen the problem you are describing and can't
reproduce it. If you are able to provide us with some sample data that will
reproduce this then we might be able to fix it. Without that there is no hope
of us finding it until we hit it ourselves.
As for your workaround in comment 5 I agree it is "amusing". I can think of no
sensible reason why that would resolve the issue. But again without having the
data you are importing we're kind of stuck.
Sorry we can't be more helpful on this one.
Original comment by ross.gardler
on 4 Apr 2011 at 8:49
How do you import the projects? Is it from DOAP RDF/XML files?
Also I'm curious about the non-existing categories, and I noticed there's an
error in the query you posted above. The query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX simal: <http://oss-watch.ac.uk/ns/0.2/simal#>
SELECT DISTINCT ?category WHERE {}
actually doesn't return any categories because the WHERE clause is empty, so
?category is never bound. If you want to retrieve the categories, you could
ask for "everything with a categoryId", which would be something like:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX simal: <http://oss-watch.ac.uk/ns/0.2/simal#>
SELECT DISTINCT ?category WHERE { ?category simal:categoryId ?catId}
There are known problems with categories in general that might be related to
this, e.g. categories are sometimes created as type doap:category, which is
syntactically incorrect (Issue 283).
If you're hitting problems, please post one typical RDF/XML file so I can see
if that's what I'd expect Simal to handle correctly. Also, to check the
type of the categories in your data, you can use this query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX simal: <http://oss-watch.ac.uk/ns/0.2/simal#>
SELECT DISTINCT ?category ?type WHERE {
?category simal:categoryId ?catId .
?category rdf:type ?type
}
If you post the result here we could identify if it's related to a known issue.
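Once the type query above has been run, its rows can be scanned for the doap:category typing mentioned in connection with Issue 283. The following is a minimal sketch, assuming the result has already been exported as (category URI, type URI) pairs; the function names are illustrative, and no assumption is made here about what the correct Simal type is.

```python
from collections import defaultdict

# The DOAP namespace plus "category": the typing described above as incorrect.
DOAP_CATEGORY = "http://usefulinc.com/ns/doap#category"


def types_by_category(rows):
    """Group (category_uri, type_uri) rows into {category_uri: set of type URIs}."""
    grouped = defaultdict(set)
    for category, type_uri in rows:
        grouped[category].add(type_uri)
    return dict(grouped)


def mistyped_categories(rows):
    """Return the sorted category URIs that carry the doap:category type."""
    grouped = types_by_category(rows)
    return sorted(c for c, types in grouped.items() if DOAP_CATEGORY in types)
```

An empty list from `mistyped_categories` suggests the data is not affected by that particular typing problem, although other category issues could still be present.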
Original comment by sander.v...@oucs.ox.ac.uk
on 4 Apr 2011 at 10:07
Original issue reported on code.google.com by
Stevage
on 16 Mar 2011 at 9:31