HughP / simal

Automatically exported from code.google.com/p/simal

Categories with one project return null #413

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
For each of the 4 categories with only one project (out of 15 or so), I'm
getting an exception ("Cannot populate page with null category.") when viewing
the category detail page. Stepping through with the debugger reveals that,
indeed, no results are returned by the SPARQL query under
JenaCategoryService.findById(id).

The generated query looks fine:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX simal: <http://oss-watch.ac.uk/ns/0.2/simal#>
SELECT DISTINCT ?category WHERE { ?category simal:categoryId "per1451" }

I tried checking the query with the query tool on the Tools page, but no joy:
even an unrestricted query always returns nothing:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX simal: <http://oss-watch.ac.uk/ns/0.2/simal#>
SELECT DISTINCT ?category WHERE {}

Original issue reported on code.google.com by Stevage on 16 Mar 2011 at 9:31

GoogleCodeExporter commented 9 years ago
Can you create a test that will reproduce this for us? We can then seek to fix it.

See
http://code.google.com/p/simal/source/browse/trunk/uk.ac.osswatch.simal.core/src/test/java/uk/ac/osswatch/simal/integrationTest/service/TestCategoryService.java

If you are not sure how to create a test for this we'd be happy to help on the 
list.

Original comment by ross.gardler on 16 Mar 2011 at 2:42

GoogleCodeExporter commented 9 years ago
Thanks - the prod to reproduce has demonstrated that the problem was in my 
database. So something in the importing of lots of RDF/XML has (once again) 
caused the problem to occur. Importing with a slightly cleaner subset of those 
files worked ok. So I'm guessing it was something to do with a file that 
triggered an exception, but still got saved (but failed to update the 
categories list properly).

I can send you a zip of the corrupted database, but given the above, it's 
probably not interesting. (I don't want to post it here as it contains 
semi-confidential data).

Original comment by Stevage on 17 Mar 2011 at 6:12

GoogleCodeExporter commented 9 years ago
I feel your problems with invalid RDF are caused by two factors:

a) you're new to the project and thus uncovering issues by adopting a different
path to us

and, more importantly, because it's avoidable

b) your DOAP customisations are allowing bad data (from the Simal perspective)
to be generated

It's hard for us to prioritise genuine issues from cause a) when most of the 
problems appear to be from cause b).

I don't think there is much need to share your DB with us for debugging. We
currently have > 1600 projects and > 1200 people without any of these kinds of
problems.

If you can narrow down the problems that fit into category a) we'd be glad to
either fix them or help you fix them (and by saying that I mean "thank you for
helping us uncover these issues, you've identified a few already - please let's
have more. No problem with the odd erroneous report like this one.").

Original comment by ross.gardler on 17 Mar 2011 at 11:54

GoogleCodeExporter commented 9 years ago
> and, more importantly because it's avoidable
>
> b) your DOAP customisations are allowing bad data (from the Simal
> perspective) to be generated

It's not so avoidable because the data quality rules are not well documented. 
Various bits of the code make assumptions about what is in a project record. If 
the assumption is incorrect, an exception is raised, and the user sees an 
error. But those assumptions (from memory, a project must have a description, 
for instance) aren't documented, nor is there any validation at the time 
records are ingested. I guess this is an aspect of semantic web that I will
have to get used to: input data is simply saved directly to the database with
no checking. In a traditional SQL database, you would have constraints, foreign
keys and the like, meaning you can guarantee that the data will be in a
consistent, good state at the time you retrieve it.
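
To illustrate the kind of ingest-time check I mean, here is a minimal sketch in
Python. It is not Simal code: the field names, the REQUIRED_FIELDS rule set and
the functions validate_project and ingest are all hypothetical, chosen only to
show records being rejected before they reach the store.

```python
# Hypothetical sketch: the required fields are assumptions, not Simal's real rules.
REQUIRED_FIELDS = ("name", "description")


def validate_project(record):
    """Return a list of problems found in a project record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Treat missing, None and whitespace-only values as absent.
        if value is None or not str(value).strip():
            problems.append(f"missing required field: {field}")
    return problems


def ingest(records, store):
    """Append valid records to `store`; return rejected (record, problems) pairs."""
    rejected = []
    for record in records:
        problems = validate_project(record)
        if problems:
            rejected.append((record, problems))
        else:
            store.append(record)
    return rejected
```

With something like this in place, a record without a description would be
reported at import time instead of surfacing later as an exception on a page.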

Bottom line: it's actually surprisingly hard to avoid creating "bad data". I 
wouldn't describe what I'm doing as "DOAP customisations" - most of what I'm 
doing is simply generating data to import. Pretty standard use case, really. 

> It's hard for us to prioritise genuine issues from cause a) when most of the
> problems appear to be from cause b).
>
> I don't think there is much need to share your DB with us for debugging. We
> currently have > 1600 projects and > 1200 people without any of these kinds
> of problems.

Indeed. Well, I just hit this problem again, and fortunately I was able to
resolve it. In this case, I was running identical code *and data* on my
development machine (XP) and in production (Linux). The bug only showed up on
the production machine. I made a slight tweak to one of the 30 or so
description files I'm importing, cleaned the database, reimported... and the
problem went away. Who knows.

Original comment by Stevage on 4 Apr 2011 at 1:26

GoogleCodeExporter commented 9 years ago
I think I have discovered the following amusing workaround:

1) Start from an empty/non-existent database.
2) Import all the projects.
3) Import all the projects.
4) Start Simal

Original comment by Stevage on 4 Apr 2011 at 4:57

GoogleCodeExporter commented 9 years ago
By "avoidable" I meant you can work around it. I certainly didn't mean it
shouldn't be better documented and/or handled in the code - this is alpha code,
remember.

The DOAP creator form in SVN creates records that will display correctly and 
our live instance of Simal has >1500 projects with no such problem.

This is clearly a bug, but it's marked as invalid because we can't hope to
reproduce it: we have never seen the problem you are describing. If you are
able to provide us with some sample data that will reproduce this then we
might be able to fix it. Without that there is no hope of us finding it until
we hit it ourselves.

As for your workaround in comment 5, I agree it is "amusing". I can think of no
sensible reason why that would resolve the issue. But again, without having the
data you are importing, we're kind of stuck.

Sorry we can't be more helpful on this one.

Original comment by ross.gardler on 4 Apr 2011 at 8:49

GoogleCodeExporter commented 9 years ago
How do you import the projects? Is it from DOAP RDF/XML files? 

Also, I'm curious about the non-existent categories, and I noticed there's an
error in the query you posted above. The query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX simal: <http://oss-watch.ac.uk/ns/0.2/simal#>
SELECT DISTINCT ?category WHERE {}

actually doesn't return anything because the WHERE clause is empty: you're not
specifying what you want returned. To retrieve all categories, you could ask
for "everything with a categoryId", which would be something like:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX simal: <http://oss-watch.ac.uk/ns/0.2/simal#>
SELECT DISTINCT ?category WHERE { ?category simal:categoryId ?catId}

There are known problems with categories in general that might be related to
this, e.g. categories are sometimes created as type doap:category, which is
syntactically incorrect (Issue 283).

If you're hitting problems, please post one typical RDF/XML file so I can see
if it's something I'd expect Simal to handle correctly. Also, to check the
type of the categories in your data, you can use this query:

PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX  simal: <http://oss-watch.ac.uk/ns/0.2/simal#>
SELECT DISTINCT ?category ?type WHERE { 
  ?category simal:categoryId ?catId .
  ?category rdf:type ?type
}

If you post the result here we could identify if it's related to a known issue.
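
To show what these patterns match, here is a small self-contained Python sketch
that evaluates them over an in-memory list of triples. It is only a toy model
of basic graph pattern matching, not Jena or Simal code: the ex: identifiers,
the sample data, the placeholder type ex:ExpectedCategoryType and the helper
names match, select and categories_by_type are all made up for illustration.

```python
# Toy model of SPARQL basic graph pattern matching; all data below is hypothetical.
SIMAL = "http://oss-watch.ac.uk/ns/0.2/simal#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = [
    ("ex:cat1", SIMAL + "categoryId", "per1451"),
    ("ex:cat1", RDF_TYPE, "doap:category"),
    ("ex:cat2", SIMAL + "categoryId", "per1452"),
    ("ex:cat2", RDF_TYPE, "ex:ExpectedCategoryType"),  # placeholder type name
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(patterns, data):
    """Return every variable binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        extended_bindings = []
        for binding in bindings:
            for triple in data:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    extended_bindings.append(extended)
        bindings = extended_bindings
    return bindings


def categories_by_type(data):
    """Mirror the two-pattern query above: each category with its rdf:type."""
    rows = select(
        [("?category", SIMAL + "categoryId", "?catId"),
         ("?category", RDF_TYPE, "?type")],
        data,
    )
    return sorted({(row["?category"], row["?type"]) for row in rows})
```

In this model, select([], triples) yields a single binding with no variables
set, which is why an empty WHERE clause shows nothing for ?category, while
categories_by_type(triples) lists each category next to its type so that a
doap:category entry stands out.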

Original comment by sander.v...@oucs.ox.ac.uk on 4 Apr 2011 at 10:07