HughP / simal

Automatically exported from code.google.com/p/simal

Import DOAP from Sourceforge #322

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
SourceForge has an (unmaintained) API for project data, which can also return 
that data in DOAP format.

We should check this out and, if the data is sufficiently accurate, provide a 
means for importing it from SourceForge.

http://sourceforge.net/apps/trac/sourceforge/wiki/API

Original issue reported on code.google.com by rgardler...@gmail.com on 16 Jul 2010 at 9:31

GoogleCodeExporter commented 9 years ago
I just imported Jena using http://sourceforge.net/api/project/name/jena/doap

Seems to work fine.

Original comment by rgardler...@gmail.com on 16 Jul 2010 at 9:34

GoogleCodeExporter commented 9 years ago
Upping the priority, as this significantly speeds up DOAP creation for projects 
hosted on SourceForge.

Original comment by ross.gardler on 5 Aug 2010 at 3:19

GoogleCodeExporter commented 9 years ago
The problem with the SF Jena project referenced above is that its data contains 
a relative URL, which breaks the RDF/XML output of the REST interface. The ugly 
part is: 

<sf:feature>
  <sf:Feature>
    <name>MediaWiki</name>
    <foaf:page rdf:resource="/apps/mediawiki/jena/" />
  </sf:Feature>
</sf:feature>

The error in the REST /project call is: 

Only well-formed absolute URIrefs can be included in RDF/XML output: 
</apps/mediawiki/jena/> Code: 57/REQUIRED_COMPONENT_MISSING in SCHEME: A 
component that is required by the scheme is missing.

Need to check whether this problem occurs consistently across SF projects.
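
One way to run that check is to scan a project's DOAP export for rdf:resource values that have no URI scheme. The following is only a minimal sketch: the function name find_relative_resources is hypothetical, the sf namespace URI in the sample is a placeholder (the real one is not given in this thread), and the sample document is a cut-down stand-in for the Jena export.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_RESOURCE = f"{{{RDF_NS}}}resource"


def find_relative_resources(rdf_xml):
    """Return every rdf:resource value that lacks a URI scheme."""
    root = ET.fromstring(rdf_xml)
    relative = []
    for element in root.iter():
        value = element.get(RDF_RESOURCE)
        # An absolute URI reference always carries a scheme such as "http".
        if value is not None and not urlparse(value).scheme:
            relative.append(value)
    return relative


# Cut-down stand-in for the Jena export. The sf namespace URI below is a
# placeholder assumption, not the one SourceForge actually uses.
RELATIVE_URL_SAMPLE = """<Project xmlns="http://usefulinc.com/ns/doap#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:sf="http://example.org/sf#">
  <homepage rdf:resource="http://openjena.org"/>
  <sf:feature>
    <sf:Feature>
      <name>MediaWiki</name>
      <foaf:page rdf:resource="/apps/mediawiki/jena/" />
    </sf:Feature>
  </sf:feature>
</Project>"""

if __name__ == "__main__":
    print(find_relative_resources(RELATIVE_URL_SAMPLE))
```

Running this over a handful of SF projects would show whether the relative URL is specific to Jena or a general property of the export.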

Original comment by sander.v...@oucs.ox.ac.uk on 10 Aug 2010 at 2:00

GoogleCodeExporter commented 9 years ago
I also note that their representation of issue trackers and mailing lists, and 
possibly other items, does not conform to the DOAP schema.

SF say that this feature is not supported and, since it has problems, we should 
probably avoid working around them.

I therefore suggest that we implement this feature to only pull in the content 
that is correctly marked up as DOAP data. From the Jena example above:

<name>jena</name>

<created>2001-11-20</created>

<description xml:lang="en">
Jena is Java toolkit for developing semantic web applications based on W3C 
recommendations for RDF and OWL. It provides an RDF API; ARP, an RDF parser; 
SPARQL, the W3C RDF query language; an OWL API; and rule-based inference for 
RDFS and OWL.
</description>

<download-page 
rdf:resource="http://sourceforge.net/project/showfiles.php?group_id=40417"/>

<homepage rdf:resource="http://openjena.org"/>

<screenshots 
rdf:resource="http://sourceforge.net/project/screenshots.php?group_id=40417"/>

<developer>
...
</developer>

<maintainer>
...
</maintainer>

All other content should be filtered out on import. I'll create a separate 
issue for writing code to grab the mailing-list and bug-database information, 
and a third issue for grabbing additional information from the sf:feature property.
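
The filtering could look roughly like the sketch below, which keeps only the properties listed above and drops every other child of the project element. This is an illustration, not the importer's actual code: filter_project and ALLOWED_TAGS are hypothetical names, the sample is a shortened stand-in for the Jena export, and the sf namespace URI is a placeholder.

```python
import xml.etree.ElementTree as ET

DOAP_NS = "http://usefulinc.com/ns/doap#"

# The properties listed above as safe to import.
KEPT_PROPERTIES = (
    "name", "created", "description", "download-page",
    "homepage", "screenshots", "developer", "maintainer",
)
ALLOWED_TAGS = {f"{{{DOAP_NS}}}{local}" for local in KEPT_PROPERTIES}


def filter_project(project):
    """Remove, in place, every direct child that is not an allowed DOAP property."""
    for child in list(project):  # copy the child list, since we mutate the parent
        if child.tag not in ALLOWED_TAGS:
            project.remove(child)
    return project


# Shortened stand-in for the Jena export; the sf namespace URI is a placeholder.
FILTER_SAMPLE = """<Project xmlns="http://usefulinc.com/ns/doap#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:sf="http://example.org/sf#">
  <name>jena</name>
  <homepage rdf:resource="http://openjena.org"/>
  <mailing-list>
    <sf:MailingList><name>jena-devel</name></sf:MailingList>
  </mailing-list>
  <sf:feature>
    <sf:Feature>
      <name>MediaWiki</name>
      <foaf:page rdf:resource="/apps/mediawiki/jena/" />
    </sf:Feature>
  </sf:feature>
</Project>"""

if __name__ == "__main__":
    filtered = filter_project(ET.fromstring(FILTER_SAMPLE))
    print([child.tag for child in filtered])
```

Because the sf:feature block is dropped along with everything else outside the allow-list, the relative URL from the earlier comment never reaches the RDF/XML output.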

Original comment by ross.gardler on 10 Aug 2010 at 2:23

GoogleCodeExporter commented 9 years ago
I agree with pulling in what we want for now.

However, I'd like to be pedantic here:
The way SourceForge represents mailing-lists and bug-databases is compatible 
with DOAP. For example for Jena:

    <mailing-list>
        <sf:MailingList>
            <name>jena-devel</name>
            <rss:channel rdf:resource="http://sourceforge.net/api/message/index/list-name/jena-devel/rss" />
            <foaf:page rdf:resource="http://sourceforge.net/mailarchive/forum.php?forum_name=jena-devel" />
            <sf:total-count>3</sf:total-count>
        </sf:MailingList>
    </mailing-list>

and: 

    <bug-database>
        <sf:Tracker>
            <name>Bugs</name>
            <rss:channel rdf:resource="http://sourceforge.net/api/artifact/index/tracker-id/430288/rss" />
            <foaf:page rdf:resource="http://sourceforge.net/tracker/?group_id=40417&amp;atid=430288" />
            <sf:total-count>301</sf:total-count>
            <sf:open-count>16</sf:open-count>
        </sf:Tracker>
    </bug-database>

are in accordance with the DOAP ontology description because DOAP does not 
specify the range of these properties, so it leaves room to create custom 
classes for them. 

So it's valid DOAP and valid RDF (although problematic because of the URL issue 
mentioned in comment #3).
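
Since the range is open, a consumer can read mailing-list without caring which class the nested node belongs to. A rough sketch of that idea follows, under some assumptions: mailing_lists is a hypothetical helper, the sf namespace URI is a placeholder, and the sample is reduced to the one list shown above.

```python
import xml.etree.ElementTree as ET

DOAP_NS = "http://usefulinc.com/ns/doap#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_RESOURCE = f"{{{RDF_NS}}}resource"


def mailing_lists(project):
    """Collect name and page for each mailing-list value, whatever its class."""
    results = []
    for prop in project.findall(f"{{{DOAP_NS}}}mailing-list"):
        direct = prop.get(RDF_RESOURCE)
        if direct is not None:
            # Plain form: the property points straight at a resource.
            results.append({"name": None, "page": direct})
            continue
        for node in prop:  # nested typed node of any class, e.g. sf:MailingList
            page = node.find(f"{{{FOAF_NS}}}page")
            results.append({
                "name": node.findtext(f"{{{DOAP_NS}}}name"),
                "page": page.get(RDF_RESOURCE) if page is not None else None,
            })
    return results


# Reduced sample; the sf namespace URI is a placeholder assumption.
MAILING_LIST_SAMPLE = """<Project xmlns="http://usefulinc.com/ns/doap#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:sf="http://example.org/sf#">
  <mailing-list>
    <sf:MailingList>
      <name>jena-devel</name>
      <foaf:page rdf:resource="http://sourceforge.net/mailarchive/forum.php?forum_name=jena-devel" />
      <sf:total-count>3</sf:total-count>
    </sf:MailingList>
  </mailing-list>
</Project>"""

if __name__ == "__main__":
    print(mailing_lists(ET.fromstring(MAILING_LIST_SAMPLE)))
```

The same pattern would apply to bug-database with sf:Tracker nodes.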

Original comment by sander.v...@oucs.ox.ac.uk on 10 Aug 2010 at 2:32