Closed. GoogleCodeExporter closed this issue 9 years ago.
Sorry, I don't know how it was done for the Rosetta Project. Are we supposed to
develop and maintain the transformation service that turns the site into an
OLAC data provider or static repository? Or will the data provider/static
repository be given to us?
Original comment by haepal
on 2 Sep 2010 at 1:24
In issue 193 I've explained how we did the Rosetta Project repository. With
the result from issue 193 in hand, we'll be able to capture a Language Commons
repository. However, note that Language Commons submitters must do more when
entering metadata. There are currently two records:
http://www.archive.org/services/oai2.php?verb=ListRecords&metadataPrefix=oai_dc&set=collection:LanguageCommons
One has <dc:language>en</dc:language> and the other has no language at all, so
we can't put the second one on a language index page, which makes it invisible
to those who might care about it.
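To spot records like the second one before they reach OLAC, one could scan the oai_dc response for records that have no non-empty dc:language. The Python below is a minimal sketch, assuming the standard OAI-PMH 2.0 and Dublin Core namespace URIs; the two sample records and their identifiers are invented for illustration and are not taken from the Internet Archive feed.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for OAI-PMH 2.0 responses and Dublin Core elements.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def records_without_language(list_records_xml: str) -> list:
    """Return the OAI identifiers of records that have no non-empty dc:language."""
    root = ET.fromstring(list_records_xml)
    missing = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        languages = [(el.text or "").strip() for el in record.iterfind(".//dc:language", NS)]
        if not any(languages):
            missing.append(identifier)
    return missing

# Hypothetical two-record response mirroring the situation described above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:record-a</identifier></header>
      <metadata><oai_dc:dc><dc:title>A</dc:title><dc:language>en</dc:language></oai_dc:dc></metadata>
    </record>
    <record>
      <header><identifier>oai:example:record-b</identifier></header>
      <metadata><oai_dc:dc><dc:title>B</dc:title></oai_dc:dc></metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(records_without_language(SAMPLE))  # ['oai:example:record-b']
```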
The Rosetta Project collection uses both <dc:language> and <dc:subject> with
three-letter ISO language codes, which gives us rich metadata for OLAC
purposes. We will have to find a way to get Language Commons
submitters to supply the appropriate language codes for both <dc:language> and
<dc:subject>.
Original comment by garyfsim...@gmail.com
on 2 Sep 2010 at 10:35
The process of approving submissions can ensure we have language codes in the
dc:language and dc:subject elements. Where there are multiple languages, they
will be comma-separated.
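A check along these lines could run during approval. This is a hypothetical helper, not part of any existing Language Commons tool; it assumes a "language code" is any comma-separated value of exactly two or three ASCII letters, so it tests shape only and does not look codes up in the ISO 639 tables. It reports each field that contains no such code.

```python
import re

# Assumption: a language code is 2 or 3 ASCII letters (shape check only).
CODE_RE = re.compile(r"[A-Za-z]{2,3}")

def split_codes(value: str) -> list:
    """Split a comma-separated field such as 'eng, fra' into trimmed, non-empty parts."""
    return [part.strip() for part in value.split(",") if part.strip()]

def missing_code_fields(language: str, subject: str) -> list:
    """Hypothetical approval check: name each field that holds no 2- or 3-letter code."""
    missing = []
    for field, value in (("dc:language", language), ("dc:subject", subject)):
        if not any(CODE_RE.fullmatch(part) for part in split_codes(value)):
            missing.append(field)
    return missing

print(missing_code_fields("eng, fra", "eng,fra"))  # []
print(missing_code_fields("English", ""))          # ['dc:language', 'dc:subject']
```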
Original comment by StevenBird1
on 6 Sep 2010 at 4:34
Based on what was done for the Rosetta Project repository, a harvester and an
XSL stylesheet have been written.
Using those, a Language Commons static repository has been created and
registered.
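The harvester itself is not shown in this thread. As a rough sketch of what any OAI-PMH harvester has to do, the Python below requests ListRecords for one set and follows resumptionToken values until the list is exhausted. The function names are hypothetical; the paging behaviour (a non-empty resumptionToken means more records, and the follow-up request carries only verb and resumptionToken) comes from the OAI-PMH 2.0 protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def first_url(base_url: str, set_spec: str) -> str:
    """Initial ListRecords request for one set, in oai_dc format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc", "set": set_spec})
    return f"{base_url}?{query}"

def next_url(base_url: str, response_xml: str):
    """URL for the next page, or None when the list is complete.

    resumptionToken is an exclusive argument, so only verb and token are sent.
    """
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    if token is None or not (token.text or "").strip():
        return None
    query = urlencode({"verb": "ListRecords", "resumptionToken": token.text.strip()})
    return f"{base_url}?{query}"

def harvest(base_url: str, set_spec: str):
    """Yield raw ListRecords pages until no resumption token remains (needs network)."""
    url = first_url(base_url, set_spec)
    while url is not None:
        with urlopen(url) as resp:
            page = resp.read().decode("utf-8")
        yield page
        url = next_url(base_url, page)
```

Turning each harvested page into the static repository file would then be the XSL stylesheet's job.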
Original comment by haepal
on 25 Oct 2010 at 2:43
I've just accepted the registration since it all looks valid. However, the
metadata could be improved. The biggest gap is that both resources in the
collection should really have:
<dc:type xsi:type="olac:linguistic-type" olac:code="primary_text"/>
so that they will emerge from the thousands of "other resources" for English
and be findable as text corpora in the faceted search.
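Once it has been decided which records are text corpora, the stylesheet (or any other generator) needs to output the element quoted above. A minimal Python/ElementTree sketch, assuming the OLAC 1.1 and XML Schema instance namespace URIs; which records qualify is exactly the open question raised here, so the hypothetical function below only builds the element for a code chosen elsewhere.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
OLAC = "http://www.language-archives.org/OLAC/1.1/"
for prefix, uri in (("dc", DC), ("xsi", XSI), ("olac", OLAC)):
    ET.register_namespace(prefix, uri)

# The three codes of the OLAC Linguistic Data Type vocabulary.
LINGUISTIC_TYPES = {"primary_text", "lexicon", "language_description"}

def linguistic_type(code: str) -> ET.Element:
    """Build <dc:type xsi:type="olac:linguistic-type" olac:code="..."/>."""
    if code not in LINGUISTIC_TYPES:
        raise ValueError(f"not an OLAC linguistic-type code: {code!r}")
    el = ET.Element(f"{{{DC}}}type")
    # The xsi:type value is a QName; the olac prefix is declared because olac:code uses it.
    el.set(f"{{{XSI}}}type", "olac:linguistic-type")
    el.set(f"{{{OLAC}}}code", code)
    return el

print(ET.tostring(linguistic_type("primary_text"), encoding="unicode"))
```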
Is there a clue in the OAI metadata that is coming out of the Internet Archive,
or does something need to be added to the guidelines that the Language Commons
gives to data providers?
Original comment by garyfsim...@gmail.com
on 25 Oct 2010 at 4:44
[deleted comment]
If the language or subject consists of exactly 2 or 3 letters, specify
olac-language scheme. Permit comma-separated values for multiple languages.
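The rule above can be sketched as follows. This is an illustration, not the code from revision 1528: it assumes the Dublin Core, XML Schema instance and OLAC 1.1 namespace URIs and lowercases codes (also an assumption). It checks only the shape of each value; it does not map two-letter codes such as en to three-letter equivalents or verify that a code exists.

```python
import re
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
OLAC = "http://www.language-archives.org/OLAC/1.1/"
for prefix, uri in (("dc", DC), ("xsi", XSI), ("olac", OLAC)):
    ET.register_namespace(prefix, uri)

CODE_RE = re.compile(r"[A-Za-z]{2,3}")

def olac_elements(name: str, value: str) -> list:
    """Turn one dc:language or dc:subject value into OLAC-style elements.

    Each comma-separated part of exactly 2 or 3 letters gets the olac:language
    scheme and a lowercased olac:code; any other part is kept as plain text.
    """
    elements = []
    for part in (p.strip() for p in value.split(",")):
        if not part:
            continue
        el = ET.Element(f"{{{DC}}}{name}")
        if CODE_RE.fullmatch(part):
            el.set(f"{{{XSI}}}type", "olac:language")
            el.set(f"{{{OLAC}}}code", part.lower())
        else:
            el.text = part
        elements.append(el)
    return elements

for el in olac_elements("subject", "eng, fra, Phonology"):
    print(ET.tostring(el, encoding="unicode"))
```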
Original comment by StevenBird1
on 27 Oct 2010 at 9:25
Fixed (see revision 1528).
Original comment by haepal
on 17 Nov 2010 at 4:54
This point from comment 7 does not appear to be implemented yet: "If the
language or subject consists of exactly 2 or 3 letters, specify olac-language
scheme."
Original comment by garyfsim...@gmail.com
on 28 Nov 2010 at 2:48
The static repository XML file itself hadn't been updated. I've just updated
the file, and it will be re-harvested soon.
Original comment by haepal
on 6 Dec 2010 at 3:39
The repository was purged from the database because of an incorrect BaseURL.
The stylesheet has been fixed to correct this, and the repository XML file has
been updated.
Original comment by haepal
on 7 Dec 2010 at 5:38
Original comment by haepal
on 8 Dec 2010 at 2:58
The new baseURL for Language Commons was submitted and approved some days
ago, but the new records are not showing up in OLAC search. The new URL, shown
in archive_review.php, is http://upload.languagecommons.org/sr . However, the
registered URL, shown in
http://www.language-archives.org/archive/languagecommons.org, still seems to be
http://www.language-archives.org/hosted/languagecommons.org.xml .
Original comment by StevenBird1
on 22 Feb 2011 at 12:05
Found that the harvester process had been blocked since Feb 6. Killed the
process, and will check tomorrow whether it has run and whether the new URL has
been harvested.
Original comment by haepal
on 22 Feb 2011 at 9:32
The harvester cron job harvested the new URL successfully.
Original comment by haepal
on 23 Feb 2011 at 2:43
Original comment by StevenBird1
on 28 Feb 2011 at 3:04
Original issue reported on code.google.com by
StevenBird1
on 2 Sep 2010 at 12:16