HughP / olac

Automatically exported from code.google.com/p/olac

Improve ACL repository #11

Closed. GoogleCodeExporter closed this issue 9 years ago.

GoogleCodeExporter commented 9 years ago
Obtain ACL BibTeX metadata by crawling the ACL Anthology site for .bib files [1], and extract OLAC records from them. Enrich the records with language identifiers: query a Google Custom Search Engine [2] with a language name, search the results for substrings of the form [A-Z]\d\d-\d\d\d\d.pdf (Anthology paper file names), and map the language names to ISO 639-3 codes [3]. Start with the languages that have the most speakers (cf. [4]; perhaps there is a better list available somewhere).
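
The first two steps (reading entries from the .bib files at [1] and turning each one into an OLAC record) might look roughly like the Python sketch below. It is not the project's code and does no crawling: it assumes an entry has already been downloaded, uses a simplified regex-based BibTeX parser that only handles brace-delimited values with at most one level of nested braces, and applies a hypothetical field mapping (title to dc:title, author to dc:creator, year to dcterms:issued, booktitle to dcterms:isPartOf, citation key to dc:identifier). The sample entry and its ID X99-1234 are made up.

```python
import re
import xml.etree.ElementTree as ET

# Namespaces used in OLAC 1.1 records.
NS = {
    "olac": "http://www.language-archives.org/OLAC/1.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

# `field = {value}`, allowing one level of nested braces such as {F}rench.
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{((?:[^{}]|\{[^{}]*\})*)\}")
# Entry header, e.g. `@InProceedings{X99-1234,`.
KEY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")

# Hypothetical sample entry, for illustration only.
SAMPLE_ENTRY = """@InProceedings{X99-1234,
  author    = {Smith, John and Doe, Jane},
  title     = {Parsing {F}rench Text},
  booktitle = {Proceedings of a Hypothetical Workshop},
  year      = {2008}
}"""


def parse_bibtex_entry(text):
    """Parse one BibTeX entry into (key, {field: value}); simplified, not a full parser."""
    key_match = KEY_RE.search(text)
    key = key_match.group(1) if key_match else None
    fields = {}
    for name, value in FIELD_RE.findall(text):
        # Drop case-protecting braces and collapse runs of whitespace.
        clean = re.sub(r"\s+", " ", value.replace("{", "").replace("}", "")).strip()
        fields[name.lower()] = clean
    return key, fields


def to_olac_record(key, fields):
    """Build an <olac:olac> element using an assumed BibTeX-to-Dublin-Core mapping."""
    root = ET.Element(f"{{{NS['olac']}}}olac")

    def add(tag, text):
        prefix, local = tag.split(":")
        element = ET.SubElement(root, f"{{{NS[prefix]}}}{local}")
        element.text = text

    if "title" in fields:
        add("dc:title", fields["title"])
    for author in fields.get("author", "").split(" and "):
        if author.strip():
            add("dc:creator", author.strip())
    if "year" in fields:
        add("dcterms:issued", fields["year"])
    if "booktitle" in fields:
        add("dcterms:isPartOf", fields["booktitle"])
    if key:
        add("dc:identifier", key)
    return root


if __name__ == "__main__":
    key, fields = parse_bibtex_entry(SAMPLE_ENTRY)
    print(ET.tostring(to_olac_record(key, fields), encoding="unicode"))
```

A real BibTeX library would be more robust against quoted values, string macros and deeper nesting; the regex version is only meant to show the shape of the conversion.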

[1] http://www.aclweb.org/anthology-new/
[2] http://www.google.com/cse?cx=011664571474657673452%3A4w9swzkcxiy&q=french
[3] http://www.sil.org/iso639-3/download.asp
[4] http://paginaspersonales.deusto.es/abaitua/konzeptu/nlp/top100.htm
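
The enrichment step could be sketched as follows. This assumes the search results for each language name (e.g. the page returned by [2] for q=french) have already been fetched as text, and that the ISO 639-3 code table from [3] is tab-delimited with `Id` and `Ref_Name` columns; adjust the column names if the downloaded file differs. The dot in the pattern is escaped so it matches only a literal "."; the function names and sample data are hypothetical.

```python
import csv
import io
import re
from collections import defaultdict

# Anthology file names such as X99-1234.pdf: a capital letter, two digits,
# a hyphen, four digits, then a literal ".pdf".
PAPER_ID_RE = re.compile(r"\b([A-Z]\d\d-\d{4})\.pdf\b")


def paper_ids_in(text):
    """Return the distinct paper IDs mentioned in search-result text, in first-seen order."""
    return list(dict.fromkeys(PAPER_ID_RE.findall(text)))


def load_name_to_code(table):
    """Map lower-cased reference names to ISO 639-3 codes.

    Assumes a tab-delimited table with `Id` and `Ref_Name` columns.
    """
    reader = csv.DictReader(table, delimiter="\t")
    return {row["Ref_Name"].lower(): row["Id"] for row in reader}


def enrich(results_by_language, name_to_code):
    """Build {paper_id: set of ISO 639-3 codes} from {language name: result text}."""
    languages = defaultdict(set)
    for name, text in results_by_language.items():
        code = name_to_code.get(name.lower())
        if code is None:
            continue  # name not in the table; leave it for manual review
        for paper_id in paper_ids_in(text):
            languages[paper_id].add(code)
    return dict(languages)


if __name__ == "__main__":
    # Tiny stand-in for the code table; with the real file, pass
    # open(path, encoding="utf-8", newline="") instead.
    table = io.StringIO("Id\tRef_Name\nfra\tFrench\ndeu\tGerman\n")
    results = {"French": "... anthology-new/X/X99/X99-1234.pdf ... Y98-0001.pdf ..."}
    print(enrich(results, load_name_to_code(table)))
```

Matching exact reference names is deliberately strict: a language name that is missing from the table or spelled differently is skipped rather than guessed, so it can be checked by hand.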

Original issue reported on code.google.com by StevenBird1 on 6 Jan 2009 at 4:29

GoogleCodeExporter commented 9 years ago

Original comment by StevenBird1 on 15 Mar 2009 at 3:56