Obtain ACL BibTeX metadata by crawling the Anthology site for .bib files [1].
Extract OLAC records.
Enrich the records with language identifiers: query Google's custom search
engine [2] for each language name and look for substrings of the form
[A-Z]\d\d-\d\d\d\d.pdf in the results. Map language names to ISO 639-3 codes
[3]. Start with the most populous languages (cf. [4]; there may be a better
list available somewhere).
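
The matching and tagging step could be sketched roughly as follows. This assumes the search results for each language name have already been fetched as plain text; the function names, the name-to-code table, and the sample data are hypothetical and only illustrate the idea.

```python
import re

# Pattern from the description above; the dot is escaped so that it only
# matches a literal "." before "pdf".
ANTHOLOGY_PDF = re.compile(r"([A-Z]\d\d-\d\d\d\d)\.pdf")


def extract_anthology_ids(text):
    """Return the unique anthology IDs found in text, in first-seen order."""
    seen = []
    for match in ANTHOLOGY_PDF.finditer(text):
        paper_id = match.group(1)
        if paper_id not in seen:
            seen.append(paper_id)
    return seen


def tag_papers(results_by_language, name_to_iso):
    """Map each anthology ID to a sorted list of ISO codes.

    results_by_language: language name -> raw search-result text (assumed
        to have been fetched elsewhere).
    name_to_iso: language name -> ISO 639-3 code (assumed to be built from
        the code table in [3]).
    """
    tags = {}
    for name, text in results_by_language.items():
        code = name_to_iso.get(name)
        if code is None:
            continue  # no known ISO code for this language name; skip it
        for paper_id in extract_anthology_ids(text):
            tags.setdefault(paper_id, set()).add(code)
    return {paper_id: sorted(codes) for paper_id, codes in tags.items()}


if __name__ == "__main__":
    # Made-up result snippets, for illustration only.
    sample = {
        "French": "see P08-1001.pdf and again P08-1001.pdf",
        "German": "P08-1001.pdf C04-1023.pdf",
    }
    print(tag_papers(sample, {"French": "fra", "German": "deu"}))
```

Keeping the ID extraction separate from the tagging means the same pattern can be reused on any text source, not only on search results.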
[1] http://www.aclweb.org/anthology-new/
[2] http://www.google.com/cse?cx=011664571474657673452%3A4w9swzkcxiy&q=french
[3] http://www.sil.org/iso639-3/download.asp
[4] http://paginaspersonales.deusto.es/abaitua/konzeptu/nlp/top100.htm
Original issue reported on code.google.com by StevenBird1 on 6 Jan 2009 at 4:29