RichardLitt / lrl

For work concerning low resource languages.

lrl

Scripts for work on low-resource languages. This repository does not include visualisations, Semantic Web data, or the miscellaneous scripts found in my other repositories.

facebook-scraper

The Facebook scripts here are for manually harvesting data from Facebook groups: AJAX queries are issued by hand and the page source is saved from the browser. This is neither an automated data collection scheme nor a scraper, which makes it legal (afaik). A paper based on this work is currently in progress.

maltese-dict

I have developed a dictionary program with both a GUI and a terminal interface, built on two word lists I have access to: one from the internet, and a cleaned-up copy available via METASHARE under a CC BY-NC-SA license. I expect to keep working on this throughout my time in Malta.
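To give a rough idea of what the terminal side of such a dictionary does, here is a minimal sketch. It is not the code in this repository: the tab-separated `headword<TAB>gloss` format, the function names, and the four sample entries are all assumptions made for illustration.

```python
import io


def load_word_list(lines):
    """Parse 'headword<TAB>gloss' lines into a dict of headword -> list of glosses.

    The tab-separated layout is an assumed format, not the actual word lists.
    Blank lines and lines without a tab are skipped.
    """
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or "\t" not in line:
            continue
        headword, gloss = line.split("\t", 1)
        entries.setdefault(headword.casefold(), []).append(gloss)
    return entries


def lookup(entries, query):
    """Return the glosses for an exact, case-insensitive headword match."""
    return entries.get(query.strip().casefold(), [])


def prefix_search(entries, prefix):
    """Return all headwords starting with the given prefix, sorted."""
    prefix = prefix.strip().casefold()
    return sorted(word for word in entries if word.startswith(prefix))


# Illustrative sample data only; a real run would open a word list file instead.
SAMPLE = "kelb\tdog\nqattus\tcat\nktieb\tbook\ndar\thouse\n"
entries = load_word_list(io.StringIO(SAMPLE))

print(lookup(entries, "Kelb"))
print(prefix_search(entries, "k"))
```

Passing an open file object to `load_word_list` instead of the `io.StringIO` sample would work the same way, since the function only iterates over lines.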

maltese-*

In development. These are for courses at the University of Malta. One is a stemmer, based on the NLTK stemmers (Snowball, ISRI). One is a morphological analyser for broken plural nouns, based on previous work by Farrugia. The third is a chunker and basic code-switching identifier, based on the work done in facebook-scraper and on Fabri's theoretical research on Maltese compounds.
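As a rough illustration of what a basic code-switching identifier can look like, the sketch below tags each token by word-list membership and reports where the language changes. This is a generic lexicon-lookup approach chosen for the example, not necessarily the method used in these scripts; the tiny `MALTESE` and `ENGLISH` sets, the `mt`/`en`/`unk` labels, and the function names are all placeholders.

```python
import re

# Toy lexicons for illustration only; real word lists would be far larger.
MALTESE = {"grazzi", "ktieb", "kelb", "dar", "iva", "u"}
ENGLISH = {"thanks", "for", "the", "book", "dog", "house", "yes", "and"}

# One or more letters, optionally joined by hyphens or apostrophes.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def tag_tokens(text):
    """Label each token 'mt', 'en', or 'unk' by lexicon membership."""
    tagged = []
    for token in TOKEN_RE.findall(text.casefold()):
        in_mt = token in MALTESE
        in_en = token in ENGLISH
        if in_mt and not in_en:
            label = "mt"
        elif in_en and not in_mt:
            label = "en"
        else:
            # Unknown words and words found in both lists stay undecided.
            label = "unk"
        tagged.append((token, label))
    return tagged


def switch_points(tagged):
    """Return token indices where the language differs from the last known one."""
    points = []
    previous = None
    for index, (_, label) in enumerate(tagged):
        if label == "unk":
            continue
        if previous is not None and label != previous:
            points.append(index)
        previous = label
    return points


tagged = tag_tokens("Grazzi for the ktieb")
print(tagged)
print(switch_points(tagged))
```

Tokens tagged `unk` are skipped when looking for switches, so an out-of-vocabulary word between two Maltese words does not register as a language change.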