Kaira is a Python program that converts old Finnish matrikel books to CSV and JSON formats. The book series currently supported are "Suomen Rintamamiehet 1939-43", "Suomen Pienviljelijät", "Siirtokarjalaisten tie" and "Suuret maatilat". The book series were originally published in the 1970s and contain brief descriptions of people: their lives, children, spouses and so on. This data is scientifically interesting but difficult to analyze statistically in written form.
See the Pikakäyttöohje (quick start guide) and the developer documentation in the project Wiki.
Kaira is a tool for extracting interesting data from old matrikel books that have been scanned and OCR'd. The extracted data can then be edited and exported to CSV or JSON for statistical analysis. The tool was originally developed at Lammi Biological Station in collaboration with John Loehr.
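To give a rough idea of what the exported data can look like, here is a minimal sketch that writes a few person records to both JSON and CSV using only the Python standard library. The records, the field names (`surname`, `birth_year`, `children`) and the helper functions `to_json` and `to_csv` are hypothetical and chosen for illustration; they are not Kaira's actual schema or API.

```python
import csv
import io
import json

# Hypothetical records; field names are illustrative, not Kaira's real schema.
persons = [
    {"surname": "Virtanen", "birth_year": 1912, "children": 3},
    {"surname": "Korhonen", "birth_year": 1908, "children": 5},
]


def to_json(records):
    """Serialize records to JSON text, keeping non-ASCII letters readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records):
    """Serialize records to CSV text with a header row taken from the first record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0].keys()),
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


json_text = to_json(persons)
csv_text = to_csv(persons)
```

Either output can then be loaded into a statistics package or spreadsheet. `ensure_ascii=False` is used so that Finnish characters such as ä and ö stay readable in the JSON output.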
Kaira includes a simple GUI for reading, editing and exporting the OCR files and related content. See the Wiki for detailed usage instructions.
The project Wiki also documents how to extend the software with new book series, how to set up a development environment, and what else you need to know before contributing.
On my part, development will likely stop at the beginning of June 2015. Some critical bug fixes might still be made afterwards.
If you use this software or datasets generated by it in your research, please cite:
T. Salmi, J. Loehr. Kaira [computer software]. Lammi Biological Station, 2015. Available at https://github.com/Tumetsu/Kaira