The data in our roots table was extracted and normalized for storage in our database tables by a series of Python scripts. The process was complicated, so some errors remain. These include unmatched parentheses, extra periods, and other obvious mistakes, mostly in the 'grammar' column. Some entries in the Nicodemus column also begin with upper-case letters, which should be lower-cased.
[ ] review the roots table data to get a sense of the types of obvious parsing errors it contains;
[ ] download the roots table data from Hasura;
[ ] develop and run one or more scripts to identify unmatched parentheses, extra periods, and any other obvious errors you find when reviewing the grammar and other columns;
[ ] develop and run a script to lower-case word-initial upper-case letters in the Nicodemus column;
[ ] return a cleaned-up roots data file in CSV format.
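
As a starting point for the two script tasks above, here is a minimal sketch using only the Python standard library. It is not the existing extraction code. The column names `grammar` and `nicodemus`, the definition of "extra periods" (two or more periods in a row, optionally separated by whitespace), and the definition of "word-initial" (the first character of each whitespace-delimited token) are all assumptions to adjust after reviewing the actual data.

```python
import csv
import re

# Assumed column names; adjust to match the actual export from Hasura.
GRAMMAR_COL = "grammar"
NICODEMUS_COL = "nicodemus"

# Assumption: "extra periods" means two or more periods in a row,
# optionally separated by whitespace. A legitimate "..." is flagged too.
EXTRA_PERIODS = re.compile(r"\.\s*\.")
# Matches the first character of each whitespace-delimited token.
TOKEN_START = re.compile(r"(?<!\S)\S")


def has_unmatched_parens(text):
    """Return True if the parentheses in text are not balanced."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" with no "(" before it
                return True
    return depth != 0  # a leftover "(" that was never closed


def find_issues(text):
    """Return a list of issue labels for one cell."""
    issues = []
    if has_unmatched_parens(text):
        issues.append("unmatched parens")
    if EXTRA_PERIODS.search(text):
        issues.append("extra periods")
    return issues


def lowercase_word_initials(text):
    """Lower-case the first character of every whitespace-delimited token."""
    return TOKEN_START.sub(lambda m: m.group(0).lower(), text)


def clean_roots(src, dst):
    """Copy CSV rows from src to dst, lower-casing the Nicodemus column.

    Returns (row_number, issue) pairs found in the grammar column so they
    can be reviewed by hand; grammar values are written out unchanged.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    report = []
    for row_number, row in enumerate(reader, start=1):
        for issue in find_issues(row.get(GRAMMAR_COL) or ""):
            report.append((row_number, issue))
        if row.get(NICODEMUS_COL):
            row[NICODEMUS_COL] = lowercase_word_initials(row[NICODEMUS_COL])
        writer.writerow(row)
    return report
```

`clean_roots` takes open file objects, so it can be called with the downloaded file opened via `open(..., newline="", encoding="utf-8")` and an output file opened the same way in write mode. The sketch only reports grammar problems and does not try to repair them, since deciding where a missing parenthesis belongs likely needs a human look.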