Computational-Content-Analysis-2018 / 5-Jan-Machine-Translation-Mining-Text-for-Social-Theory

Evans, James and Pedro Aceves. 2016. “Machine Translation: Mining Text for Social Theory”. Annual Review of Sociology 42:21-50. DOI: 10.1146/annurev-soc-081715-074206
https://github.com/Computational-Content-Analysis-2018

Non-English Text and the Problem of Testing in Unsupervised Learning #4

Open khan1792 opened 6 years ago

khan1792 commented 6 years ago

The paper is very interesting. I have two questions about its content.

First, are there systematic differences in technique when we analyze alphabetic languages such as English versus ideographic languages such as Chinese, given how differently they are structured?
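One concrete place where such differences show up is tokenization: English marks word boundaries with spaces, while written Chinese does not. The sketch below is a minimal illustration only, assuming plain Python strings and using overlapping character bigrams as a crude, segmentation-free stand-in for a real Chinese word segmenter. The function names and example sentences are hypothetical.

```python
from __future__ import annotations


def whitespace_tokens(text: str) -> list[str]:
    """Split on whitespace; adequate for English, where spaces mark word boundaries."""
    return text.split()


def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Overlapping character n-grams, ignoring whitespace.

    A crude, segmentation-free alternative sometimes used for Chinese,
    where words are not separated by spaces. Not a real word segmenter.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


# Hypothetical example sentences.
english = "machine translation mines text"
chinese = "机器翻译挖掘文本"

print(whitespace_tokens(english))  # ['machine', 'translation', 'mines', 'text']
print(whitespace_tokens(chinese))  # ['机器翻译挖掘文本'] (one undivided token)
print(char_ngrams(chinese))        # ['机器', '器翻', '翻译', '译挖', '挖掘', '掘文', '文本']
```

Character n-grams sidestep segmentation errors but also produce spurious units such as 器翻 that straddle word boundaries, which is one reason dedicated segmenters are usually preferred when available.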

Second, unsupervised learning usually faces a testing problem. You mention in the paper that we can use it to discover a theory and then test that theory with supervised methods. However, some of the datasets we use have no response variable, so no such test is possible. Sometimes we can test the discovery on a similar dataset that does contain a response variable, but the training and test datasets usually differ in ways we may not even be aware of, and this weakens the test. Therefore, does this testing problem mean that we still need some theoretical or empirical foundation to interpret exciting findings from unsupervised learning, so that those findings usually remain limited by our prior knowledge?
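The concern that a labeled validation corpus may differ from the discovery corpus in unknown ways can at least be partly checked before running the supervised test. A minimal sketch, assuming bag-of-words unigram counts and Jensen-Shannon divergence as the comparison (one common choice, not a method from the paper or this post); the helper names and toy documents are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def word_distribution(docs: list[str]) -> Counter:
    """Pool lowercase whitespace-delimited unigram counts over a corpus."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def js_divergence(p_counts: Counter, q_counts: Counter) -> float:
    """Jensen-Shannon divergence (base 2) between two word-count distributions.

    Returns 0.0 for identical distributions and 1.0 for corpora that
    share no vocabulary.
    """
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    js = 0.0
    for w in vocab:
        # Counter returns 0 for missing words without inserting them.
        p = p_counts[w] / p_total
        q = q_counts[w] / q_total
        m = (p + q) / 2  # mixture distribution
        if p > 0:
            js += 0.5 * p * math.log2(p / m)
        if q > 0:
            js += 0.5 * q * math.log2(q / m)
    return js


# Hypothetical toy corpora.
discovery = ["protest police crowd", "crowd march police"]
validation = ["protest police crowd", "crowd march police"]
unrelated = ["stock market price", "price index market"]

print(js_divergence(word_distribution(discovery), word_distribution(validation)))  # 0.0
print(js_divergence(word_distribution(discovery), word_distribution(unrelated)))   # close to 1.0
```

A low divergence does not prove the two datasets are comparable, since it only captures differences in vocabulary, but a high one is a warning that a failed (or successful) test on the second dataset may say little about the original discovery.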