UChicago-CCA-2021 / Readings-Responses


Classifying Meanings & Documents - Fundamentals #17

Open HyunkuKwon opened 3 years ago

HyunkuKwon commented 3 years ago

Post questions here for one or more of our fundamentals readings:

Manning, Christopher, Prabhakar Raghavan and Hinrich Schütze. 2008. “Text Classification and Naïve Bayes,” “Vector Space Classification,” and “Support Vector Machines.” Chapters 13-15 from Introduction to Information Retrieval: 234-320.

or,

Witten, Ian H., Eibe Frank, Mark A. Hall, Christopher J. Pal. 2017. “Ensemble Learning” Chapter 12 from Data Mining: Practical Machine Learning Tools and Techniques, 4th Edition: 351-371.

RobertoBarrosoLuque commented 3 years ago

Given that most vector-space representations of textual data are high-dimensional, and that support vector machines optimize the geometric margin between classes to find the relevant decision boundary, it seems SVMs are better suited for binary classification. Besides one-vs-one and one-vs-rest, are there any other methodologies that allow us to use SVMs for multi-class classification?
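For concreteness, here is a minimal pure-Python sketch of the one-vs-rest reduction: train one binary scorer per class and predict the class whose scorer fires highest. The toy centroid-difference scorer stands in for a real binary SVM, and all names here are illustrative, not from the reading:

```python
# One-vs-rest: one binary scorer per class; predict the class with the
# highest score. A toy centroid model stands in for a binary SVM.

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def score(w, x):
    # dot product of the class weight vector with the document vector
    return sum(wi * xi for wi, xi in zip(w, x))

def train_one_vs_rest(X, y):
    models = {}
    for c in sorted(set(y)):
        pos = [x for x, label in zip(X, y) if label == c]
        neg = [x for x, label in zip(X, y) if label != c]
        # toy binary model: difference of positive and negative centroids
        cp, cn = centroid(pos), centroid(neg)
        models[c] = [p - q for p, q in zip(cp, cn)]
    return models

def predict(models, x):
    return max(models, key=lambda c: score(models[c], x))

X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5], [0.6, 0.6]]
y = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_rest(X, y)
print(predict(models, [0.95, 0.05]))  # -> a
```

The same wrapper works for any binary scorer, which is why the one-vs-rest scheme is so common with SVMs.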

jacyanthis commented 3 years ago

Manning et al. say, "It is perhaps surprising that so many of the best-known text classification algorithms are linear. Some of these methods, in particular linear SVMs, regularized logistic regression and regularized linear regression, are among the most effective known methods." Is this still true in 2021? If not, is it at least true outside of neural networks?

zshibing1 commented 3 years ago

In “Text Classification and Naïve Bayes,” Figure 13.8, why does accuracy for the multinomial and Bernoulli models increase again with the number of features selected after accuracy reaches a local maximum?

k-partha commented 3 years ago

1) Most classification tasks for textual data rely on Euclidean vector spaces. What analogous algorithms (to KNN or Rocchio) exist for classification tasks in non-Euclidean spaces? In general, how would one think about classification in a non-Euclidean space?
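On the first question: kNN needs only a similarity (or distance) function, not a Euclidean embedding, so swapping in a non-Euclidean measure is straightforward. A minimal sketch using cosine similarity (the angular measure the reading favors for documents), with illustrative toy data:

```python
import math
from collections import Counter

def cosine(a, b):
    # angular similarity: works on the unit sphere, not Euclidean distance
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def knn_predict(X, y, query, k=3, sim=cosine):
    # rank training points by similarity to the query, then vote
    ranked = sorted(zip(X, y), key=lambda p: sim(p[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

X = [[3.0, 0.1], [5.0, 0.3], [0.2, 4.0], [0.1, 2.0], [0.3, 6.0]]
y = ["sports", "sports", "politics", "politics", "politics"]
print(knn_predict(X, y, [1.0, 0.05], k=3))  # -> sports
```

Any metric (edit distance on strings, geodesic distance on a manifold) can be dropped in for `sim`, which is one way to think about classification in non-Euclidean spaces.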

2) The reading (which is 12 years old at the moment) primarily discusses the Rocchio, Naive Bayes, and KNN algorithms for vector space classification. To my knowledge, Rocchio is not a highly relevant method in this day and age. What methods are currently considered the industry standard for high-dimensional vector space classification?

jcvotava commented 3 years ago

How do the modelling techniques laid out in “Text Classification and Naïve Bayes" compare in accuracy to a classical approach where the researchers simply take a random sample and classify texts by hand? In situations where we have good reason to think 1) a random sample is representative of the entire population, and 2) the hand coders are good, how accurate would the sampling technique be compared to the modelling?

jinfei1125 commented 3 years ago

How do we choose between different classification models? As Jacy and this week's exemplary paper by Klingenstein mentioned, is the linear model still one of the most popular/intuitive/best-performing models?

romanticmonkey commented 3 years ago

Could you share some insights or known literature on how to use these methods to classify "hard-to-classify" text, like sarcasm?

MOTOKU666 commented 3 years ago

I totally agree with the authors' warning against running programs on the test set while developing the methods: "Beginners often violate this rule, and their results lose validity because they have implicitly tuned their system to the test data simply by running many variant systems and keeping the tweaks to the system that worked best on the test set." I'm wondering whether there is anything else related to this that beginners may get wrong. Would you mind showing us an example?
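A minimal sketch of the discipline the authors describe: split once into train/dev/test, tune only against the dev set, and touch the test set exactly once at the end. The function name and fractions here are illustrative, not from the reading:

```python
import random

def three_way_split(items, dev_frac=0.2, test_frac=0.2, seed=42):
    # shuffle once, then carve off dev and test sets; the test set is
    # used only after all tuning on the dev set is finished
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    n_dev = int(len(items) * dev_frac)
    test = items[:n_test]
    dev = items[n_test:n_test + n_dev]
    train = items[n_test + n_dev:]
    return train, dev, test

train, dev, test = three_way_split(range(100))
print(len(train), len(dev), len(test))  # -> 60 20 20
```

The fixed seed makes the split reproducible, which also guards against the subtler mistake of re-splitting until the numbers look good.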

Raychanan commented 3 years ago

We know that PCA is a very important tool for dimensionality reduction. However, the authors also mention many other methods of dimensionality reduction in these three chapters. What criteria should we refer to when choosing a dimensionality reduction technique? Thanks!

sabinahartnett commented 3 years ago

In many methods we use hand-coded classifications as 'truth,' but even in the orienting reading for this week one of the authors did the coding... What are some common methods to evaluate the hand-coding (especially if there are not multiple coders, so cross-checking between coders is not possible)? Is this common practice for researchers?
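When even a small subsample can be double-coded, the usual check is a chance-corrected agreement statistic such as Cohen's kappa. A minimal sketch (the toy labels are illustrative):

```python
from collections import Counter

def cohens_kappa(coder1, coder2):
    n = len(coder1)
    # observed agreement: share of items where the two coders match
    po = sum(a == b for a, b in zip(coder1, coder2)) / n
    # expected chance agreement, from each coder's marginal label rates
    c1, c2 = Counter(coder1), Counter(coder2)
    pe = sum((c1[k] / n) * (c2[k] / n) for k in c1)
    return (po - pe) / (1 - pe)

a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
b = ["pos", "pos", "neg", "pos", "pos", "neg", "pos", "neg"]
print(round(cohens_kappa(a, b), 3))  # -> 0.75
```

Kappa of 1 means perfect agreement and 0 means agreement no better than chance, which is why it is preferred over raw percent agreement when label distributions are skewed.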

Rui-echo-Pan commented 3 years ago

I also wonder what the differences are among the various classification techniques and how we should choose among them. Thanks!

ming-cui commented 3 years ago

My question is about ensemble learning. The chapter introduces bagging, boosting, stacking, etc. In practice, is it suggested to try these techniques out and pick the one with the best prediction accuracy?
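For reference, the core of bagging is just "train many base learners on resampled data and take a majority vote." A minimal pure-Python sketch with a one-feature decision stump as the base learner; real bagging draws bootstrap samples, but this version uses deterministic leave-one-out subsamples so the example is reproducible (all names are illustrative):

```python
from collections import Counter

def train_stump(data):
    # toy base learner: one-feature threshold classifier ("decision stump")
    data = sorted(data)
    best = None
    for i in range(1, len(data)):
        thr = (data[i - 1][0] + data[i][0]) / 2
        left = Counter(l for x, l in data if x <= thr).most_common(1)[0][0]
        right = Counter(l for x, l in data if x > thr).most_common(1)[0][0]
        err = sum((l != left) if x <= thr else (l != right) for x, l in data)
        if best is None or err < best[0]:
            best = (err, thr, left, right)
    _, thr, left, right = best
    return lambda x: left if x <= thr else right

def bagged_predictor(data):
    # bagging normally resamples with replacement; leave-one-out
    # subsamples keep this illustration deterministic
    models = [train_stump(data[:i] + data[i + 1:]) for i in range(len(data))]
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

data = [(0.1, "a"), (0.3, "a"), (0.4, "a"), (0.6, "b"), (0.8, "b"), (0.9, "b")]
clf = bagged_predictor(data)
print(clf(0.2), clf(0.85))  # -> a b
```

Boosting and stacking follow the same combine-many-learners shape but weight or meta-learn over the base models instead of voting uniformly.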

Bin-ary-Li commented 3 years ago

The reading mentions that the theoretically more powerful models (e.g. SVM) might not perform better than the less powerful ones (e.g. naive Bayes) in a real-world context. In light of this, I wonder how researchers should choose which model to use?

xxicheng commented 3 years ago

I have a similar question as @jinfei1125 , how to choose between different models?

william-wei-zhu commented 3 years ago

If our goal is to maximize accuracy, are ensemble methods always preferred over simpler methods?

theoevans1 commented 3 years ago

Witten et al. explain that ensemble machine learning models can be difficult to interpret intuitively because they combine many individual models. Do you have any examples of this limitation in practice—research with strong predictive performance but little comprehensible explanatory power?

egemenpamukcu commented 3 years ago

Is it common practice for researchers with a classification task to try out almost all classification methods and continue with the one with the lowest error rate (or whichever metric is prioritized)?

hesongrun commented 3 years ago

This week's readings mainly focus on pre-deep-learning-era algorithms. I think one major characteristic of these algorithms is that they have a shallow structure and a 'wide' array of features; e.g., in SVMs, we define the vector space with kernel functions. I am wondering how we would choose and justify these features in the first place. What are the limitations and advantages of these methods compared to neural nets? Thanks

lilygrier commented 3 years ago

In "Text Classification and Naive Bayes" from Manning et al., the authors note that the feature selection methods described (mutual information, chi-square, and frequency-based) are greedy algorithms that may continue to add features that provide no additional accuracy beyond the existing feature set. Are there iterations of Naive Bayes that attempt to assess accuracy before adding additional features, or is the extra bulk of unnecessary features canceled out by the overall efficiency of the single-pass method?
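For reference, the mutual-information criterion the chapter describes scores each term independently of the terms already selected, which is exactly why redundant features can slip in. A minimal sketch of binary-presence MI on a toy corpus (documents, labels, and terms are all illustrative):

```python
import math

def mutual_information(docs, labels, term):
    # MI between term presence and class label, from contingency counts,
    # in the spirit of the formulation in IIR ch. 13
    n = len(docs)
    cells = {}
    for doc, lab in zip(docs, labels):
        key = (term in doc, lab)
        cells[key] = cells.get(key, 0) + 1
    mi = 0.0
    for (present, lab), count in cells.items():
        p_joint = count / n
        p_term = sum(c for (p, _), c in cells.items() if p == present) / n
        p_lab = sum(c for (_, l), c in cells.items() if l == lab) / n
        mi += p_joint * math.log2(p_joint / (p_term * p_lab))
    return mi

docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "law"}, {"vote", "court"}]
labels = ["sports", "sports", "politics", "politics"]
terms = ["goal", "vote", "team", "law"]
ranked = sorted(terms, key=lambda t: mutual_information(docs, labels, t),
                reverse=True)
print(ranked[:2])  # -> ['goal', 'vote']
```

Note that "goal" and "vote" each get full marks independently; a greedy top-k selection would happily keep both even if one were redundant given the other, which is the behavior the question is about.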

mingtao-gao commented 3 years ago

Generally, I have a question on how to compare ML models and choose the best-fitted one. Sometimes I run several different models but achieve similar performance. In such cases, how can we choose the best model? How should we balance the trade-offs between accuracy, efficiency, and other attributes?
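The standard apparatus for the accuracy side of that comparison is k-fold cross-validation: hold out each fold once, average the held-out scores, and compare models on the same folds. A minimal pure-Python sketch comparing a majority-class baseline to a Rocchio-style nearest-centroid classifier (all names and the toy data are illustrative):

```python
from collections import Counter

def cross_val_accuracy(model_fn, X, y, k=3):
    # k-fold cross-validation: hold out each fold once, train on the rest
    n = len(X)
    folds = [list(range(i, n, k)) for i in range(k)]
    accs = []
    for fold in folds:
        train = [i for i in range(n) if i not in fold]
        predict = model_fn([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in fold)
        accs.append(correct / len(fold))
    return sum(accs) / k

def majority_class(X, y):
    # baseline: always predict the most common training label
    top = Counter(y).most_common(1)[0][0]
    return lambda x: top

def nearest_centroid(X, y):
    # Rocchio-style: predict the class with the closest centroid
    cents = {}
    for c in set(y):
        pts = [x for x, l in zip(X, y) if l == c]
        cents[c] = [sum(v[i] for v in pts) / len(pts) for i in range(len(pts[0]))]
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return lambda x: min(cents, key=lambda c: dist(cents[c], x))

X = [[0.0], [0.2], [0.4], [1.0], [1.2], [1.4]]
y = ["a", "a", "a", "b", "b", "b"]
for name, fn in [("baseline", majority_class), ("centroid", nearest_centroid)]:
    print(name, round(cross_val_accuracy(fn, X, y), 2))
```

When two models tie on cross-validated accuracy, the usual tie-breakers are exactly the other attributes mentioned: training and inference cost, interpretability, and robustness.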

toecn commented 3 years ago

My question is similar to the ones above: how have these methods evolved over the last 10 years or so?