Thinking-with-Deep-Learning-Spring-2022 / Readings-Responses

You can post your reading responses in this repository.

Strategic Sampling & Active Learning - Orientation #4

Open lkcao opened 2 years ago

lkcao commented 2 years ago

Post your questions about: “When Big Data is Too Small - Sampling, Crowd-Sourcing and Bias” & “Dynamic Data - Active, Adaptive and Continual Learning”, Thinking with Deep Learning, Chapters 7 & 8

thaophuongtran commented 2 years ago

Questions/comments for Chapter 7 - When Big Data is Too Small - Sampling, Crowd-Sourcing and Bias: As I read about negative sampling, I found the concept very interesting. According to the chapter, the lack of negative samples arises from natural selection bias: there are no records of unsuccessful and nonsensical artifacts. The chapter goes on to describe solutions for negative sampling; however, I'm still unclear about the motivation and the reason behind this need for negative samples.

pranathiiyer commented 2 years ago

When determining confidence intervals using bootstrap sampling, is there a general standard that is more accepted than others? For instance, a 95% CI is usually the norm for many statistical problems. Given that bootstrap sampling can be computationally intensive, how is a judgement usually made in the trade-off between a higher CI and creating fewer samples?
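For concreteness, the percentile bootstrap the question refers to can be sketched in a few lines. This is a toy example with simulated data (the sample, seed, and number of replicates are all illustrative assumptions, not from the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=200)  # toy sample

# Percentile bootstrap: resample with replacement, recompute the statistic.
n_boot = 2000
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(n_boot)
])

# A 95% CI from the 2.5th and 97.5th percentiles of the bootstrap distribution.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
```

The computational cost scales linearly with `n_boot`, which is where the trade-off mentioned above bites: a wider (e.g. 99%) interval needs more replicates to estimate its tail percentiles stably.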

pranathiiyer commented 2 years ago

> Questions/comments for Chapter 7 - When Big Data is Too Small - Sampling, Crowd-Sourcing and Bias: As I read about negative sampling, I find the concept very interesting. According to the chapter, the lack of negative samples arises from natural selection bias and there are no records of unsuccessful and nonsensical artifacts. The chapter went on to describe solutions for negative sampling; however, I'm still unclear about the motivation and the reason behind this need for negative samples?

Hey! As I understand it, negative sampling is usually used to add noise to your data, make the model more generalizable, and reduce imbalance. For instance, in word2vec with skip-gram, since you're trying to use the context words to predict the skipped word, the objective function is only trying to maximize the similarity between the skipped and surrounding words. But with negative sampling, the objective function also uses non-surrounding words at the same time to minimize the similarity between words that don't share surrounding contexts. But it can be a bit confusing, I agree!
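The objective described above can be made concrete with a minimal numpy sketch of the skip-gram negative-sampling loss for a single (center, context) pair. Everything here (vocabulary size, embedding dimension, the `sgns_loss` helper, the uniform noise distribution) is a simplifying assumption for illustration; word2vec actually draws negatives from a smoothed unigram distribution:

```python
import numpy as np

rng = np.random.default_rng(42)
vocab_size, dim = 50, 8
W_in = rng.normal(scale=0.1, size=(vocab_size, dim))   # center-word vectors
W_out = rng.normal(scale=0.1, size=(vocab_size, dim))  # context-word vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(center, context, k=5):
    """Skip-gram negative-sampling loss for one (center, context) pair:
    pull the true pair together, push k random 'negative' words apart."""
    pos = sigmoid(W_in[center] @ W_out[context])          # true-pair score
    negatives = rng.integers(0, vocab_size, size=k)        # k noise words
    neg = sigmoid(-W_in[center] @ W_out[negatives].T)      # want these low
    return -np.log(pos) - np.log(neg).sum()

loss = sgns_loss(center=3, context=7)
```

The point of the `neg` term is exactly what the reply says: without it, the model could trivially maximize all similarities at once; the sampled negatives force it to also push unrelated words apart.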

yujing-syj commented 2 years ago

Questions for Chapter 8: Dynamic Data - Active, Adaptive, and Continual Learning. Since active learning uses query strategies to get the labels, can I reuse an actively labeled dataset to train a new, different model? Also, sampling is biased in this case: my understanding is that the actively labeled dataset doesn't reflect the true training data distribution. What will this cause? Is there any problem with this?

sabinahartnett commented 2 years ago

Question for Chapter 7: Many of us are likely to use convenience sampling in our studies, simply because (as described in the chapter) it’s all we have access to. As mentioned in the chapter, sub-, over-, and under-sampling can benefit the performance of the model. Are there any rules of thumb in this trade-off (i.e., whether balancing a dataset of a certain size is worth losing a certain percentage of that data, or a performance metric we should look to when comparing the convenience and augmented datasets)?
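To make the data-loss side of that trade-off visible, here is a minimal sketch of random undersampling on a toy imbalanced dataset (the class sizes and features are made up for illustration). Balancing to the minority class here discards 90% of the rows, which is exactly the cost the question asks about:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy imbalanced dataset: 950 examples of class 0, 50 of class 1.
y = np.array([0] * 950 + [1] * 50)
X = rng.normal(size=(1000, 4))

# Random undersampling: shrink the majority class to the minority's size.
idx0 = np.flatnonzero(y == 0)
idx1 = np.flatnonzero(y == 1)
keep0 = rng.choice(idx0, size=idx1.size, replace=False)
idx_bal = np.concatenate([keep0, idx1])

X_bal, y_bal = X[idx_bal], y[idx_bal]  # balanced, but only 100 rows remain
```

Comparing a model trained on `(X, y)` with class weights against one trained on `(X_bal, y_bal)` is one practical way to judge whether the balancing was worth the discarded data.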

javad-e commented 2 years ago

Question about Chapter 7: the text mentions transforming data as a way of data augmentation. The provided examples include “adding rotated, reflected, and rescaled versions of sample images”. Could you explain why this is helpful? We are not changing the relationships between the pixels by rotating, reflecting, or scaling images, so how is the model learning additional information from these transforms?
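One way to see the answer is that, to a model consuming raw pixels, a flipped or rotated image is a genuinely different input vector even though its content is unchanged. A toy sketch (random image, illustrative transforms only):

```python
import numpy as np

rng = np.random.default_rng(7)
img = rng.random((8, 8))  # toy grayscale image

# Simple label-preserving transforms: flips and a 90-degree rotation.
augmented = [img, np.fliplr(img), np.flipud(img), np.rot90(img)]

# Flattened, each variant is a distinct point in input space; training on
# all of them pushes the model to become invariant to these transforms.
flat = np.stack([a.reshape(-1) for a in augmented])
```

So augmentation does not add new information about the scene; it injects the prior knowledge that the label is invariant under these transforms, which the network would otherwise have to learn from data.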

JadeBenson commented 2 years ago

Now that we've finished this section on training and tuning deep learning models - I was wondering if you could walk through / explain how we can diagnose the underlying problems behind poor model performance when considering all of these pieces together? Like if we have a model that is performing poorly - how would we know that we need an entirely new algorithm versus just needing more extensive hyperparameter tuning vs. a sampling strategy vs. adding different types of data? If our model is only predicting one class that may be a good sign to change the sampling strategy, but what about for the other adjustments? Thanks!

ValAlvernUChic commented 2 years ago

I'm curious about importance sampling as described in Chapter 8. The section essentially implies that the rewards of such a method might not be worth the amount of time it'd take to establish the sampling distribution, but I was wondering what applications might be worth this, or under what conditions we could consider using it. The idea of prioritizing adversarial data for training seems like an intuitive step toward continued model improvement, and I imagine some trade-offs might be worth the time spent.
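The prioritization idea can be sketched very simply: draw training examples with probability proportional to their current loss, so hard cases are revisited more often. The loss values below are simulated stand-ins, and a real implementation would also reweight gradients to correct the bias this sampling introduces:

```python
import numpy as np

rng = np.random.default_rng(3)
losses = rng.exponential(scale=1.0, size=1000)  # toy per-example losses

# Importance sampling: selection probability proportional to loss,
# so high-loss ("hard" or adversarial) examples are drawn more often.
probs = losses / losses.sum()
batch = rng.choice(losses.size, size=64, replace=False, p=probs)
```

The setup cost the chapter warns about is keeping `losses` (and hence `probs`) fresh as the model trains, which is exactly the bookkeeping that may or may not pay off.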

egemenpamukcu commented 2 years ago

In general, most of the challenges I have faced while training models had something to do with an imbalance in the training data. I have read that oversampling, especially in the case of tabular data, often does not work as expected and does not improve model performance compared to alternative approaches like adjusting the loss function to penalize mistakes on the underrepresented class more heavily than on the overrepresented class--and that has been my experience too. In contrast, it usually helps in some computer vision tasks, as distorted versions of existing images can often resemble what our model can expect to see after training. I was wondering if there are rules of thumb on how to handle imbalance based on data modality and the task at hand.
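The loss-adjustment alternative mentioned above can be sketched with inverse-frequency class weights on a binary cross-entropy. The labels, predicted probabilities, and the particular weighting scheme are all illustrative assumptions:

```python
import numpy as np

# Toy binary labels (class 1 underrepresented) and predicted P(y=1).
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
p_pred = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.1, 0.4, 0.3])

# Inverse-frequency weights: mistakes on the rare class cost more.
n = y_true.size
w = np.where(y_true == 1,
             n / (2 * (y_true == 1).sum()),   # rare class, weight 2.5
             n / (2 * (y_true == 0).sum()))   # common class, weight 0.625

# Per-example binary cross-entropy, then the weighted vs. plain average.
ce = -(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))
weighted_loss = (w * ce).mean()
plain_loss = ce.mean()
```

Unlike oversampling, this changes no rows of the data; it only rescales each example's contribution to the gradient, which is one reason it often behaves better on tabular data.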

ShiyangLai commented 2 years ago

Questions for Chapter 8. This chapter introduces various query strategies for active learning, which is very informative. My question is: in practice, how should we select the right strategy among countless options on a case-by-case basis? Are there any uniformly agreed principles for query strategy selection? What are the pros and cons of each method?
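For reference, three of the most common uncertainty-based query strategies differ only in how they score the model's predicted class probabilities. A minimal side-by-side sketch on simulated probabilities (the pool, class count, and batch size of 10 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
# Toy predicted class probabilities for an unlabeled pool of 100 items.
probs = rng.dirichlet(np.ones(3), size=100)

# Least confident: query items whose top predicted probability is lowest.
least_confident = np.argsort(probs.max(axis=1))[:10]

# Margin: query items where the top two classes are nearly tied.
sorted_p = np.sort(probs, axis=1)
margin = np.argsort(sorted_p[:, -1] - sorted_p[:, -2])[:10]

# Entropy: query items whose full predictive distribution is most uncertain.
entropy = -(probs * np.log(probs)).sum(axis=1)
max_entropy = np.argsort(-entropy)[:10]
```

The strategies agree in the binary case but diverge with more classes: margin focuses on the top-two confusion, entropy on the whole distribution, which is one concrete axis along which the case-by-case choice gets made.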

zihe-yan commented 2 years ago

Question for Chapter 7. The concept of active learning, which involves human experts or the use of the original data as labels, seems to differ from neural networks such as the Autoencoder. For example, if we want to code text data, when should we adopt active learning, and when should we use autoencoder?

BaotongZh commented 2 years ago

Chapter 7 gives us a comprehensive introduction to data sampling strategies. I have some questions regarding "Data Augmentation" and "Combined Data with Mixup": I was wondering how these two strategies would affect the bias-variance trade-off of our model. Also, I learned that the Spatial Transformer Layer can adjust inputs into forms our model can successfully work with (like automatically rotating an image), so how could we combine data sampling strategies with a spatial transformer layer?
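For readers less familiar with mixup, the mechanic is just a convex combination of two examples and of their labels. A toy sketch (the inputs, one-hot labels, and the Beta(0.4, 0.4) mixing distribution are illustrative; the Beta prior follows the original mixup formulation):

```python
import numpy as np

rng = np.random.default_rng(11)
x1, x2 = rng.random((32, 32)), rng.random((32, 32))  # two toy inputs
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])  # one-hot labels

# Mixup: blend both the inputs AND the labels with the same weight.
lam = rng.beta(0.4, 0.4)
x_mix = lam * x1 + (1 - lam) * x2
y_mix = lam * y1 + (1 - lam) * y2  # a soft label, e.g. [0.7, 0.3]
```

Because the soft labels discourage overconfident predictions, mixup acts as a regularizer, which is the lever through which it shifts the bias-variance trade-off asked about above.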

borlasekn commented 2 years ago

Similar to many of the questions above, I was wondering about the ideas of oversampling/bootstrapping. I studied mathematical statistics in my undergrad career, and I (vaguely) remember entire sections of classes on bootstrapping/oversampling, but these discussions always came riddled with warnings (as classmates have mentioned above). Is there a threshold at which it becomes imperative to over- or undersample, or, as I learned in my statistical theory classes, are these decisions really just subjective to the researcher? If they are, what is the best advice you (or anyone) could give about these methods?

mdvadillo commented 2 years ago

My question relates to Chapter 7. Although I understand the need for data augmentation, and I can see how in predictive tasks perturbing the data will make the algorithms more accurate, I am having trouble understanding how it would not make inferences less accurate.

isaduan commented 2 years ago

The chapter opens with a note of "ethical responsibility" regarding sampling so I wonder what are some ethical considerations that we should bear in mind when choosing sampling strategies, aside from optimizing our model for understanding and making inferences about the world. Are there "best practices" in the field? Or do people disagree about how to go about it?

yhchou0904 commented 2 years ago

Chapter 7 introduces lots of sampling strategies and the situations that suit each of them. I noticed that the difference between random sampling and Sobol sampling comes down to whether random sampling risks missing some part of the data; it therefore seems that we cannot be certain which method to use if we don't know anything about the population, right? Apart from that, I am a bit confused about the part on stratified random sampling. It is said that the method divides data into subgroups, and I am wondering whether we could understand this method through the intuition of fixed effects? Also, what is the difference between stratified random sampling and systematic sampling? Their descriptions read quite similarly to me.
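To ground the stratified part of the question, here is a minimal sketch of stratified random sampling: draw a fixed fraction independently within each subgroup, so the sample preserves the population's group proportions. The 70/20/10 split and 10% rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy population with a 70/20/10 split across three strata.
strata = rng.choice([0, 1, 2], size=1000, p=[0.7, 0.2, 0.1])

# Stratified random sample: sample 10% within each stratum separately,
# guaranteeing no stratum is missed (unlike plain random sampling).
sample_idx = np.concatenate([
    rng.choice(np.flatnonzero(strata == s),
               size=int(0.1 * (strata == s).sum()), replace=False)
    for s in np.unique(strata)
])
```

Systematic sampling, by contrast, takes every k-th item from an ordered list rather than sampling within groups, which is why the two read similarly (both aim at even coverage) while working quite differently.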

y8script commented 2 years ago

Regarding the data augmentation methods in Chapter 7: the reason data augmentation can improve prediction may be that there are commonalities in the non-noisy training data which can be captured by the neural networks. Although I agree that such commonalities among input data exist, I still wonder why some of these commonalities are informative for classification or other tasks (why do they differ between categories and get picked up by the neural networks, rather than being averaged out)?

min-tae1 commented 2 years ago

One of the reasons for sampling, according to Chapter 7, is to label rare items. I was curious whether the sampling methods mentioned in the subsequent chapter could provide a way of analyzing such rarities, or whether we need human intervention in this case.

Emily-fyeh commented 2 years ago

I am also curious about the sampling strategies of active learning in Chap 8. I wonder if there are rationales for choosing sampling methods derived from the intrinsic properties of the dataset?

hsinkengling commented 2 years ago

One of the rationales for strategic sampling is when we have limited data. I wonder, after implementing these methods, how much information do we still lose when compared to when we have sufficient data? What are the limits to using sampling methods and active learning when compared with large data?

Yaweili19 commented 2 years ago

This chapter is by far my favorite. I had never thought of seeing big data as a sample, nor had I considered it too small. I tend to limit my analysis or research to the group I have data on, but this chapter changed my way of thinking about external validity when using big data. However, how to properly estimate the bias when generalizing remains a challenge in my opinion.

Hongkai040 commented 2 years ago

I have a specific question about oversampling. How can oversampling "amplifies the influence of idiosyncratic cases" by simply duplicating minority examples, given those duplicated samples have identical features? And will this method amplify noise if the minority classes contain noise?
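The amplification the question asks about can be made concrete: random oversampling copies minority rows verbatim, so each idiosyncratic (or noisy) minority example gets repeated many times in the balanced training set. A toy sketch with made-up class sizes:

```python
import numpy as np

rng = np.random.default_rng(9)
X_min = rng.normal(size=(5, 3))   # 5 minority examples
X_maj = rng.normal(size=(95, 3))  # 95 majority examples

# Random oversampling: duplicate minority rows until the classes balance.
dup_idx = rng.choice(5, size=95, replace=True)
X_min_over = X_min[dup_idx]       # 95 rows drawn from only 5 distinct ones

# Each of the 5 originals now appears ~19 times on average, so any noise
# in those rows is copied verbatim into the training set at that weight.
counts = np.bincount(dup_idx, minlength=5)
```

This is why a mislabeled or atypical minority example, harmless as one row among thousands, can dominate the minority class after oversampling: the duplicates add weight without adding information.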

linhui1020 commented 2 years ago

I am interested in stratified sampling. If we stratify on more than one attribute, intending to be as rigorous as controlling for every dimension, would our model's predictive performance decrease because our stratification ignores some relationships that could only be observed under a certain distribution of attributes?