UChicago-Computational-Content-Analysis / Readings-Responses-2023


6. Prediction & Causal Inference - [E2] 2. Pham, Thai T. and Yuanyuan Shen. 2017. #25

Open JunsolKim opened 2 years ago

JunsolKim commented 2 years ago

Post questions here for this week's exemplary readings: 2. Pham, Thai T. and Yuanyuan Shen. 2017. “A Deep Causal Inference Approach to Measuring the Effects of Forming Group Loans in Online Non-profit Microfinance Platform”. arXiv.org preprint: 1706.02795.

chuqingzhao commented 2 years ago

It is an inspiring paper that uses a bunch of advanced deep learning techniques to deal with unstructured data in a philanthropic online market. The paper explains the rationale for selecting models in detail; however, it is unclear to me how they link the description data to their conclusion that "forming group loans has a significant treatment effect on funding time." I also wonder whether there are any linguistic patterns or social games in the description data that could be analyzed further (such as the borrower's gender or age, or a detailed versus abstract plan for the loan).
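To make the second question concrete, here is a minimal sketch of the kind of feature extraction I have in mind (not the authors' pipeline; the column name, keyword lists, and heuristics are my own assumptions):

```python
import re
import pandas as pd

# Toy descriptions; the column name and keyword lists are assumptions,
# not fields from the Kiva data used in the paper.
loans = pd.DataFrame({"description": [
    "She will buy two cows and sell milk at the local market every morning.",
    "He needs funds to expand his shop.",
]})

# Rough proxy for a "detailed" vs. "abstract" plan: count concrete quantities
# and named activities in the description.
def plan_detail_score(text):
    text = text.lower()
    quantities = len(re.findall(r"\b\d+\b|\btwo\b|\bthree\b", text))
    activities = len(re.findall(r"\b(?:buy|sell|open|expand|hire|plant)\b", text))
    return quantities + activities

# Crude gendered-pronoun indicator (a heuristic, not ground truth about the borrower).
def mentions_female(text):
    return bool(re.search(r"\b(?:she|her)\b", text.lower()))

loans["plan_detail"] = loans["description"].apply(plan_detail_score)
loans["female_pronoun"] = loans["description"].apply(mentions_female)
print(loans[["plan_detail", "female_pronoun"]])
```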

mikepackard415 commented 2 years ago

In this paper the authors chose to use the Wikipedia pre-trained GloVe vectors, rather than to create the embeddings from the corpus. What are the pros and cons of this decision, and how should we decide when to use pre-trained vectors and when to train the vectors on the data in question?

chentian418 commented 2 years ago

In terms of the causal inference setting, the authors argue that unconfoundedness holds by stating: "If this endogeneity is large, then everyone would think forming group loans will speed up the funding process and everyone will tend to do so (following the practice in traditional microfinance); we know that this is not the case here. Hence, we can assume that unconfoundedness holds." I am still confused by this statement. For one thing, the lack of evidence that everyone forms group loans is not sufficient to show that the endogeneity is small. For another, it is very hard to control for every covariate x that may correlate with both the potential outcomes and the treatment. Would propensity score matching perhaps be a better idea?
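To illustrate the alternative I have in mind, here is a minimal propensity-score-matching sketch on simulated data (the variable names and data-generating process are my own assumptions, not the paper's setup):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy loan-level data: T = 1 if a group loan was formed, X = observed covariates,
# y = funding time in days.
rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "loan_amount": rng.normal(500, 150, n),
    "description_length": rng.normal(120, 30, n),
})
T = rng.binomial(1, 0.3, n)
y = 30 - 5 * T + 0.01 * X["loan_amount"].to_numpy() + rng.normal(0, 3, n)

# Step 1: estimate propensity scores e(x) = P(T = 1 | X).
ps = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]

# Step 2: 1-to-1 nearest-neighbor matching on the propensity score (with replacement).
treated = np.where(T == 1)[0]
control = np.where(T == 0)[0]
nearest = control[np.abs(ps[treated][:, None] - ps[control][None, :]).argmin(axis=1)]

# Step 3: average treatment effect on the treated (ATT) as the mean matched difference.
att = (y[treated] - y[nearest]).mean()
print(f"Estimated ATT on funding time: {att:.2f} days")
```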

Thanks!

YileC928 commented 2 years ago

While reading the paper, I wondered why the authors did not extract additional information from the descriptions (e.g., text length, sentiment, and the borrowers' personal background), which could have served as covariates. I would also appreciate clarification on the paper's feature selection and causal inference setup, which seem a bit arbitrary to me.
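For instance, covariates like text length and sentiment could be derived with a few lines of code (a minimal sketch; the column name and the choice of sentiment scorer are my own assumptions):

```python
import pandas as pd
from textblob import TextBlob  # off-the-shelf polarity scorer; just one possible choice

# Toy descriptions; the column name is an assumption, not the paper's schema.
loans = pd.DataFrame({"description": [
    "A hardworking mother of three who sells vegetables at the market.",
    "Needs a loan to repair a fishing boat before the rainy season.",
]})

# Simple text-derived covariates that could sit alongside the embedding features.
loans["text_length"] = loans["description"].str.split().str.len()
loans["sentiment"] = loans["description"].apply(lambda t: TextBlob(t).sentiment.polarity)
print(loans)
```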

Sirius2713 commented 2 years ago

> In this paper the authors chose to use the Wikipedia pre-trained GloVe vectors, rather than to create the embeddings from the corpus. What are the pros and cons of this decision, and how should we decide when to use pre-trained vectors and when to train the vectors on the data in question?

I think the pros of using a pre-trained model are:

  1. Quick. We don't have to train a model ourselves.
  2. More accurate. Pre-trained models are usually based on a very large corpus, so they can capture more accurate semantic information.

The cons of a pre-trained model are:

  1. The corpus that the pre-trained model was trained on may not match our target corpus.
  2. We may only be interested in the semantics of a specific domain or subcorpus rather than of language in general, which embeddings trained on our own data would capture better (see the sketch below).
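As a concrete illustration of the trade-off, here is a minimal sketch of the two options using gensim (the toy corpus and hyperparameters are placeholders, not the paper's data or settings):

```python
import gensim.downloader as api
from gensim.models import Word2Vec

# Option A: load pre-trained GloVe vectors (Wikipedia 2014 + Gigaword 5),
# similar to what the paper uses, via the gensim downloader.
pretrained = api.load("glove-wiki-gigaword-300")
print(pretrained.most_similar("loan", topn=3))

# Option B: train embeddings on the project's own (tokenized) descriptions,
# so that domain-specific usage such as "group" in "group loan" is reflected.
sentences = [
    ["she", "will", "repay", "the", "group", "loan"],
    ["he", "formed", "a", "group", "loan", "with", "neighbors"],
]
own = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=2).wv
print(own.most_similar("loan", topn=3))
```

In practice the choice often comes down to corpus size: with only tens of thousands of short loan descriptions, pre-trained vectors are usually the more stable option, whereas a large domain-specific corpus can justify training from scratch.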