

Portfolio version 1 #13

Open Page007 opened 1 month ago

Page007 commented 1 month ago

Hi Scott! Thanks for this portfolio exercise. I've tried to put together a list of things that I think I have been able to think about competently and precisely.

  1. Merging the collections, trades and public records tables and building a stable TU-SOI-FF variation data build. The TU dataset is one of the most important datasets for studying borrowers' interactions with financial markets. However, each individual table in the raw dataset is very unclean (for example, a ton of different edge cases in the trades table), and borrowers showing up in one table do not always show up in all, or even a subset, of the other tables. I'll use just these two issues to showcase the skills I've had to employ to overcome a plethora of challenges.

     Firstly, solving these issues is a programmatic challenge: the HDFS cannot hold more than ~200 GB of clean data at any given point in time, so storing the compressed trades file for 400k borrowers in HDFS in one piece is out of the question. As I explained in a two-hour meeting on the automated data-cleaning and monitoring system, I had to develop a fairly complex piece of software to deal with these data issues. I continue to use this system today, and it has run without bugs.

     Secondly, as you know, there are a considerable number of edge cases in all tables. I caught a lot of these -- for example, making sure that borrowers with "stale" repayment history are not systematically marked as "transitioned to collections". I remember you said that a model explaining this anomaly was published in AEJ:Macro! Another example is detecting "vanishing and reappearing collections", such as telecom or utility collections. Although these look like small catches or trivial minutiae, my objective at the time was to build a stable, correct database in which the studied outcome wasn't accidentally selected on an observable.

     Thirdly, monitoring the borrowers/trades that don't match across trades, collections and public records was an important task. I made sure that borrowers who failed to match across datasets were not systematically defaulting on a certain type of trade, nor concentrated in a certain region, nor sued for a particular type of public record (a stylized version of this check is sketched below). This took a considerable amount of energy and time, but I think it was worth the effort. Many PhD applicants may have managed datasets this large; I think my particular skills have been precise data-build construction and the ability to discern whether a pattern in the data is a realization of a particular DGP or just an anomaly. A lot of follow-up ideas have come from this.
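To make the match-monitoring point concrete, here is a minimal sketch of the kind of balance check I mean. The column names (`borrower_id`, `region`, `trade_type`) and file names are placeholders I'm inventing for illustration; this is not the actual pipeline code.

```python
# Minimal sketch (illustrative only): compare borrowers that do vs. do not match
# across two tables on a few observables, to flag systematic selection in the merge.
import pandas as pd

def match_balance(trades, collections, on="borrower_id", by=("region", "trade_type")):
    """Tabulate the composition of matched vs. unmatched borrowers on `by` columns."""
    matched_ids = set(trades[on]) & set(collections[on])
    trades = trades.assign(matched=trades[on].isin(matched_ids))
    report = {}
    for col in by:
        # column-normalized shares: the matched and unmatched columns each sum to 1
        report[col] = pd.crosstab(trades[col], trades["matched"], normalize="columns")
    return report

# Usage on hypothetical extracts of the raw tables:
# trades = pd.read_parquet("trades_sample.parquet")
# collections = pd.read_parquet("collections_sample.parquet")
# for col, tab in match_balance(trades, collections).items():
#     print(col, tab, sep="\n")
```

If the matched and unmatched shares look very different for some observable, that is the red flag that the merge is selecting on it.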
Page007 commented 1 month ago
  1. New DiD Lit and FF Variation: I think dealing with the FF variation -- the repeated, staggered FF changes -- has been a very interesting part of my job. Setting up the DiD framework, interpreting coefficients, understanding exactly why TWFE with staggered treatment is biased, how SA/Borusyak et al./de Chaisemartin et al. help get rid of negative weighting, how de Chaisemartin et al. compute SEs using the bootstrap, and Dube et al.'s estimator and the assumptions underlying it are just some of the things I've picked up over the past year. I'm given to understand that only a lucky few pre-docs get the chance to spend time on these, and I'm really thankful for that. I think my "best" in this regard was suggesting that we use SA to compute population-weighted treatment effects (a rough sketch of what I mean is below). I can expand on the theoretical properties of SA and why it is a better estimation method for our setup; that argument came from knowing the other estimators intimately. In addition, I remain adamant (by intuition) that double trimming still leads to consistent estimates of the population share-weighted treatment effects, and I'm working on a proof of that. Another thing that has been sitting on the bench is a method I worked out (independently of de Chaisemartin et al. 2022 and Dube et al.) for continuous repeated treatment. I remain proudest of it, and if it helps with letter writing, I can share it with you -- promising much better writing, precision and usability. I'm really glad that I won't need to re-invest time in my third year learning all of these things from scratch or taking an applied econometrics course.
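For reference, this is my shorthand for the SA suggestion; the notation below is mine, and it is only a sketch of the aggregation I have in mind. Estimate cohort-by-relative-period effects from a fully interacted event-study regression,

```math
Y_{it} = \alpha_i + \lambda_t + \sum_{e} \sum_{l \neq -1} \delta_{e,l}\,\mathbf{1}\{E_i = e\}\,\mathbf{1}\{t - E_i = l\} + \varepsilon_{it},
```

and then aggregate them at each horizon $l$ using cohort shares as weights,

```math
\hat{\nu}_l = \sum_{e} \widehat{\Pr}\left(E_i = e \mid E_i \in C_l\right)\,\hat{\delta}_{e,l},
```

where $E_i$ is unit $i$'s treatment cohort and $C_l$ is the set of cohorts observed at relative period $l$. Weighting by cohort shares, rather than by whatever weights TWFE implicitly assigns, is what I mean by population-weighted treatment effects.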
Page007 commented 1 month ago
  1. Suggesting a variety of cuts in the data: when we had just started working with the combined FF variation and TU data, we talked about a cohort-specific 'heatmap' idea -- I was already implementing a similar version to study the difference-in-differences in suit probability over sample calendar time for each cohort. Further, in our weekly meetings, I like to think that I suggest a variety of cuts in the data that might be worth studying, or that surface patterns for which writing dynamic models is simple. I would like to be better at this, and I learn more about it in every meeting. I dare say that a good number of our meetings act as a validation mechanism for the steps we should take next.

  2. Construction of event study plots -- I think you explained ES Plots really nicely to me and I was very happy to make these plots for the first time ever.

  3. I really liked helping with Bartik-Nelson, specifically because it gave me a chance to do algebraic computation!

  4. I think "my best" should include what I really like thinking about. I still really like proving things and being careful about consistency of estimators (like SA), using the delta method to compute SEs for SA estimator, thinking about incidental parameter problems, etc. I really like bootstrapping because it pushes the econometrician to use independence between entities and over the past year, there have been times when I've suggested it! It might have been completely useless, but I still think I was happy to be on Dan's side in the debate between you and him!

Page007 commented 1 month ago
  1. Modelling and MLE: I couldn't do a lot of modelling in my first year, although we have good leads, like simulated MLE, to make advances on that front. I would like to be better at this, especially because I've invested a lot of time learning MLE and its properties (a toy example of the kind of routine I mean is sketched below). I hope to contribute something to this project from that part of my skill set.
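To be concrete about the MLE machinery I mean, here is a toy example, entirely on simulated data and unrelated to our project code: maximize a logit log-likelihood numerically and read standard errors off the approximate inverse Hessian.

```python
# Toy MLE illustration: simulate logit data, maximize the log-likelihood with BFGS,
# and compute asymptotic standard errors from the approximate inverse Hessian.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 5_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
beta_true = np.array([-1.0, 0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

def neg_loglik(beta):
    """Negative log-likelihood of the logit model."""
    xb = X @ beta
    return -np.sum(y * xb - np.log1p(np.exp(xb)))

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
se = np.sqrt(np.diag(res.hess_inv))   # approximate asymptotic SEs
print("estimates:", res.x, "SEs:", se)
```

Simulated MLE would replace the closed-form likelihood with one approximated by simulation, but the optimization layer looks the same.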
Page007 commented 1 month ago
  1. Suggestions about a consumer default model without bankruptcy: I still think that estimating the probability of transition into the collections state, and from the collections state into the sued state, across heterogeneous borrowers (e.g., hand-to-mouth, middle-class and rich borrowers) is an idea worth executing (a rough first-pass sketch is below this list). A model that predicts transitions into and out of these states could be an important gateway to understanding heterogeneous borrower behavior. A lot of the HANK literature would be interested in micro-estimates, like the MPB, from such a model.

  2. A model of collection seasoning: since I'm recollecting everything that's worth mentioning, I'll document this too. As we've talked about in the past, a model of credit card debt seasoning would be very interesting to look at. I think it is interesting to model a debt collector's decision to buy/sell unsecured debt. In an ideal world, I would like to write a portfolio choice model of a debt collector and study policy levers, like FF changes, that could be used to regulate them.
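On item 1, here is a rough first-pass sketch of the transition-probability estimation I have in mind: tabulate empirical transition matrices between repayment states separately by borrower type. The state labels, column names and file name are placeholders for illustration, not our schema.

```python
# Sketch: empirical P(state_{t+1} | state_t) by borrower type from a long panel.
import pandas as pd

STATES = ["current", "collections", "sued"]

def transition_matrices(panel, type_col="borrower_type", id_col="borrower_id",
                        time_col="quarter", state_col="state"):
    """Return one row-stochastic transition matrix per borrower type."""
    panel = panel.sort_values([id_col, time_col]).copy()
    panel["next_state"] = panel.groupby(id_col)[state_col].shift(-1)
    panel = panel.dropna(subset=["next_state"])          # drop each borrower's last period
    out = {}
    for btype, g in panel.groupby(type_col):
        counts = pd.crosstab(g[state_col], g["next_state"])
        counts = counts.reindex(index=STATES, columns=STATES, fill_value=0)
        out[btype] = counts.div(counts.sum(axis=1), axis=0)   # row-normalize to probabilities
    return out

# Usage on a hypothetical borrower-by-quarter panel:
# panel = pd.read_parquet("borrower_panel.parquet")
# mats = transition_matrices(panel)
# mats["hand_to_mouth"]   # 3x3 matrix of transition probabilities
```

A multinomial logit (or the simulated MLE route above) would then let the transition probabilities depend on borrower covariates rather than just a coarse type.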

That said, I've learnt a great deal about applied practice. I think I can also get much better, because most of my skill accrual over the years has been in theory and abstract thinking. But over time, I've come to appreciate research practice à la Nelson (2023), Kaplan, Moll and Violante (2018), and a lot of Erik Hurst's work. It's great to have reduced-form evidence and then write an ingenious, parsimonious model that captures a lot of the variation in the data. It is worth knowing the economic reasons for the model's inability to capture the "loss".

Thanks for being generous with the analysis course, for letting me read macro models and metrics theory, and especially for introducing me to Livshits, MacGee and Tertilt's model!