Open DamonCharlesRoberts opened 1 year ago
For the transcripts where we need to split them, what information exactly is needed? For example, the transcript “bryant_09-26-2006” includes the hearing for Vanessa Bryant and Michael Wallace. But the way the hearing is organized, Bryant and Wallace are presented, then give statements, then there’s witness statements, then the written question and answers. So for this one, do I just need to pull the section for Bryant from “statements of the nominees” or is other information needed?
Hmmm, good question. Some of these are a lot more messy than I had realized in the past. So I think this is forcing us to make a decision here.
In terms of interruptions, we ONLY want the text from the transcripts of the hearing -- what was said in real-time during the hearing.
I do think there is something interesting to be gleaned from the written remarks and the questions the nominees were asked, with regard to the second question in our project, which is, "Is the topic of the questions and answers between nominees based on their gender and racial/ethnic identity different?" I think we could get some useful insights there by considering the full text, which, I am realizing, is what we had used in the past for a lot of these.
So, my initial thought is to just keep everything. Our model for interruptions doesn't standardize; it just takes raw counts of interruptions, which can't occur in the written statements, so keeping them won't influence our results there. And if we keep the material that comes with the transcripts, then we can say that the confirmation process, not just the hearings, differs between male and female nominees and between non-POC and POC nominees. So our discussion of the topic models and so on would need to be broadened to the whole confirmation process rather than just the hearing, but yeah.
What say you @madelinemader and @tylerpgarrett?
But is the relevant portion only the spoken statements of the nominees? In the above example, there are two nominees, and both of their spoken statements come before the written statements and before the spoken witness statements, but after the spoken statements from the senators. So for the purposes of splitting the transcript between the two nominees in this session, should I just pull the spoken-statements-of-the-nominees portion?
Yeah, so the most relevant part is the spoken statements. If we can't split things up cleanly between nominees, we should focus on splitting what we can in the spoken statements, as best we can.
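For what it's worth, here is a minimal sketch of how pulling one nominee's spoken statement out of a shared transcript might look. Everything in it is an assumption rather than something we have settled on: it supposes each statement starts with an all-caps `STATEMENT OF ...` heading line, which would need to be checked against the actual files. The function names `split_statements` and `statement_for` and the sample text with placeholder names are made up for illustration.

```python
import re

# Assumed heading format: an all-caps line such as "STATEMENT OF JANE DOE, ...".
# This is a guess about the transcripts, not a verified convention.
HEADING = re.compile(r"^STATEMENT OF (?P<who>.+)$", re.MULTILINE)


def split_statements(transcript):
    """Map each 'STATEMENT OF ...' heading to the text that follows it."""
    matches = list(HEADING.finditer(transcript))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs until the next heading, or to the end of the file.
        if i + 1 < len(matches):
            end = matches[i + 1].start()
        else:
            end = len(transcript)
        sections[match.group("who").strip()] = transcript[match.end():end].strip()
    return sections


def statement_for(transcript, surname):
    """Return the first section whose heading mentions the surname, else None."""
    for heading, body in split_statements(transcript).items():
        if surname.upper() in heading.upper():
            return body
    return None


# Invented placeholder text, not taken from any real transcript.
sample = """\
STATEMENT OF JANE DOE, NOMINEE
Ms. Doe. Thank you, Senator.
STATEMENT OF JOHN ROE, NOMINEE
Mr. Roe. Thank you for having me.
"""

doe_statement = statement_for(sample, "Doe")
```

One caveat with this approach: the last section runs to the end of the file, so any witness statements or written questions that follow the final nominee's statement without a matching heading would be swept into it and would need to be trimmed by hand or with an extra end marker.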