dorothealint / Rick-and-Morty

Digital Humanities project marking up and analyzing data from Rick and Morty
The Unlicense

Season 2 Scripts (shortcut?) #9

Open dorothealint opened 6 years ago

dorothealint commented 6 years ago

Since you have a huge start on season 1 already, I committed one big XML file of all the season 2 transcripts. This is just a working file out of the dank recesses of dorothea's working mind for you to take a look at. To save time and energy, I copied all the available season 2 transcripts into one file and started running regex on them. Once all the regex is done, I'm going to split the file into separate episodes. That way every find-and-replace only has to be typed once.

I'm going to go back in and fix the bad ones before I do any more markup, but I wanted to show you how I'm thinking of doing things and see what you think. I pulled all of season 3 into another file as well, but I haven't converted that one to XML yet.

helvitiis commented 6 years ago

Hey! We can actually use XSLT to run our regex over a collection of files, but this way works too. It's up to you which you prefer... Don't forget that with one big file it will be trickier and take longer to associate the schema and go through every single line :-)
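
For comparison, here is a rough sketch of the "one set of regex, many files" idea. It uses Python rather than XSLT, purely for illustration. The element names (`line`, `stage`), the `speaker` attribute, and the assumption that transcripts are `.txt` files with `Name: dialogue` lines and `[bracketed]` stage directions are all hypothetical, not taken from the repo.

```python
import re
from pathlib import Path

# Hypothetical rules: each pair is (pattern, replacement), applied in order.
RULES = [
    # Escape bare ampersands first so the output stays well-formed XML.
    (re.compile(r"&(?!amp;|lt;|gt;|quot;|apos;)"), "&amp;"),
    # "Rick: some dialogue" -> <line speaker="Rick">some dialogue</line>
    (re.compile(r"^([A-Z][\w .']*?):[ \t]*(.+)$", re.MULTILINE),
     r'<line speaker="\1">\2</line>'),
    # "[Morty sighs]" -> <stage>Morty sighs</stage>
    (re.compile(r"\[([^\]]+)\]"), r"<stage>\1</stage>"),
]


def apply_rules(text, rules=RULES):
    """Run every find-and-replace over one transcript, in order."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


def process_folder(src_dir, dst_dir, rules=RULES):
    """Apply the same rules to every .txt file in src_dir, writing .xml files to dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob("*.txt")):
        out = dst / (path.stem + ".xml")
        out.write_text(apply_rules(path.read_text(encoding="utf-8"), rules),
                       encoding="utf-8")
        written.append(out)
    return written
```

Each rule is still typed only once, but the episodes stay in separate files, so each one can be validated against the schema on its own.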

One more thing--I noticed you didn't enclose every episode in its own <script> tag. It may be a bit trickier to find regex that fits every single episode's transcript (as I see many of them transcribe the text in inconsistent ways). Either way, you're thinking along the right path!!
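
If the combined file keeps some marker between episodes, wrapping each one in its own <script> element can be scripted too. This is a minimal sketch, assuming a hypothetical heading line such as `=== Episode title ===` before each episode. The real transcripts may mark episode boundaries differently, and the `title` attribute is made up for the example.

```python
import re
from xml.sax.saxutils import quoteattr

# Hypothetical heading format: a line such as "=== Episode title ===".
HEADING = re.compile(r"^===[ \t]*(.+?)[ \t]*===[ \t]*$", re.MULTILINE)


def split_episodes(combined):
    """Return (title, body) pairs, one per heading found in the combined text."""
    parts = HEADING.split(combined)
    # parts[0] is whatever precedes the first heading; after that the list
    # alternates title, body, title, body, ...
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]


def wrap_episode(title, body):
    """Wrap one episode in <script>; the body is inserted as-is, markup included."""
    return f"<script title={quoteattr(title)}>\n{body}\n</script>"


def wrap_all(combined):
    """Wrap every episode of a combined transcript, one string per episode."""
    return [wrap_episode(title, body) for title, body in split_episodes(combined)]
```

Because the body is not escaped, this works whether the split happens before or after the regex pass, but each result should still be checked for well-formedness.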