Closed (luzer closed this issue 8 years ago)
Hey @luzer, glad you found it useful! Just curious: what are you using the script data for?
I checked an old sqlite db I had generated, and the problem you identified is there, so this appears to have always been a bug. Thanks for creating the issue! Did you find a fix for this? I can take a look, but probably not in the next week.
@colinpollock love this repo! i didn't find a fix yet. i went back in time to see if i could find a 'clean' HTML file for this episode, but could not; something in the HTML is malformed. https://web.archive.org/web/*/www.seinfeldscripts.com i migrated the data to Postgres and am doing sentiment analysis with it, combined with GOIM (https://github.com/BurntSushi/goim), and putting the results into Tableau.
see details here (http://www.tableaumeaway.com/seinfeld-sentiment-analysis-tableau-v10/)
i tried to tweet you, but must have gotten the wrong colin pollock...
i saw a similar project from @mattniedelman (https://github.com/mattniedelman/seinfeld/blob/master/scraper.ipynb) that uses a different script source, which might work: http://www.seinology.com/scripts/script-142.shtml vs. http://www.seinfeldscripts.com/TheChickenRoaster.htm (which does not even render)
thanks! any way i can just load the missing ep?
[love your work!]
still researching the cause... might be related to the fact that this episode was 'corrected'
in file 142.shtml, after parsing, only 6 lines of data appear:
select * from sentence s, utterance u, episode e where s.utterance_id = u.id and u.episode_id = e.id and season_number = 8 and episode_number = 8;
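in case it helps with tracking this down, here is a rough sketch that runs the same join across every episode and flags the ones with suspiciously few sentences, so other badly parsed scripts would show up too. it assumes only the columns used in the query above, and that season_number and episode_number live on the episode table (a guess on my part). the function names and the min_sentences threshold are made up for illustration, not part of the repo.

```python
import sqlite3


def sentence_counts(conn):
    """Return {(season, episode): sentence_count} for every episode.

    Assumes tables episode(id, season_number, episode_number),
    utterance(id, episode_id) and sentence(utterance_id), as implied
    by the query in this thread. LEFT JOINs keep episodes with zero rows.
    """
    rows = conn.execute(
        """
        SELECT e.season_number, e.episode_number, COUNT(s.utterance_id)
        FROM episode e
        LEFT JOIN utterance u ON u.episode_id = e.id
        LEFT JOIN sentence s ON s.utterance_id = u.id
        GROUP BY e.id, e.season_number, e.episode_number
        ORDER BY e.season_number, e.episode_number
        """
    ).fetchall()
    return {(season, episode): count for season, episode, count in rows}


def suspicious_episodes(conn, min_sentences=50):
    """List (season, episode) pairs with fewer than min_sentences sentences.

    The threshold is an arbitrary illustration, not a value from the repo.
    """
    counts = sentence_counts(conn)
    return [key for key, count in counts.items() if count < min_sentences]
```

usage would be something like suspicious_episodes(sqlite3.connect("path/to/your.db")), with the path pointing at whatever sqlite file the scraper generated. the same SQL should also run on Postgres through a different driver.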