chrisrodley opened this issue 10 years ago (status: Open)
I've written gutengrep to grep full sentences (not lines) from a corpus of plain text files using regexes. You might find it handy.
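For anyone curious what sentence-level (rather than line-level) grepping looks like, here is a rough sketch in Python. It is not how gutengrep itself works; the function names `split_sentences` and `grep_sentences` are made up for illustration, and the sentence splitter is deliberately naive (it will break on abbreviations like "Mr.").

```python
import re

# Naive sentence boundary: ., ! or ? followed by whitespace.
# Assumption: good enough for a sketch; abbreviations such as "Mr." split wrongly.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Rejoin hard-wrapped lines, then split the text into rough sentences."""
    flat = " ".join(text.split())  # collapse newlines and runs of spaces
    return [s for s in SENTENCE_END.split(flat) if s]


def grep_sentences(text, pattern):
    """Return every sentence in `text` that the regex `pattern` matches."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [s for s in split_sentences(text) if regex.search(s)]


if __name__ == "__main__":
    sample = "He came in.\nMoscowitz kissed\nhim. They left!"
    for sentence in grep_sentences(sample, r"kissed\s+him"):
        print(sentence)
```

Because the text is flattened before splitting, a match that straddles a line break in the source file still comes back as one whole sentence, which is the thing plain grep can't do.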
Thanks so much! That looks super helpful.
I love this project! And it sounds really impressive to me (also a coding beginner).
It sounds really impressive to me, and by really impressive I also mean really difficult. So no matter how it turns out, kudos for even making the attempt.
To expand on what I mean: this year, this and this and this, and probably others that I've forgotten, have brought to mind for me William S. Burroughs' stance that in order to do creative writing, you first need to learn creative reading.
And here we all are using these machines that can barely^2 handle uncreative reading.
The hard part's not printing out the sentences, it's writing the function sounds_gay(s: string): bool!
That might have to wait until we have gay robots @cpressey! I am just doing a first cut with some very simple search terms based on suggestive vocabulary.
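A first cut along those lines could be as simple as the sketch below. The vocabulary list is a hypothetical placeholder, not the actual search terms, and plain phrase matching is obviously nowhere near the creative reading the real `sounds_gay` would need.

```python
import re

# Hypothetical placeholder vocabulary, NOT the project's real search terms.
SUGGESTIVE_TERMS = ["kissed him", "embraced him"]

# One case-insensitive alternation; \b keeps "him" from matching inside "himself".
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in SUGGESTIVE_TERMS) + r")\b",
    re.IGNORECASE,
)


def sounds_gay(s: str) -> bool:
    """Very naive first cut: True if the sentence contains any listed phrase."""
    return _PATTERN.search(s) is not None


if __name__ == "__main__":
    for sentence in ["Moscowitz kissed him.", "It rained all day."]:
        print(sentence, sounds_gay(sentence))
```

This only checks vocabulary, so it has no idea who is doing the kissing; sorting the hits from the false positives is still a job for a human reader.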
Any updates?
Almost finished! :)
Late but better than never! My project has the working title "Every Gay Sentence In English Literature" (Another title I'm playing with is: "Moscowitz Kissed Him", one of the sentences from the work.)
The idea is to search and tabulate every conceivably homoerotic line from classic English literature. My inspiration is the way that, before the rise of gay lit in the 1970s, queer people -- hungry for the barest glimpse of self-representation -- would keep a constant eye out in the books they read for lines that could, remotely, in any way, be regarded as homoerotic.
Currently I am looking at a corpus of Project Gutenberg, potentially supplemented by Google Books.
But I'm an absolute coding beginner! So to get this done in time, I'm going to be sourcing the text via manual site-constrained searches, dumping it into an Excel spreadsheet, and performing some simple operations. Once I have a 50k proof-of-concept, I plan to use Python to create a much more comprehensive version which will probably be several times as long, because it will search for a much wider range of hits.