thoughts on filtering - Githubissues

datLucius commented 8 years ago

Just wanted to point out that I filtered out the meta lyric strings like '[Verse 1]' and ' x2'. Also, when I built up my arrays from the Wu-Tang songs, I ran through and manually cut down on some words that are repeated a ton of times (I, me, the, etc). Probably just a matter of writing some JS that finds the most common words and splits about half of them off.
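A rough sketch of how the filtering described above could be automated. This is not the code that was actually run: `cleanLyrics`, `countWords`, and `trimCommonWords` are hypothetical names, the regexes only cover the '[Verse 1]' and ' x2' patterns mentioned here, and "splitting about half off" is interpreted as dropping every other occurrence of the most frequent words.

```javascript
// Sketch only: all function names here are hypothetical.

// Strip meta strings such as "[Verse 1]" and repeat markers such as " x2".
function cleanLyrics(raw) {
  return raw
    .replace(/\[[^\]]*\]/g, " ") // bracketed section labels
    .replace(/\s+x\d+\b/gi, " ") // repeat markers like " x2"
    .replace(/\s+/g, " ") // collapse leftover whitespace
    .trim();
}

// Count how often each word appears, ignoring case.
function countWords(words) {
  const counts = new Map();
  for (const w of words) {
    const key = w.toLowerCase();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// For the topN most frequent words, keep only every other occurrence,
// which drops about half of them. Assumption: topN is tuned by hand.
function trimCommonWords(words, topN = 10) {
  const counts = countWords(words);
  const common = new Set(
    [...counts.entries()]
      .sort((a, b) => b[1] - a[1])
      .slice(0, topN)
      .map(([word]) => word)
  );
  const seen = new Map();
  return words.filter((w) => {
    const key = w.toLowerCase();
    if (!common.has(key)) return true;
    const n = (seen.get(key) || 0) + 1;
    seen.set(key, n);
    return n % 2 === 1; // keep the 1st, 3rd, 5th... occurrence
  });
}
```

Usage would be something like `trimCommonWords(cleanLyrics(text).split(" "))`, with `topN` adjusted until the output stops being dominated by words like "I" and "the".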

Not sure how the API returns lyrics, but this is what I was running on the Genius page to generate an ipsum:

WillCMcC commented 8 years ago

Sweet, thank you. I meant to create an issue about this last night but I wasn't sure if it was too soon. As it stands right now I just copy/pasted your lists directly into the project here, and kept the underlying logic the same despite the implementation being a lil different.

Right now here are my thoughts for where we can take this--

V1: Turn 'Full' into 3 different lists:

List 1: Sentence Starters -- any word that starts with a capital letter
List 2: Sentence Enders -- anything with punctuation
List 3: Middle words -- everything not on list 1 or 2

From there, it will be easy to generate better looking sentences -- grab a random starter word, grab a bunch of middle words, grab a random ending word. I can implement stuff to build sentences until we have the right number of words. I've got it all in my head, I just need to write the code once I'm off work.
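For reference, the V1 idea could be sketched roughly like this. It is only one way to read the plan, not the project's actual code: `splitWords`, `buildSentence`, and `buildParagraph` are hypothetical names, "punctuation" is assumed to mean a trailing `. , ! ? ; :`, a capitalized word that also ends in punctuation lands on both lists, and the 3 to 7 middle words per sentence is an arbitrary choice. The `rng` parameter is there only so the output can be made deterministic for testing.

```javascript
// Sketch only: all names and thresholds here are assumptions.

// Split the full word list into starters, enders, and middle words.
function splitWords(full) {
  const starters = [];
  const enders = [];
  const middles = [];
  for (const word of full) {
    const isStarter = /^[A-Z]/.test(word); // starts with a capital letter
    const isEnder = /[.,!?;:]$/.test(word); // ends with punctuation
    if (isStarter) starters.push(word);
    if (isEnder) enders.push(word);
    if (!isStarter && !isEnder) middles.push(word);
  }
  return { starters, enders, middles };
}

// Pick a random element. Assumes the list is non-empty.
function pick(list, rng) {
  return list[Math.floor(rng() * list.length)];
}

// One starter, middleCount middle words, one ender.
function buildSentence(lists, middleCount, rng = Math.random) {
  const words = [pick(lists.starters, rng)];
  for (let i = 0; i < middleCount; i++) {
    words.push(pick(lists.middles, rng));
  }
  words.push(pick(lists.enders, rng));
  return words.join(" ");
}

// Keep building sentences until at least wordCount words exist.
function buildParagraph(lists, wordCount, rng = Math.random) {
  const sentences = [];
  let total = 0;
  while (total < wordCount) {
    const middleCount = 3 + Math.floor(rng() * 5); // 3 to 7 middle words
    sentences.push(buildSentence(lists, middleCount, rng));
    total += middleCount + 2; // plus the starter and the ender
  }
  return sentences.join(" ");
}
```

Since whole sentences are appended, the paragraph can overshoot `wordCount` by a few words; trimming the last sentence would be needed for an exact count.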

V2: Randomly generate verses of length n (instead of just plain paragraphs) -- I haven't even started thinking about how to do this yet. First reactions are Markov chaining, or finding some way to parse the initial scraping so we have a better idea of verse structure. For example, we could keep some data structure that knows which Sentence Enders appear in sequence, so that we can build up random sentences that rhyme (in theory).
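To make the Markov chaining idea concrete, here is a minimal word-level sketch. It is a first-order chain only (each word depends on just the previous one), it knows nothing about verse structure or rhyme, and `buildChain` and `generate` are hypothetical names rather than anything in the project.

```javascript
// Sketch only: a first-order, word-level Markov chain.

// Map each word to the list of words that followed it in the source.
// Repeats are kept, so frequent followers are picked more often.
function buildChain(words) {
  const chain = new Map();
  for (let i = 0; i < words.length - 1; i++) {
    const followers = chain.get(words[i]) || [];
    followers.push(words[i + 1]);
    chain.set(words[i], followers);
  }
  return chain;
}

// Walk the chain from a start word until `length` words are produced
// or a word with no recorded follower is reached.
function generate(chain, start, length, rng = Math.random) {
  const out = [start];
  let current = start;
  while (out.length < length) {
    const followers = chain.get(current);
    if (!followers) break; // dead end: nothing ever followed this word
    current = followers[Math.floor(rng() * followers.length)];
    out.push(current);
  }
  return out;
}
```

Starting the walk from a Sentence Starter and stopping at a Sentence Ender would be one way to tie this back to the V1 lists.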

I'd love to hear your thoughts on any of this. I'll also mess around with the API to see what I can get out of that.

WillCMcC commented 8 years ago

It's really helpful to see how you generated the lists, thanks for posting that!

WillCMcC / wutangloremupsum

thoughts on filtering #1