MrTyton / Fanfiction-Recommendation

Is there a way to scrape the stories themselves? #2

Open depthfirst opened 8 years ago

depthfirst commented 8 years ago

I'm working on topic modeling with LDA, and I'd like to try it on a sample of the stories, not just the summaries. Do you have code to do that, or can you write a function to return the text of a story given ?

MrTyton commented 8 years ago

I don't have code to do that. I'll see if I can write something up by tomorrow; otherwise, you could try using http://www.mobileread.com/forums/showthread.php?t=259221

MrTyton commented 8 years ago

pip install FanFicFare

depthfirst commented 8 years ago

Can you give me a little more, like the function I'm asking for? I just thought this would be a lot easier for you since you scraped everything else.

MrTyton commented 8 years ago

Sorry, the meds that I'm taking are fucking me up some. That plugin scrapes everything and does all the nice epub formatting. Give me half an hour and I'll have a basic version, but it'll still have the formatting tags within the story.
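
Since the scraped text will still carry formatting tags, here is a minimal sketch of one way to strip them before topic modeling. It assumes the tags are ordinary HTML; `TagStripper`, `strip_tags`, and `BLOCK_TAGS` are hypothetical names for illustration, not part of this repo or of FanFicFare.

```python
from html.parser import HTMLParser

# Assumption: these tags mark a break, so a space is inserted to keep words apart.
BLOCK_TAGS = {"p", "div", "br", "hr"}


class TagStripper(HTMLParser):
    """Hypothetical helper that keeps only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        # Collapse the whitespace runs left behind by the removed tags.
        return " ".join("".join(self.parts).split())


def strip_tags(html):
    """Return the plain text of an HTML fragment."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()
    return stripper.text()


plain = strip_tags("<p>Hello <b>world</b>!</p><p>Second paragraph.</p>")
```

This only removes markup; tokenizing and stop-word filtering for LDA would still happen afterwards.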

MrTyton commented 8 years ago

Done. You have to do it from the Story class. Do you want me to rewrite the init function so that you can build a Story from the SQL row, instead of only from the initial scrape?

depthfirst commented 8 years ago

Does the init take a bit of time? If so, then yes, that'd be great. Thanks.

MrTyton commented 8 years ago

Init doesn't take time, but right now it literally only works when you give it an XML document. Hold on...

MrTyton commented 8 years ago

Done. It now works with the SQL row from stories; just make sure that you expand the row when you pass it in: Story(*row)
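
For anyone unfamiliar with the `*row` expansion, here is a rough sketch of the pattern. The `Story` class, the column names, and the use of an in-memory SQLite database below are all stand-in assumptions for illustration; the real class and schema in this repo may differ.

```python
import sqlite3


class Story:
    """Hypothetical stand-in; the real Story class takes its own set of columns."""

    def __init__(self, story_id, title, summary):
        self.story_id = story_id
        self.title = title
        self.summary = summary


# Assumed schema, created in memory purely so the example runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stories (id INTEGER, title TEXT, summary TEXT)")
conn.execute("INSERT INTO stories VALUES (1, 'Example', 'A short summary')")

# Each row comes back as a tuple; *row spreads it across the init parameters,
# so the column order in the SELECT must match the parameter order.
rows = conn.execute("SELECT id, title, summary FROM stories")
stories = [Story(*row) for row in rows]
conn.close()
```

Passing the row without the asterisk, as in `Story(row)`, would hand the whole tuple to the first parameter and fail on the missing ones.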
