Ackater / writing.com-archival

Utility for downloading Interactives from Writing.com
https://ackater.github.io/writing.com-archival

XPaths are wrong #32

Open SomeRandomDude870 opened 3 years ago

SomeRandomDude870 commented 3 years ago

Hi, it seems that some of the XPath queries are wrong. I've already looked through them and tried to fix them, but I still get errors; I just don't know Python.

Could anybody look through them and fix them?

Ackater commented 3 years ago

Sorry I have been busy, please try now.

MegaMagicPower commented 3 years ago

It still wasn't working for me, but I figured out the XPath values. For some reason, when you load the page in Chrome, the XPath values in the code are correct, but if you inspect the HTML that the Python code actually receives, it's totally different. No idea why. Anyway, replace these lines in scraper.py with the ones below and it should work again.

chapter_title_xp                = "//div/h2[starts-with(@title, 'Created')]/text()"
chapter_content_xp              = ".//div[@style='padding:25px 5px 5px 10px;min-width:482px;']/div"
chapter_choices_xp              = ".//div[@id='end_the_choices']/parent::*//a"
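
Because the HTML the scraper receives can differ from what Chrome's inspector shows, one way to debug this is to save the raw HTML the scraper got and compare it with the browser view. The following is a minimal sketch, not part of scraper.py: `dump_fetched_html` and the `debug_html` directory are hypothetical names, and it assumes you can get the page's HTML as a string at the point where the XPaths are applied.

```python
from pathlib import Path


def dump_fetched_html(html: str, label: str, out_dir: str = "debug_html") -> Path:
    """Write the HTML exactly as the scraper received it, so it can be
    compared with what the browser's inspector shows for the same page.

    Hypothetical helper; scraper.py does not define it."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Keep the filename filesystem-safe: replace anything that is not
    # alphanumeric, "-" or "_" with "_".
    safe_label = "".join(c if c.isalnum() or c in "-_" else "_" for c in label)
    path = directory / f"{safe_label}.html"
    path.write_text(html, encoding="utf-8")
    return path


# Hypothetical usage inside the scraper, where page_html is the fetched page:
# dump_fetched_html(page_html, "chapter_1-1")
```

Searching the saved file for the attribute values the XPaths rely on (such as `id='end_the_choices'`) shows quickly whether the scraper is seeing the same page layout the XPaths were written for.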

Also, I had never been able to get the "download search results" feature to work until now either, so here's that change as well, also in scraper.py:

search_results_xp = "//div[@style='display:inline;']"

#Located in get_search_page_interactive_ids()
items.append(re.findall(r"interactive-story/item_id/(.+?)'" , list(l.getchildren())[0].attrib['oncontextmenu'])[0])
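
For readers following along, here is a minimal, standard-library-only sketch of what that fixed line does: find each search-result `<div style='display:inline;'>`, take its first child's `oncontextmenu` attribute, and pull the item ID out with the same regular expression. It swaps Python's `html.parser` in for the lxml/XPath approach scraper.py uses; `SearchResultParser` and `extract_item_ids` are hypothetical names, and the sample markup is made up purely so the pattern has something to match.

```python
import re
from html.parser import HTMLParser

# Same pattern as the fixed line in get_search_page_interactive_ids():
# capture everything after "interactive-story/item_id/" up to the next single quote.
ITEM_ID_RE = re.compile(r"interactive-story/item_id/(.+?)'")


class SearchResultParser(HTMLParser):
    """Collect the oncontextmenu attribute of the first child element of each
    <div style='display:inline;'> (the elements search_results_xp selects)."""

    def __init__(self):
        super().__init__()
        self.menus = []
        self._await_first_child = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._await_first_child:
            # This is the first element inside a search-result div.
            self._await_first_child = False
            menu = attrs.get("oncontextmenu")
            if menu:
                self.menus.append(menu)
        if tag == "div" and attrs.get("style") == "display:inline;":
            self._await_first_child = True

    def handle_endtag(self, tag):
        # An empty result div has no first child; stop waiting for one.
        if tag == "div":
            self._await_first_child = False


def extract_item_ids(html):
    parser = SearchResultParser()
    parser.feed(html)
    parser.close()
    ids = []
    for menu in parser.menus:
        match = ITEM_ID_RE.search(menu)
        if match:  # skip results without an ID instead of raising IndexError
            ids.append(match.group(1))
    return ids


# Hypothetical sample markup, shaped only so the regex above matches.
sample = """
<div style='display:inline;'><a href="#" oncontextmenu="openMenu('interactive-story/item_id/1234567-example-story')">Example</a></div>
<div style='display:inline;'><a href="#" oncontextmenu="openMenu('interactive-story/item_id/7654321-another')">Another</a></div>
"""
print(extract_item_ids(sample))  # → ['1234567-example-story', '7654321-another']
```

One deliberate difference from the one-liner: `re.findall(...)[0]` raises `IndexError` when a result has no matching ID, whereas this sketch silently skips it, which may or may not be what you want when the site's markup changes again.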

Ackater commented 3 years ago

Do you want to submit a pull request? Just don't include the gh-pages branch, which is now about 1 GB of archives.

The search function was left over from the old code. I never bothered to get it working, since I just downloaded everything.

Ackater commented 3 years ago

Actually, are you using a theme other than the default one on writing.com? That may be why your XPaths are different. I'm checking mine, and the original XPaths work fine.

MegaMagicPower commented 3 years ago

Yeah, it looks like you're right. I just made a brand-new account and the page looks a little different: the "Members who added to this interactive story also contributed to these" box is on the right side, whereas on my actual account it's on the left. And at the bottom, the choices are left-aligned on the new account and centered on my main account. The original XPaths also work correctly on the new account. I didn't even know there was a way to change your theme until you mentioned it, but I just looked and both accounts are set to "Writing.com: Default". The only thing I can think of is that someone anonymously gifted me a membership, and the pages delivered to members differ from those delivered to free accounts.

I can't really submit a PR when I'm working with totally different output pages, but you might check whether what I posted for the search page works for you as well and commit that if you want, since it's such a small change. The only thing I changed was "interact-story" to "interactive-story" in the line I posted.

Ackater commented 3 years ago

Oh, do you have the (paid membership) feature that automatically expands chapters in interactives turned on? That'll change the layout of the pages too.

MegaMagicPower commented 3 years ago

I did have it on, but turning it off didn't change anything as far as the XPath problem goes.

Ackater commented 3 years ago

It might still be enabled in the scraper's session if you don't log in again. Try deleting your session file after turning the feature off on the website?

I'm using a premium account to scrape myself, with that feature turned off.

MegaMagicPower commented 3 years ago

Yeah, I think that was it. It seems to be working fine now with the default XPaths. I wish I had asked earlier and saved myself a couple of hours.

Since I'd already done the work, though, I went ahead and made a pull request for the get_search fix. I tested it on a free account, so it should work despite any weird hang-ups that might be left over on my main account. I also updated the instructions, since it seems the default search no longer includes interactives; you have to request them specifically.

SomeRandomDude870 commented 3 years ago

Thanks! Works now.

Although it's quite strange, since I had looked through the XPaths myself and they didn't change.