csaftoiu / yahoo-groups-backup

A python script to backup the contents of private Yahoo! groups.
The Unlicense
37 stars 18 forks source link

splinter.exceptions.ElementDoesNotExist: no elements could be found with tag_name "pre" #41

Open changeling opened 7 years ago

changeling commented 7 years ago

I'm running into this issue. Any thoughts?

python3 yahoo-groups-backup.py scrape_messages --login=<my-login> --password=<my-password> <my-group-name>

Processing the log-in page...
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/splinter/element_list.py", line 40, in __getitem__
    return super(ElementList, self).__getitem__(index)
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "yahoo-groups-backup.py", line 129, in <module>
    main()
  File "yahoo-groups-backup.py", line 125, in main
    arguments, cfg_args)
  File "yahoo-groups-backup.py", line 103, in invoke_subcommand
    return module.command(args)
  File "/home/<my-user>/yahoo-groups-backup/yahoo_groups_backup/subcommands/scrape_messages.py", line 41, in command
    last_message = scraper.get_last_message_number()
  File "/home/<my-user>/yahoo-groups-backup/yahoo_groups_backup/scraper.py", line 84, in get_last_message_number
    return self._load_json_url(url)['ygData']['messages'][0]['messageId']
  File "/home/<my-user>/yahoo-groups-backup/yahoo_groups_backup/scraper.py", line 77, in _load_json_url
    return json.loads(self.br.find_by_tag("pre")[0].text)
  File "/usr/local/lib/python3.5/dist-packages/splinter/element_list.py", line 44, in __getitem__
    self.find_by, self.query))
splinter.exceptions.ElementDoesNotExist: no elements could be found with tag_name "pre"
csaftoiu commented 7 years ago

Hmm haven't seen this one. What does the browser window look like at that point in time? It's expecting the JSON data which the browser renders with a pre tag. It was a bit of a hack but it seemed to work.
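
The hack described above relies on the browser wrapping a raw JSON response in a single pre element. A minimal sketch of the same idea, working on the page source so that a missing pre element produces an explicit error instead of an IndexError. The name `json_from_page_source` is hypothetical and not part of the script, and feeding it something like `self.br.html` is an assumption:

```python
import json
from html.parser import HTMLParser


class _PreTextCollector(HTMLParser):
    """Collects the text inside the first <pre> element of a page."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # > 0 while we are inside the first <pre>
        self._done = False   # True once the first <pre> has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre" and not self._done:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self._done = True

    def handle_data(self, data):
        if self._depth and not self._done:
            self.chunks.append(data)


def json_from_page_source(page_source):
    """Hypothetical helper: parse the JSON held in the first <pre> element."""
    collector = _PreTextCollector()
    collector.feed(page_source)
    collector.close()
    if not collector.chunks:
        raise ValueError(
            "no <pre> element found; the browser may have rendered the JSON differently"
        )
    return json.loads("".join(collector.chunks))
```

If the browser shows the JSON some other way, this fails with a message that points at the rendering rather than at a list index.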

changeling commented 7 years ago

I'd like to leave the selenium window open on failure in order to check that. Where in the code might I do that?

Oh, also, I was getting some timeouts on username and password, so I upped the sleep() time to 5 on username:

    self.br.find_by_name("signin").click()
    # Wait ...
    time.sleep(5)

and 10 after passwd:

    self.br.find_by_name("signin").click()
    # Wait ...
    time.sleep(10)

to allow for page rendering on my (slow) machine. That eliminated these errors when either the machine or the network lags. (I made the sleep(10) change on the chance that the "pre" error was a timing problem. Doesn't seem to be the case.):

splinter.exceptions.ElementDoesNotExist: no elements could be found with name "passwd" and splinter.exceptions.ElementDoesNotExist: no elements could be found with name "username"
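
Fixed sleep() values like the ones above are either too short on a slow machine or wastefully long on a fast one. A minimal sketch of a polling wait that could replace them; `wait_until` is a hypothetical helper, not something the script provides, and the usage shown in the trailing comment assumes splinter's element lists offer `is_empty()`:

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Poll predicate() until it is truthy or `timeout` seconds have passed.

    Returns True as soon as predicate() is truthy, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical usage with the script's browser object (an assumption):
#   wait_until(lambda: not browser.find_by_name("passwd").is_empty(), timeout=15)
```

The wait ends as soon as the field appears, so a generous timeout costs nothing when the page loads quickly.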

changeling commented 7 years ago

UPDATE: It looks like Yahoo may have changed the code today. I'm now getting:

splinter.exceptions.ElementDoesNotExist: no elements could be found with name "passwd"

no matter what. I'll see if my sleep()s are somehow causing that.

changeling commented 7 years ago

False alarm on that 'UPDATE'. For some reason, changing the sleep(2) to sleep(10) caused it to fail with the passwd error. Changing it back worked.

csaftoiu commented 7 years ago

Hmm try putting a sleep() or an input() right before the offending line:

  File "/home/<my-user>/yahoo-groups-backup/yahoo_groups_backup/scraper.py", line 77, in _load_json_url
    return json.loads(self.br.find_by_tag("pre")[0].text)

That should leave it open so you can check it out. If you could paste a screenshot here with the inspect console open that'd help. e.g. on Chrome I see this for a JSON document:

[image: screenshot of a JSON document as rendered in Chrome]
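
A variation on the sleep()/input() suggestion: instead of pausing before the offending line on every run, pause only when it fails. This is a minimal sketch; `pause_on_error` is a hypothetical name, and wrapping the `_load_json_url` call with it is an assumption about where it would go:

```python
from contextlib import contextmanager


@contextmanager
def pause_on_error(prompt=input):
    """Hold the process (and so the browser window) open when the body raises.

    `prompt` defaults to input(), which blocks until Enter is pressed.
    The original exception is re-raised afterwards.
    """
    try:
        yield
    except Exception as exc:
        prompt("Failed with {!r}; inspect the browser, then press Enter... ".format(exc))
        raise


# Hypothetical usage inside the scraper (an assumption):
#   with pause_on_error():
#       data = self._load_json_url(url)
```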

changeling commented 7 years ago

Got it! Not sure how or where to set a firefox preference in a temporary profile, but here's the problem:

https://developer.mozilla.org/en-US/docs/Tools/JSON_viewer

The relevant config setting is:

devtools.jsonview.enabled

This needs to be set as false for the generated profile. I'm betting that solves the issue.

changeling commented 7 years ago

Note that, according to that link, the JSON View will be enabled by default in Firefox starting with v53, so I suspect you'll be hearing more of this. :)

csaftoiu commented 7 years ago

Ooh, that's unfortunate. That's the thing about hacks, they only work for so long . . . this will indeed have to be fixed in the code. I probably won't get to it any time soon unfortunately. Thanks for reporting the issue and identifying the cause though, that will make a fix possible :).

changeling commented 7 years ago

Don't really have time right now to dig in, but if I get a chance, I'll let you know.

It looks like you may be able to set the preference via:

http://splinter.readthedocs.io/en/latest/drivers/firefox.html#how-to-use-selenium-capabilities-for-firefox

using:

https://seleniumhq.github.io/selenium/docs/api/py/webdriver_firefox/selenium.webdriver.firefox.options.html#module-selenium.webdriver.firefox.options

Using this to set the 'devtools.jsonview.enabled' preference to false would likely keep your hack working fine.

Also, I saw this little snippet and changed it to the desired preference setting, if that helps at all:

    from selenium import webdriver

    # Start from a fresh temporary profile and turn off the built-in JSON viewer
    fp = webdriver.FirefoxProfile()
    fp.set_preference("devtools.jsonview.enabled", False)

    browser = webdriver.Firefox(firefox_profile=fp)

Thanks for what looks like an amazing script! I've been struggling with Yahoo Groups for awhile now. As I said, if I have more time, I'll try to dig in.

Chris
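
Since the script drives the browser through splinter rather than Selenium directly, the same preference might be passed at browser creation. This is a sketch under assumptions: the helper names are hypothetical, and `profile_preferences` is assumed to be accepted by the Firefox driver of the installed splinter version, so check the splinter docs linked above before relying on it:

```python
DEFAULT_FIREFOX_PREFS = {
    # The thread's proposed fix: turn off Firefox's built-in JSON viewer.
    "devtools.jsonview.enabled": False,
}


def firefox_preferences(overrides=None):
    """Return a copy of the default preferences merged with optional overrides."""
    prefs = dict(DEFAULT_FIREFOX_PREFS)
    prefs.update(overrides or {})
    return prefs


def open_firefox(overrides=None):
    """Hypothetical helper: start Firefox through splinter with the preferences set."""
    # Imported lazily so firefox_preferences() works without splinter installed.
    from splinter import Browser

    # ASSUMPTION: the installed splinter Firefox driver takes `profile_preferences`.
    return Browser("firefox", profile_preferences=firefox_preferences(overrides))
```

Keeping the preferences in one dictionary makes it easy to add further settings later without touching the browser start-up code.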

jonbartlett commented 6 years ago

I get the same error and spent a bit of time trying to fix but to no avail.

Inspired by this project a while back I started on a Ruby version which I have just finished given I can't get this one working. If anyone is interested see https://github.com/jonbartlett/yahoo-groups-export

changeling commented 6 years ago

@jonbartlett Are you looking at adding photo export, too?

jonbartlett commented 6 years ago

@changeling Photos attached to posts? If so, possibly but there are so few in the forum I am migrating it probably isn't a priority.

If you want to get involved find a post with a photo and see how it is represented through the API:

https://groups.yahoo.com/api/v1/groups//messages/4/raw

Also better if we move this conversation over to my repo.

JustinCEO commented 4 years ago

I encountered the error described in the OP while passing --driver=chrome to the script to get around the issue described in this thread: https://github.com/csaftoiu/yahoo-groups-backup/issues/47#issuecomment-417490160. The script ran for a bit but then produced the message: splinter.exceptions.ElementDoesNotExist: no elements could be found with tag_name "pre"

hrenfroe commented 4 years ago

@JustinCEO If you're still seeing this on my fork, can you open an issue there with your stack traces?

peterhost commented 4 years ago

@hrenfroe I did run into the issue. The problem was a Yahoo splash screen which prevented the login process from starting. I just increased the time.sleep() durations to 5 seconds in the _process_login_page function in yahoo_groups_backup/scraper.py, which gives me time to click "ok" on the splash screen; then the login goes on smoothly.

peterhost commented 4 years ago

Also, for the record, in case anybody stumbles upon this (as we're all in a hurry to back up our groups): one of the groups I had to back up is big, more than 50k posts. Node.js runs out of heap memory when stringifying the JSONP loaded in memory before splitting it into the data.messagedata-xxx-xxx.js files. The quick fix is to edit dump_site.py in the subcommands dir so that it passes node an extra argument raising the V8 heap limit: --max_old_space_size=4096 sufficed for me.

    def render_search_indices(self):
        subprocess.Popen([
            "node", "--max_old_space_size=4096",
            P.join(P.dirname(P.realpath(__file__)), 'generate_search_index.js'),
            P.join(self.data_dir)
        ]).communicate()
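
If hard-coding 4096 is too blunt, the heap limit could be made a parameter. This is a minimal sketch, not part of the script: `build_node_command` and its arguments are hypothetical, and only the `--max_old_space_size` flag comes from the fix above:

```python
def build_node_command(script_path, data_dir, max_old_space_mb=None):
    """Build the argument list for running the search-index generator with node.

    When max_old_space_mb is given, node is asked to raise its old-space heap
    limit to that many megabytes; otherwise node's default limit applies.
    """
    cmd = ["node"]
    if max_old_space_mb is not None:
        cmd.append("--max_old_space_size={}".format(int(max_old_space_mb)))
    cmd.extend([script_path, data_dir])
    return cmd
```

The returned list can be handed to subprocess.Popen as in the snippet above, so small groups keep the default limit and only large ones opt in to more memory.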