freeseek / getmyancestors


Max retries exceeded... Caused by NewConnectionError #5

Closed nick1001 closed 8 years ago

nick1001 commented 8 years ago

When I try pulling large GEDs, I get a different error at a random point every single time. Sometimes the log file is 50 KB and sometimes it's 4 MB. Here's my most recent error. Maybe it just needs a resume function?

FamilySearch ERROR.txt

$ python3.4 getmyancestors.py -a 10 -d 10 -u ***** -p ***** -i LVF5-WPQ -o A.ged" -l a.log -v
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connection.py", line 135, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/util/connection.py", line 66, in create_connection
    for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
  File "/usr/local/lib/python3.4/socket.py", line 530, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 559, in urlopen
    body=body, headers=headers)
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 345, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 782, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connection.py", line 215, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connection.py", line 144, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f3d1ac987f0>: Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/requests/adapters.py", line 370, in send
    timeout=timeout
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 609, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/util/retry.py", line 271, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='familysearch.org', port=443): Max retries exceeded with url: /platform/tree/persons/KCQ9-5C5.json (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f3d1ac987f0>: Failed to establish a new connection: [Errno -2] Name or service not known',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ma/Dropbox/MaAndZee/getmyancestors/getmyancestors.py", line 57, in get_url
    r = requests.get(url, cookies = { 'fssessionid' : self.fssessionid })
  File "/usr/local/lib/python3.4/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/requests/adapters.py", line 423, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='familysearch.org', port=443): Max retries exceeded with url: /platform/tree/persons/KCQ9-5C5.json (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f3d1ac987f0>: Failed to establish a new connection: [Errno -2] Name or service not known',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ma/Dropbox/MaAndZee/getmyancestors/getmyancestors.py", line 428, in <module>
    for child in get_family(fid):
  File "/home/ma/Dropbox/MaAndZee/getmyancestors/getmyancestors.py", line 346, in get_family
    process_trio(father,mother,child)
  File "/home/ma/Dropbox/MaAndZee/getmyancestors/getmyancestors.py", line 308, in process_trio
    tree.indi[child] = Indi(child)
  File "/home/ma/Dropbox/MaAndZee/getmyancestors/getmyancestors.py", line 113, in __init__
    data = fs.get_url(url)
  File "/home/ma/Dropbox/MaAndZee/getmyancestors/getmyancestors.py", line 68, in get_url
    if 'reason' in e.args[0]:
TypeError: argument of type 'MaxRetryError' is not iterable

freeseek commented 8 years ago

I should be able to get this fixed. What were the last lines in your a.log file?

nick1001 commented 8 years ago

I didn't save the log... however, here's another one.

FamilySearch ERROR 2.txt

    $ python3.4 getmyancestors.py -a 10 -d 10 -u ***** -p ***** -i LVF5-WPQ -o A.ged" -l a.log -v

    Traceback (most recent call last):
      File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connection.py", line 135, in _new_conn
        (self.host, self.port), self.timeout, **extra_kw)
      File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/util/connection.py", line 90, in create_connection
        raise err
      File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/util/connection.py", line 80, in create_connection
        sock.connect(sa)
    TimeoutError: [Errno 110] Connection timed out

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 559, in urlopen
        body=body, headers=headers)
      File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 345, in _make_request
        self._validate_conn(conn)
      File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 782, in _validate_conn
        conn.connect()
      File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connection.py", line 215, in connect
        conn = self._new_conn()
      File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connection.py", line 144, in _new_conn
        self, "Failed to establish a new connection: %s" % e)
    requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7fee208b6160>: Failed to establish a new connection: [Errno 110] Connection timed out

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.4/site-packages/requests/adapters.py", line 370, in send
        timeout=timeout
      File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 609, in urlopen
        _stacktrace=sys.exc_info()[2])
      File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/util/retry.py", line 271, in increment
        raise MaxRetryError(_pool, url, error or ResponseError(cause))
    requests.packages.urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='familysearch.org', port=443): Max retries exceeded with url: /platform/tree/persons/LHLL-C2Y/spouses.json (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7fee208b6160>: Failed to establish a new connection: [Errno 110] Connection timed out',))

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/ma/Dropbox/MaAndZee/getmyancestors/getmyancestors.py", line 57, in get_url
        r = requests.get(url, cookies = { 'fssessionid' : self.fssessionid })
      File "/usr/local/lib/python3.4/site-packages/requests/api.py", line 69, in get
        return request('get', url, params=params, **kwargs)
      File "/usr/local/lib/python3.4/site-packages/requests/api.py", line 50, in request
        response = session.request(method=method, url=url, **kwargs)
      File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 468, in request
        resp = self.send(prep, **send_kwargs)
      File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 576, in send
        r = adapter.send(request, **kwargs)
      File "/usr/local/lib/python3.4/site-packages/requests/adapters.py", line 423, in send
        raise ConnectionError(e, request=request)
    requests.exceptions.ConnectionError: HTTPSConnectionPool(host='familysearch.org', port=443): Max retries exceeded with url: /platform/tree/persons/LHLL-C2Y/spouses.json (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7fee208b6160>: Failed to establish a new connection: [Errno 110] Connection timed out',))

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/ma/Dropbox/MaAndZee/getmyancestors/getmyancestors.py", line 433, in <module>
        rels = tree.indi[child].get_spouses()
      File "/home/ma/Dropbox/MaAndZee/getmyancestors/getmyancestors.py", line 174, in get_spouses
        data = fs.get_url(url)
      File "/home/ma/Dropbox/MaAndZee/getmyancestors/getmyancestors.py", line 68, in get_url
        if 'reason' in e.args[0]:
    TypeError: argument of type 'MaxRetryError' is not iterable

And here are the last 12 lines of that log: FamilySearch ERROR 2 log.txt

Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/LZXR-JNL.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/27DS-XL1/spouses.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/couple-relationships/MDRQ-LHS.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/L7JK-13X.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/K63J-ZZL/spouses.json
Status code: 204
Downloading: https://familysearch.org/platform/tree/persons/LHLL-C2Y/spouses.json

freeseek commented 8 years ago

Hmmm. I thought status code 204 was only returned for parents.json requests. Anyway, I have improved the script's error handling. Give the new version a try, and if you get more errors, post the message and the last few lines of the log file.
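For readers following along: both tracebacks above end in the same `TypeError`, raised by `if 'reason' in e.args[0]`. Here `e.args[0]` is an exception object (a `MaxRetryError`), not a string or dict, so the `in` operator has nothing to iterate over. The sketch below reproduces that situation with a stand-in class and shows one safe way to read the reason. `FakeMaxRetryError` and `describe_failure` are hypothetical names invented for this illustration; this is not the fix that was actually committed.

```python
class FakeMaxRetryError(Exception):
    """Hypothetical stand-in for urllib3's MaxRetryError.

    Only the `reason` attribute matters for this illustration.
    """

    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason


def describe_failure(exc):
    """Return a short description of a failed request.

    Avoids `'reason' in exc.args[0]`, which raises TypeError when
    args[0] is an exception object rather than a container.
    """
    inner = exc.args[0] if exc.args else None
    # getattr with a default is safe for any object, container or not.
    reason = getattr(inner, "reason", None)
    return str(reason) if reason is not None else str(exc)


if __name__ == "__main__":
    wrapped = Exception(FakeMaxRetryError("Name or service not known"))
    print(describe_failure(wrapped))
```

The design choice is simply to test for an attribute with `getattr` instead of testing for membership, since membership tests only work on containers and iterables.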

nick1001 commented 8 years ago

Downloading: https://familysearch.org/platform/tree/persons/L7FS-PKC/spouses.json
Status code: 204
Downloading: https://familysearch.org/platform/tree/persons/L7FS-PXW/spouses.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/couple-relationships/M8Z4-XKM.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/L7FS-52K.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/97SX-T5Z/spouses.json
Connection aborted

FamilySearch ERROR 4.txt

nick1001 commented 8 years ago

    Downloading: https://familysearch.org/platform/tree/persons/LX3X-D45.json
    Status code: 200
    Downloading: https://familysearch.org/platform/tree/persons/LK1B-3YS.json
    Status code: 200
    Downloading: https://familysearch.org/platform/tree/persons/KHCV-2S5.json
    Status code: 200
    Downloading: https://familysearch.org/platform/tree/persons/9HTP-VDW.json
    Connection aborted

FamilySearch ERROR 5.txt

freeseek commented 8 years ago

Okay, I had forgotten something. The new version should not fail even if your internet connection drops while the script is running.
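One common way to survive a dropped connection is to retry the request a bounded number of times with a pause in between. The sketch below is only an illustration of that idea, not the code in the script. `get_with_retry` and its parameters are hypothetical, and it catches Python's built-in `ConnectionError` and `TimeoutError` so that it runs without a network. With the requests library you would catch `requests.exceptions.ConnectionError` and `requests.exceptions.Timeout` instead, since those are separate classes.

```python
import time


def get_with_retry(fetch, url, max_attempts=5, delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it succeeds or max_attempts is exhausted.

    `fetch` is any callable that returns a result or raises
    ConnectionError/TimeoutError. `sleep` is injectable so the
    function can be exercised without real waiting.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
            if attempt < max_attempts:
                # Linear backoff: wait a little longer after each failure.
                sleep(delay * attempt)
    # Every attempt failed; surface the last error to the caller.
    raise last_error
```

Bounding the number of attempts matters: an unbounded loop would hang forever on a permanent failure, which is the kind of stall reported later in this thread.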

tmathie commented 8 years ago

Hello, First, I would like to thank you for this tool. Trying to download these pedigree files from genealogy software has been heartbreaking; this is much better, especially because of the log file. I have one question, and I believe it has more to do with the website than your tool, but maybe you have an idea. A lot of the time the search gets stuck trying to download one part of the tree, and it keeps trying indefinitely until I kill the process. Here is one example. I looked this person up on the website and it had a "Person has been deleted" status, which, I am sure, is why it got stuck:

Downloading: https://familysearch.org/platform/tree/persons/LH2K-4PF/spouses.json
Status code: 410
Unexpected error
Downloading: https://familysearch.org/platform/tree/persons/LH2K-4PF/spouses.json
Status code: 410
Unexpected error
Downloading: https://familysearch.org/platform/tree/persons/LH2K-4PF/spouses.json
Status code: 410
Unexpected error
Downloading: https://familysearch.org/platform/tree/persons/LH2K-4PF/spouses.json
Status code: 410
Unexpected error
Downloading: https://familysearch.org/platform/tree/persons/LH2K-4PF/spouses.json
Status code: 410
Unexpected error
Downloading: https://familysearch.org/platform/tree/persons/LH2K-4PF/spouses.json
Status code: 410
Unexpected error
Downloading: https://familysearch.org/platform/tree/persons/LH2K-4PF/spouses.json
Status code: 410
Unexpected error
Downloading: https://familysearch.org/platform/tree/persons/LH2K-4PF/spouses.json
Status code: 410
Unexpected error
Downloading: https://familysearch.org/platform/tree/persons/LH2K-4PF/spouses.json
Status code: 410
Unexpected error
Downloading: https://familysearch.org/platform/tree/persons/LH2K-4PF/spouses.json
Status code: 410
Unexpected error
Downloading: https://familysearch.org/platform/tree/persons/LH2K-4PF/spouses.json
Status code: 410
Unexpected error
Downloading: https://familysearch.org/platform/tree/persons/LH2K-4PF/spouses.json

If you have any ideas, I would appreciate it. Thank you,

freeseek commented 8 years ago

Thank you for pointing this out. I have added a check for status code 410, which is now handled the same way as status code 204. Give the latest revision a try and see if it works.
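To illustrate the idea: HTTP 204 means "No Content" and 410 means "Gone", so in both cases there is nothing to parse and retrying will not help. A minimal sketch of treating them alike follows. `NO_DATA_STATUSES` and `parse_response` are hypothetical names made up for this example and are not taken from the script.

```python
import json

# Status codes that mean "there is nothing to download for this URL".
# 204 = No Content, 410 = Gone (e.g. a deleted person record).
NO_DATA_STATUSES = {204, 410}


def parse_response(status_code, body):
    """Return parsed JSON, or None when the server has no data to give.

    Raises RuntimeError for any other non-200 status so that the
    caller can decide whether a retry makes sense.
    """
    if status_code in NO_DATA_STATUSES:
        return None  # Skip this record instead of retrying forever.
    if status_code == 200:
        return json.loads(body)
    raise RuntimeError("unexpected status code: %d" % status_code)
```

Returning `None` rather than raising lets the caller move on to the next individual, which avoids the endless "Status code: 410 / Unexpected error" loop shown in the log above.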

tmathie commented 8 years ago

Awesome, Thank you!

tmathie commented 8 years ago

Hello, I am sorry to be a complainer, but now I am getting the error below. There is no error in the log, only in the console. I believe it happens after all of the downloads are complete, when the script attempts to compile the list. I ran the search twice and received the same error both times. Any ideas?

torrey@gameover1:~/Desktop/Genealogy/getmyancestors-master$ ./getmyancestors.py -a 20 -d 0 -u uname -p Pword -i khby-b25 -o john-truman.ged -l john-truman.log -v
Traceback (most recent call last):
  File "./getmyancestors.py", line 417, in <module>
    for child in get_family(fid):
  File "./getmyancestors.py", line 346, in get_family
    process_trio(father,mother,child)
  File "./getmyancestors.py", line 306, in process_trio
    tree.indi[mother] = Indi(mother)
  File "./getmyancestors.py", line 117, in __init__
    for y in x['names'][0]['nameForms'][0]['parts']:
KeyError: 'parts'

torrey@gameover1:~/Desktop/Genealogy/getmyancestors-master$ ./getmyancestors.py -a 20 -d 0 -u Uname -p Pword -i khby-b25 -o john-truman.ged -l john-truman.log -v
Traceback (most recent call last):
  File "./getmyancestors.py", line 419, in <module>
    for child in get_family(fid):
  File "./getmyancestors.py", line 347, in get_family
    process_trio(father,mother,child)
  File "./getmyancestors.py", line 307, in process_trio
    tree.indi[mother] = Indi(mother)
  File "./getmyancestors.py", line 118, in __init__
    for y in x['names'][0]['nameForms'][0]['parts']:
KeyError: 'parts'

Thank you,

freeseek commented 8 years ago

Thank you for reporting this issue. As you can see, you are one of the first beta testers. I will get a fix for this as soon as I identify the problem.

tmathie commented 8 years ago

Thank you. I have run several other searches without issue. My guess is a circular relationship somewhere in that tree, but I can't tell from the errors.

freeseek commented 8 years ago

The code should be resistant to circular relationships. I gave it a try just now and observed a "Read timed out" followed by a "Connection aborted", followed by a "Status code: 401". This means that the server closed the session on its side, which I had never seen happen before. I will have to fix that first and then get your run to complete. It might take some time. Thank you for your patience.

tmathie commented 8 years ago

Hello, I got the same error on a different search. It appears to be happening on the larger ones. I ran several prior to this that had fewer than 400 individuals and they were successful. This one ran all night, so it must be pretty large.

torrey@gameover1:~/Desktop/Genealogy/getmyancestors-master$ ./getmyancestors.py -a 99 -d 0 -u uname -p pword -i l44h-yx3 -o JohnMasters.ged -l JohnMasters99.log -v
Traceback (most recent call last):
  File "./getmyancestors.py", line 419, in <module>
    for child in get_family(fid):
  File "./getmyancestors.py", line 347, in get_family
    process_trio(father,mother,child)
  File "./getmyancestors.py", line 307, in process_trio
    tree.indi[mother] = Indi(mother)
  File "./getmyancestors.py", line 118, in __init__
    for y in x['names'][0]['nameForms'][0]['parts']:
KeyError: 'parts'

tmathie commented 8 years ago

Here is the exact command line:

torrey@gameover1:~/Desktop/Genealogy/getmyancestors-master$ ./getmyancestors.py -a 99 -d 0 -u uname -p pword -i l44h-yx3 -o JohnMasters.ged -l JohnMasters99.log -v

Here are the first few lines of the log:

Downloading: https://familysearch.org/platform/tree/persons/l44h-yx3.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/l44h-yx3/parents.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/LZRS-PMK.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/LZ6P-N5J.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/LZRS-PMK/spouses.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/couple-relationships/M6S1-X45.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/LHT4-S7V.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/couple-relationships/MD5G-F9Z.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/L78Y-JDT.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/couple-relationships/MD5P-SPY.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/LZRS-PMK/children.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/L44H-YX3.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/L4SK-KV5.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/KHPM-BXK.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/LJTJ-R8J.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/M1B3-M1F.json
Status code: 200

tmathie commented 8 years ago

./getmyancestors.py -a 99 -d 0 -u uname -p pword -i l44h-yx3 -o JohnMasters.ged -l JohnMasters99.log -v

tmathie commented 8 years ago

Sorry, I replied to the wrong thread :(

freeseek commented 8 years ago

Your run fails at the line "for y in x['names'][0]['nameForms'][0]['parts']", which is executed when a new individual is created, so not after all of the downloads are complete. However, it is impossible for me to tell which individual the script was processing without the last lines of the log. Can you provide these when you encounter this error? In the meantime, I have updated the code and it will now log in again if the session gets closed by the server.
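As an illustration of the re-login behaviour described here, the sketch below fetches a URL and, on a 401 response, obtains a fresh session id and tries again a bounded number of times. Everything in it is hypothetical: the `Session` class, the `login` and `fetch` callables and the `max_relogins` parameter are invented for this example and do not reflect how the script is actually written.

```python
class Session:
    """Minimal sketch of a session that logs in again on HTTP 401."""

    def __init__(self, login, fetch):
        self._login = login  # assumed: callable returning a new session id
        self._fetch = fetch  # assumed: (url, session_id) -> (status, data)
        self.session_id = login()

    def get(self, url, max_relogins=2):
        """Fetch url, re-authenticating when the server rejects the session."""
        for attempt in range(max_relogins + 1):
            status, data = self._fetch(url, self.session_id)
            if status != 401:
                return status, data
            if attempt < max_relogins:
                # The server closed our session: log in again and retry.
                self.session_id = self._login()
        raise RuntimeError("still unauthorized after re-login")
```

Capping the number of re-logins keeps wrong credentials from turning into an infinite loop.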

tmathie commented 8 years ago

Hello, Here are the last lines of the first search I got the error on:

Downloading: https://familysearch.org/platform/tree/persons/9HYH-BNH.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/L6CQ-PXG.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/L4S5-GJ7.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/L71L-PLT.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/L6CQ-P8B.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/9W7P-WJ2.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/9C4M-5RS.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/K4DV-5GS.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/9SYB-6LP.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/273J-Y66.json
Status code: 200

Here are the last few lines of the second search I received the error on:

Downloading: https://familysearch.org/platform/tree/persons/LZN3-FCN.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/LZJ7-W59/children.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/K4DV-5GS.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/9SYB-6LP.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/M11P-WPY.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/9HYH-BNH.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/9C4M-5RS.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/9Z4W-WYZ.json
Status code: 200
Downloading: https://familysearch.org/platform/tree/persons/273J-Y66.json
Status code: 200

Thank you

freeseek commented 8 years ago

Yes, the problem is with individual 273J-Y66. I will get the script fixed ASAP.

freeseek commented 8 years ago

Fixed now. If you re-download the script, individual 273J-Y66 will not give you problems anymore.
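The `KeyError: 'parts'` above comes from indexing `x['names'][0]['nameForms'][0]['parts']` on a record whose name form has no `parts` entry. One defensive way to read such nested data is sketched below. The `name_parts` helper is hypothetical, the sample records are made up, and this is not necessarily how the fix was implemented.

```python
def name_parts(person):
    """Return the list of name parts for a person record, or [] if absent.

    Mirrors the lookup x['names'][0]['nameForms'][0]['parts'] from the
    traceback, but tolerates missing or empty levels instead of raising.
    """
    names = person.get("names") or []
    if not names:
        return []
    forms = names[0].get("nameForms") or []
    if not forms:
        return []
    return forms[0].get("parts") or []


if __name__ == "__main__":
    # Made-up records: one complete, one with a name form lacking 'parts'.
    complete = {"names": [{"nameForms": [{"parts": [{"value": "John"}]}]}]}
    incomplete = {"names": [{"nameForms": [{}]}]}
    print(name_parts(complete))
    print(name_parts(incomplete))
```

Returning an empty list lets the calling loop run zero times for an individual with no name parts, so the download can carry on with the rest of the tree.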

tmathie commented 8 years ago

Thank you!