csingley / ofxtools

Python OFX Library

Is Vanguard failing for everyone? #94

Closed: rkhwaja closed this issue 3 years ago

rkhwaja commented 3 years ago

Does anyone have Vanguard working with this package? I get a 500 error. With ofxclient, I get an error page back.

I tried the --scan option without success.

csingley commented 3 years ago

Vanguard seems to have changed their server. Perfectly valid OFX profile requests are handled with errors, while practically identical PROFRQ sent to the same URL by Quicken succeed.

It's probably filtering at the HTTP level: User-Agent header discrimination or some such.

aclindsa commented 3 years ago

@csingley Are you able to capture the HTTP headers sent by Quicken?

csingley commented 3 years ago

@aclindsa nope.

aclindsa commented 3 years ago

Depends on how deep you want to go, but I've used wireshark and/or mitmproxy to collect raw traffic before when I needed to figure out what another piece of closed-source software was doing. I suspect it would be helpful in trying to match Quicken's behavior.

csingley commented 3 years ago

An ofxtools user told me he was able to log all this info with mitmproxy. If anybody who had a Quicken install wanted to hook this up, it would probably be most informative. Beyond the HTTP headers, we could potentially refresh the entire database of known OFX endpoints; the one made available through Microsoft Money's API is getting pretty stale by now.
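A side-by-side comparison with a Quicken capture also needs the same capture of our own client's requests. A minimal sketch with urllib, assuming mitmproxy is running on its default listen address (127.0.0.1:8080) and that its CA certificate (generated by default under ~/.mitmproxy/) is passed in as `cafile`; `proxied_opener` and `MITMPROXY_URL` are hypothetical names, not part of ofxtools:

```python
import ssl
import urllib.request
from typing import Optional

# Assumption: mitmproxy's default listen address. Its CA certificate is by
# default ~/.mitmproxy/mitmproxy-ca-cert.pem; pass that path as `cafile`.
MITMPROXY_URL = "http://127.0.0.1:8080"


def proxied_opener(cafile: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Build a urllib opener whose HTTP(S) traffic goes through mitmproxy."""
    # mitmproxy re-signs HTTPS traffic with its own CA, so that CA must be trusted
    context = ssl.create_default_context(cafile=cafile)
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": MITMPROXY_URL, "https": MITMPROXY_URL}),
        urllib.request.HTTPSHandler(context=context),
    )
```

Requests sent with `proxied_opener(cafile).open(request)` then show up in mitmproxy with their full headers and bodies, ready to diff against what Quicken sends.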

aclindsa commented 3 years ago

I did some research into this tonight.

I have had logic in ofxgo to attempt each statement request twice at Vanguard, because they seem to like to have cookies set (which they thankfully set on the response to the first request). Previously the first request would appear to succeed with HTTP status code 200, but it would have content length 0. Now it appears that the first request of the pair has started failing with status code 500, but still has the necessary cookies set. If I change my code to accept the 500 error, and use the set cookies for the next request, I am able to successfully download my statement from Vanguard.
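The send-twice approach can be sketched as follows. This is a minimal illustration, not ofxgo's or ofxtools' actual code: it assumes a hypothetical `send` callable that POSTs the already-serialized OFX request with some extra headers and returns the status, the Set-Cookie values and the body. The first response is tolerated even when it is a 500 or an empty 200, as long as it set cookies.

```python
import http.cookies
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Tuple

# Hypothetical transport: POSTs the OFX request with extra headers and returns
# (HTTP status, Set-Cookie header values, response body).
Send = Callable[[Dict[str, str]], Tuple[int, List[str], bytes]]


def cookie_header(set_cookie_values: List[str]) -> str:
    """Turn Set-Cookie header values into one Cookie request header value."""
    jar = http.cookies.SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)  # attributes such as Path, Expires, Secure are not names
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


def post_twice(send: Send) -> bytes:
    """Send once to pick up cookies, then resend with those cookies attached."""
    status, set_cookies, body = send({})
    if status == 200 and body:
        return body  # the server answered on the first try
    if not set_cookies:
        raise RuntimeError(f"first request failed (HTTP {status}) and set no cookies")
    status, _, body = send({"Cookie": cookie_header(set_cookies)})
    if status != 200 or not body:
        raise RuntimeError(f"retry with cookies failed (HTTP {status})")
    return body


def urllib_send(url: str, data: bytes, headers: Dict[str, str],
                timeout: float = 30.0) -> Send:
    """Build a Send on top of urllib that reports HTTP errors instead of raising."""
    def send(extra: Dict[str, str]) -> Tuple[int, List[str], bytes]:
        req = urllib.request.Request(
            url, data=data, method="POST", headers={**headers, **extra}
        )
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status, resp.headers.get_all("Set-Cookie") or [], resp.read()
        except urllib.error.HTTPError as err:  # e.g. the 500 on the first request
            return err.code, err.headers.get_all("Set-Cookie") or [], err.read()
    return send
```

A call would look like `post_twice(urllib_send(url, ofx_bytes, headers))`, where `url`, `ofx_bytes` and `headers` are placeholders for whatever builds the OFX request. Keeping the transport behind a callable makes the retry logic testable without a live server.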

I see that another piece of software first requests the profile from https://vesnc.vanguard.com/us/OfxProfileServlet before making the statement request to https://vesnc.vanguard.com/us/OfxDirectConnectServlet. I don't see any HTTP errors in its flow, so it may be that Vanguard's server expects the cookies to be set by the profile lookup every time through. This doesn't square with @csingley's earlier comment, though:

> Perfectly valid OFX profile requests are handled with errors, while practically identical PROFRQ sent to the same URL by Quicken succeed.

aclindsa commented 3 years ago

FWIW, this software sends the following User-Agent header (though it made no difference in my testing):

```
User-Agent:       Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1 Safari/605.1.15
```

csingley commented 3 years ago

@aclindsa thanks a lot! It's very encouraging that you can get ofxgo to download from Vanguard successfully. Cookies are just the kind of transport-layer discrepancy I've been looking for.

> I see another piece of software is first requesting the profile from https://vesnc.vanguard.com/us/OfxProfileServlet before making the statement request to https://vesnc.vanguard.com/us/OfxDirectConnectServlet. I do not see any HTTP errors in their flow, so it may be that Vanguard's server is expecting the cookies to get set in the profile lookup each time through. This doesn't square with @csingley 's earlier comment, though

"Other software" generally uses a multi-step process; I think they always do a PROFRQ first before issuing a STMTRQ. This is the first time I've seen a different URL serve PROFRS than the one that serves STMTRS.

ofxtools as presently constructed assumes a single-step process with a single URL, and doesn't consider or set cookies at any point.

I will try to test out sending the PROFRQ to their OfxProfileServlet, extracting the (presumed) returned cookies, and stapling them to a STMTRQ sent to their OfxDirectConnectServlet.

csingley commented 3 years ago

Incidentally, deviating from the OFX spec to add HTTP cookies is kind of a dick move by Vanguard.

aclindsa commented 3 years ago

> This is the first time I've seen a different URL serve PROFRS than the one that serves STMTRS.

Ah, yes, using the 'statement' URL would explain why you were getting the error with PROFRQ while I didn't see one!

> I will try to test out sending the PROFRQ to their OfxProfileServlet, extracting the (presumed) returned cookies, and stapling them to a STMTRQ sent to their OfxDirectConnectServlet.

I like this approach. It would be nice to be able to do this for all requests (not just Vanguard), though I'm a little scared that many FIs won't properly implement the profile request in order to be able to rely on it universally...

csingley commented 3 years ago

> I like this approach. It would be nice to be able to do this for all requests (not just Vanguard), though I'm a little scared that many FIs won't properly implement the profile request in order to be able to rely on it universally...

Well, I think you should be able to rely on it. This is really a core part of the OFX spec all the way back... it's an abstraction layer that allows the FIs to manage their own OFX domains, only needing to "hardcode" with Quicken/Money a well-known URL to ping with PROFRQ. The client is then supposed to read the PROFRS, and follow those directions for further service... this is the very purpose of all those *MSGSET aggregates in the spec. I've just always blown all that off, because almost all FIs have opted for the degenerate case (a single URL for everything), so I've been able to get away with the simplifying assumption until now without penalty. Plus, y'know, in my own mind ofxtools is really all about the parser; direct data download is nice, but not at all critical to my own workflows. I just kind of cobbled together ofxget to help people out; I guess it's not too surprising that this little utility is disproportionately important to the user base, who mostly use libofx or another parser.
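The "read the PROFRS and follow its directions" step boils down to a lookup from message set to URL. A minimal, regex-based illustration, assuming an OFX 1.x PROFRS whose message-set aggregates (e.g. `<INVSTMTMSGSETV1>`) are closed and each carry a MSGSETCORE `<URL>`. `service_urls` is a hypothetical helper; a real client would walk the MSGSETLIST produced by a proper parser (such as ofxtools' OFXTree, as the script later in this thread does) rather than use regexes.

```python
import re
from typing import Dict

# Each <xxxMSGSETV1> aggregate in the PROFRS MSGSETLIST contains a MSGSETCORE
# whose <URL> says where requests for that message set should be sent.
_MSGSET = re.compile(r"<(\w+)MSGSETV1>(.*?)</\1MSGSETV1>", re.DOTALL | re.IGNORECASE)
_URL = re.compile(r"<URL>\s*([^<\s]+)", re.IGNORECASE)


def service_urls(profrs: str) -> Dict[str, str]:
    """Map message-set names (e.g. 'INVSTMT') to the URL the FI advertises."""
    urls = {}
    for match in _MSGSET.finditer(profrs):
        name, body = match.group(1).upper(), match.group(2)
        url = _URL.search(body)
        if url:
            urls[name] = url.group(1)
    return urls
```

With this mapping, a statement request would go to `service_urls(profrs_text)["INVSTMT"]` instead of the URL used for PROFRQ.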

I guess the Right Way is simply to go ahead and implement this part of the spec... but this is a fair changeset, swerving ofxget away from a quick&dirty utility to something that probably deserves a little better engineering... not sure how long it'll take me to complete. Luckily I already have ready & waiting a pretty full-featured OFX parser, which is really the hardest part here.

I suspect that this change in Vanguard's behavior may at root be caused by a new version of a commercial OFX server product from FiServ or somebody like them... in which case we may see more of this kind of thing, including the dickish behavior with the HTTP cookies... note how Vanguard's PROFRS features the proprietary (and useless) INTU.BROKERID tag, indicating a close partnership with HIG Capital... who, it may be noted, shortly after acquiring Quicken removed from their website the Quicken implementation document I link to in the ofxtools docs, presumably because they no longer wish to encourage random 3rd parties to develop servers for Quicken, instead favoring the strategy of striking up confidential licensing deals with favored server partners (presumably first & foremost FiServ... the revenue sharing arrangements are easy to imagine).

aclindsa commented 3 years ago

> I suspect that this change in Vanguard's behavior may at root be caused by a new version of a commercial OFX server product from FiServ or somebody like them...

I'm not sure how new the cookie behavior is. I looked and I've had the code to repeat cookies back to Vanguard in ofxgo since September, 2017. The new behavior from my perspective is that the initial request is greeted with a 500 HTTP response code instead of 200.

csingley commented 3 years ago

> I'm not sure how new the cookie behavior is. I looked and I've had the code to repeat cookies back to Vanguard in ofxgo since September, 2017. The new behavior from my perspective is that the initial request is greeted with a 500 HTTP response code instead of 200.

What put you onto the cookies? I was able to download Vanguard statements in late 2019 without even being aware of them, or the separate URL.

aclindsa commented 3 years ago

I don't remember the exact source, but have found two other sources around the same time: https://github.com/captin411/ofxclient/pull/47 http://www.ofxhome.com/ofxforum/viewtopic.php?pid=108498#p108498

There's some guy with a username csingley that commented later on that last one :P

It really is curious that you were able to successfully fetch Vanguard statements until now without cookies. I wonder if the cookie requirement is (or was) predicated upon some other secondary condition that hasn't been identified.

csingley commented 3 years ago

Thanks for the link to ofxclient. We should all get together and form some sort of club or something. Much wheel reinvention!

> There's some guy with a username csingley that commented later on that last one :P

I've run across that guy before; I'd be very wary of blindly accepting any of his wild assertions at face value.

I dunno man, it's odd, but not worth solving at this point. I just want to demonstrate connectivity to Vanguard, then take it from there.

csingley commented 3 years ago

OK, this is a quick & dirty script that successfully downloads a statement from Vanguard for a single user account. It demonstrates the necessary workflow.

Thanks again to @aclindsa for setting me straight here.

```python
#!/usr/bin/env python3
import datetime
import urllib.request as urllib_request
import ssl
import socket
from io import BytesIO
import getpass

from ofxtools.utils import UTC
from ofxtools.Client import (
    OFXClient,
    OFX,
    PROFMSGSRQV1, PROFTRNRQ, PROFRQ,
    AUTH_PLACEHOLDER,
)
from ofxtools.Parser import OFXTree
from ofxtools.models import (
    INVSTMTMSGSET, INVSTMTMSGSRQV1, INVSTMTTRNRQ, INVSTMTRQ, INVACCTFROM,
    INCTRAN, INCPOS,
)

# Vanguard answers PROFRQ at a different URL than the one serving statements
PROFRQ_URL = "https://vesnc.vanguard.com/us/OfxProfileServlet"

CLIENT_CONFIG = {
    "org": "Vanguard",
    "fid": "15103",
    "brokerid": "vanguard.com",
    "version": 102,
    "prettyprint": True,
    "close_elements": False,
}

def parse_profile():
    """Send PROFRQ; return the INVSTMT message set URL and the cookies Vanguard sets."""
    profclient = OFXClient(url=PROFRQ_URL, **CLIENT_CONFIG)

    dtprofup = datetime.datetime(1990, 1, 1, tzinfo=UTC)
    profrq = PROFRQ(clientrouting="NONE", dtprofup=dtprofup)
    proftrnrq = PROFTRNRQ(trnuid=profclient.uuid, profrq=profrq)

    # Profile requests sign on with ofxtools' anonymous placeholder credentials
    user = password = AUTH_PLACEHOLDER
    signon = profclient.signon(password, userid=user)

    ofx = OFX(signonmsgsrqv1=signon, profmsgsrqv1=PROFMSGSRQV1(proftrnrq))
    request = profclient.serialize(ofx)

    req = urllib_request.Request(
        profclient.url, method="POST", data=request, headers=profclient.http_headers
    )
    ssl_context = ssl.create_default_context()
    timeout = socket._GLOBAL_DEFAULT_TIMEOUT
    response = urllib_request.urlopen(req, timeout=timeout, context=ssl_context)
    # Keep every cookie the server sets, minus attributes like Path or Expires.
    # get_all() avoids splitting on the comma inside Expires dates.
    set_cookies = response.headers.get_all("Set-Cookie") or []
    cookie = "; ".join(c.split(";", 1)[0].strip() for c in set_cookies)

    parser = OFXTree()
    parser.parse(BytesIO(response.read()))
    ofx = parser.convert()

    # The PROFRS MSGSETLIST says which URL serves investment statements
    msgsetlist = ofx.profmsgsrsv1[0].profrs.msgsetlist
    invstmtmsgsets = [msgset for msgset in msgsetlist if isinstance(msgset, INVSTMTMSGSET)]
    assert len(invstmtmsgsets) == 1
    invstmtmsgset = invstmtmsgsets.pop()
    return invstmtmsgset.url, cookie

def request_stmt(url, cookie, userid, acctid, passwd):
    """Send INVSTMTRQ to the URL from PROFRS, replaying the PROFRS cookies."""
    stmtclient = OFXClient(url=url, userid=userid, **CLIENT_CONFIG)

    msgsrq = INVSTMTMSGSRQV1(
        INVSTMTTRNRQ(
            trnuid=stmtclient.uuid,
            invstmtrq=INVSTMTRQ(
                invacctfrom=INVACCTFROM(brokerid=stmtclient.brokerid, acctid=acctid),
                inctran=INCTRAN(include=True),
                incoo=False,
                incpos=INCPOS(include=True),
                incbal=True,
            ),
        )
    )

    ofx = OFX(signonmsgsrqv1=stmtclient.signon(passwd), invstmtmsgsrqv1=msgsrq)
    request = stmtclient.serialize(ofx)
    # Staple on cookie from PROFRS (copy the headers rather than mutating them)
    headers = dict(stmtclient.http_headers)
    if cookie:
        headers["Cookie"] = cookie

    req = urllib_request.Request(
        stmtclient.url, method="POST", data=request, headers=headers
    )
    ssl_context = ssl.create_default_context()
    timeout = socket._GLOBAL_DEFAULT_TIMEOUT
    response = urllib_request.urlopen(req, timeout=timeout, context=ssl_context)
    return response.read()

def main():
    userid = input("userid: ")
    acctid = input("acctid: ")
    passwd = getpass.getpass("password: ")
    url, cookie = parse_profile()
    response = request_stmt(url, cookie, userid, acctid, passwd)
    print(response)

if __name__ == "__main__":
    main()
```

aclindsa commented 3 years ago

Awesome - glad we got something going!

rianhunter commented 3 years ago

This will be fixed by #103

rianhunter commented 3 years ago

I just looked into this, and for long-term reference, the missing required cookie is "HNWPRD=A21". In general, though, it seems that OFX gateways expect Quicken to preserve whatever cookies are set in response to the profile request and to send them back in subsequent requests.
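Rather than hard-coding HNWPRD, a client can keep every cookie the profile response sets and replay it automatically. A minimal sketch with the standard library's http.cookiejar, assuming one session object (the hypothetical `OFXSession`, not ofxtools' own implementation) is shared by the profile request and every later request to the same FI:

```python
import http.cookiejar
import urllib.request


class OFXSession:
    """Hypothetical helper: one cookie jar shared by every request to an FI.

    Cookies set while fetching the profile (e.g. from OfxProfileServlet) are
    sent back automatically on later requests to the same host (e.g. to
    OfxDirectConnectServlet), whatever their names are.
    """

    def __init__(self, timeout: float = 30.0):
        self.timeout = timeout
        self.jar = http.cookiejar.CookieJar()
        # HTTPCookieProcessor stores Set-Cookie headers from responses and adds
        # a matching Cookie header to each outgoing request.
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.jar)
        )

    def post(self, url: str, body: bytes, headers: dict) -> bytes:
        """POST an already-serialized OFX request and return the raw response."""
        req = urllib.request.Request(url, data=body, method="POST", headers=headers)
        with self.opener.open(req, timeout=self.timeout) as response:
            return response.read()
```

Calling `session.post(profile_url, profrq_bytes, headers)` and then `session.post(statement_url, stmtrq_bytes, headers)` on the same session would send HNWPRD, or anything else the profile response set, with the statement request. urllib runs the cookie processor before its error processor, so cookies set on an HTTP 500 response are still captured even though `post()` then raises `HTTPError`.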

csingley commented 3 years ago

This issue should be fixed by release 0.9

boulos commented 3 years ago

I notice that request_statements in Client has the "first go do PROFRS" step (via _get_service_urls) mentioned here to unblock Vanguard. My current calling code, though, does a request_accounts followed by a request_statements, and request_accounts gets the same 5xx error (naturally).

Similarly, ofxget acctinfo doesn't work. There isn't (currently) a way for me to pass the URL down through request_accounts, so should request_accounts instead have the same internal "sigh, go get the other url to get myself a cookie"?

csingley commented 3 years ago

> should request_accounts instead have the same internal "sigh, go get the other url to get myself a cookie".

Yep. Try out c7d6e74.

boulos commented 3 years ago

Woohoo! It works after some export PYTHONPATH and adding timeout=30.0 to my Vanguard calls (the new 2-second default timeout in Client.py's download from bdf8028 tripped me up).

Thanks for the quick turnaround!

csingley commented 3 years ago

> (the new 2 second default for timeout in Client.py's download from bdf8028 tripped me up).

I'm open to revisiting the default value. Keeping the socket global default timeout proved problematic to integrate with the new CookieJar code.

Possibly the default value should be timeout=0.0. AFAICT it really shouldn't be timeout=None, but this whole situation continues to be confusing and irritating in Python 3.
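For reference on the timeout question: in Python 3 a socket gives None, 0.0 and positive values three distinct meanings, and a urlopen() timeout is ultimately applied to the connection's socket. The hypothetical `socket_mode` helper below just reports how a socket behaves for each value. Note that 0.0 means non-blocking rather than "no timeout", so a 0.0 default would typically make connections fail immediately instead of waiting; an explicit positive value such as timeout=30.0 is the "blocking, with a deadline" option.

```python
import socket
from typing import Optional, Tuple

# How Python 3 sockets interpret a timeout value:
#   None  -> blocking with no deadline. urlopen()'s default resolves to
#            socket.getdefaulttimeout(), which is None unless changed.
#   0.0   -> non-blocking: calls that cannot finish at once raise immediately.
#   > 0   -> blocking, but raise a timeout error after that many seconds.


def socket_mode(timeout: Optional[float]) -> Tuple[Optional[float], bool]:
    """Return (gettimeout(), getblocking()) for a fresh socket given `timeout`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.gettimeout(), sock.getblocking()
```

For example, `socket_mode(0.0)` reports a non-blocking socket, while `socket_mode(None)` and `socket_mode(30.0)` both report blocking mode.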

aclindsa commented 3 years ago

I don't know that I personally care either way since I'm already overriding it for my own purposes, but as an additional data point, I've found that Prudential is also problematic at timeout=2.0, though it sometimes takes a full 20 seconds for them to respond, so perhaps they're such an outlier that they shouldn't be considered when choosing the default value.

boulos commented 3 years ago

@csingley any chance you'd be willing to do a new release with these fixes (and your more recent other improvements)? I'd like to make sure I have a hermetic "just use pip / requirements.txt" for my testing, if possible (rather than adding ofxtools to PYTHONPATH).

csingley commented 3 years ago

Well ofxtools has no dependencies, so it's easy to run current with user installs with only modest contortions of your requirements.txt... and of course the cool kids are all running editable installs anyway... but you know your pleasure is our business, so here you go chief.

rianhunter commented 3 years ago

Yes, venv and pip install --editable are your friends

boulos commented 3 years ago

Thanks, @csingley! I'm fine with my local checkout, but for putting it in a requirements.txt and having some sort of test run, or telling someone else to try it, it's a lot easier to say "you need ofxtools 0.9.1 or greater". I guess I can say "just use editable and point to HEAD" (or a specific sha).

Either way, thanks again!