csingley / ofxtools

Python OFX Library
Other
298 stars 66 forks source link

Vanguard returning empty profile response #158

Open aclindsa opened 2 years ago

aclindsa commented 2 years ago

I've had Vanguard downloads working for years now, but they recently broke with ofxtools. Vanguard is returning an empty response (like, 0 bytes), hence https://github.com/csingley/ofxtools/issues/157. Interestingly, they still work with ofxgo. So what's different?

In ofxgo, I don't do the proper profile requests the way ofxtools does. However, Vanguard seems to require cookies be set when making the statement request, so I have a hack where if I detect the initial response is empty, I save the cookies it returned and try the exact same response over again. See here: https://github.com/aclindsa/ofxgo/blob/master/vanguard_client.go#L62

That said, I don't know what is different now vs. a few weeks ago with ofxtools. I suspect there's some creative new requirement which has been unintentionally imposed on us in order to receive the proper profile response, though I haven't been able to discover it (so far I've tried varying the clientuid and user agent with no success).

csingley commented 2 years ago

However, Vanguard seems to require cookies be set when making the statement request, so I have a hack where if I detect the initial response is empty, I save the cookies it returned and try the exact same response over again.

Interesting, might be doable in a backoff/retry to handle empty/infinite responses. Problem here is that none of the ofxtools cookie-handling code is mine. I am just an unfrozen cavemen, your web apps frighten and confuse me.

csingley commented 2 years ago

...but there's a decent chance I'm serious about the Go thing sometime in 2022! Don't say I didn't warn you, when you start getting pull requests written in Crayola yammering about cost basis matching, multi-leg security reorgs, and transforms from GAAP to cash basis accounting.

aclindsa commented 2 years ago

Interesting, might be doable in a backoff/retry to handle empty/infinite responses. Problem here is that none of the ofxtools cookie-handling code is mine. I am just an unfrozen cavemen, your web apps frighten and confuse me.

To be clear, I'm falling short of actually recommending we do this here (work as it may). Mostly because I'm hoping we ought to be able to get it working using the typical request/response dance. I can't imagine an OFX server intentionally requiring someone to re-try a request when they get an initial empty response, so it seems like something that may work correctly if we can only find the new magic combination of headers (or whatever).

And based on my previous experience, I don't think you need to do anything special with the cookie handling - just hang onto them from the initial request and send them with the subsequent response(s).

aclindsa commented 2 years ago

...but there's a decent chance I'm serious about the Go thing sometime in 2022! Don't say I didn't warn you, when you start getting pull requests written in Crayola yammering about cost basis matching, multi-leg security reorgs, and transforms from GAAP to cash basis accounting.

I'm ready!

aclindsa commented 2 years ago

I finally spent a little more time digging through the code and debugging this tonight... and it turns out that Vanguard seems to have fixed whatever issue caused it to be empty one time, but because there's some caching going on in Client.py/request_profile() (in the block if persistpath.exists():), I kept getting errors due to reading my bad cached version which was empty.

Does this mean we ought to be doing more sanity-checking before caching it, or perhaps ignoring the cached copy if attempting to use it causes an error? Not sure what the best way to handle that is...

csingley commented 2 years ago

Does this mean we ought to be doing more sanity-checking before caching it

The actual caching goes on near the end of that same function, when the persistpath is opened for writing... just after we have fully parsed the OFX response (which parsing does quite a bit of validation to ensure well-formed header/tag structure) and checked the status code & timestamp in the PROFRS. I really thought that would have sufficed; I'm a little confused how Vanguard's b0rked response managed to slip past this bit of logic.

ignoring the cached copy if attempting to use it causes an error?

I mean sure, we got along fine without caching profiles for many years before shoehorning this bit in. Then this still depends on raising an error in header.py rather than eternally eating lotus in a while True loop.

Also - are my eyes deceiving me, or did I define a boolean persist in this function's args, and then just never use it in the function body? We could also, you know, actually wire up this button, with semantics that it skips both reading and writing cache. At least expose better troubleshooting capabilities to the function caller instead of making them pop the hood & break out socket wrenches.

aclindsa commented 2 years ago

I really thought that would have sufficed; I'm a little confused how Vanguard's b0rked response managed to slip past this bit of logic.

Crap, you're right. I must've misread this code when looking at it earlier because I had it in my head that it was cached before being parsed.

In searching for a better explanation, I finally remembered something: A while back one of my partitions ran out of space due to something unrelated. I wonder if the disk filled up so it failed to write this file (even though the OFX had been properly formatted). And because this particular problem didn't show up until the next time I ran my code (and it took me much longer than that to make time to properly debug it), I missed this as a potential cause. Can you poke any holes in that explanation?

If not, maybe catching that error and deleting the file upon failure is about the only way to protect against it? Or, you know, I could just not fill up my hard disk...

Also - are my eyes deceiving me, or did I define a boolean persist in this function's args, and then just never use it in the function body?

I think you're right. I don't see any way persist is being used.

csingley commented 2 years ago

Can you poke any holes in that explanation?

I would agree to just about anything to avoid knowing more about file systems - but sure, sounds plausible, especially if a cron job was involved.

Here, try out 3a64f7f on your pathological data & see if it at least bombs out.

I'm gonna have a think about this persist arg in the function signature. It's not clear to me that there's a good argument for fetching an OFX profile, but not caching it. There's a much better argument for not fetching an OFX profile at all, and there's also an argument for deleting a cached profile if it errors out

aclindsa commented 2 years ago

Here, try out 3a64f7f on your pathological data & see if it at least bombs out.

I have confirmed that this commit (well, technically the latest master of https://github.com/csingley/ofxtools/commit/0c4e5e3719e339f1f693fe6ce49151167b3e252d) causes the query to error out quickly when I copy the bad profile back in, and allows it to succeed otherwise. Thanks!

genterminl commented 1 year ago

Not sure if this belongs here or in a new issue, but "ofxget scan vanguard" says Scan found no working formats for https://vesnc.vanguard.com/us/OfxProfileServlet and "ofxget prof vanguard" (same with acctsinfo) blows up with "urllib.error.URLError: <urlopen error [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)>" with python 3.11. Let me know if a lower python should work, and if/where you want the full error output.

jfly commented 3 months ago

@genterminl see https://github.com/csingley/ofxtools/issues/139 for a discussion of the SSLV3_ALERT_HANDSHAKE_FAILURE issue, including a couple of workarounds (either downgrade to pre-python 3.10, or add this hack).