Closed ZzZombo closed 1 year ago
Can you clarify what you mean about Content-Type
handling? Redirect handling is likely a duplicate of #37, but would be nice to have for sure.
Your code only needs to have resp = sess.get(pac_url, timeout=timeout,
allow_redirects = True
)
in api.py
added in the code in order to allow redirects, but given there is now a response for each redirect and the final URL, you need to check each Content-Type
for validity, if at least one is acceptable, then the content type check must succeed for the whole redirection chain. And no, it's not a duplicate of the linked issue, it's for the PAC file itself, not for purely a HTTP redirecting handling.
Requests defaults allow_redirects
to True if not specified, so my understanding is that https://github.com/carsonyl/pypac/blob/master/pypac/api.py#L157 should already follow redirects, and the content type check only happens at the end.
I do not know what's the deal with that, I did check the docs myself beforehand, but adding that absolutely does solve the redirect issue for me, and I'm not really motivated to find out whether that's a case of outdated docs or a bug or something else. And I also do absolutely see now how the content type check is done only once, only for the final response. That's the part that needs to be changed for robustness sake if redirects are allowed.
Explicitly setting allow_redirects
to true is easy. But checking each redirection for PAC Content-Type
match does not sound correct: why would the server return PAC headers and body but also HTTP 301/302 and Location
header? Besides, that likely needs allow_redirects
to be false, with pypac manually handling redirects in order to implement such logic.
Because it's how just how it is in the wild. You seem to misunderstand the issue at hand, The Antizapret PAC file is first served with the appropriate for a PAC file content type in the redirect response (no body), then the actual, final URL is served with a content type of a regular JavaScript file and the body of the PAC file.
I also think it merits checking only the first and the last responses in the chain for content type.
I've never heard of the behaviour you describe. It sounds very strange. I think it's best if you keep using your workaround (I assume you have one). I don't want to make potentially major code changes for this, especially because I don't have a repro to test against.
OK, I'm gonna apologize for false report. The application didn't get a (valid, expected) response from the remote server. By the time I altered the function, the issue was resolved somehow, I think it's because we started to supply an user agent string that wasn't banned per the PAC script policy and made sure to abide to the rate limits.
pypac
doesn't handle redirects in PAC server response. Additionally, it should consider both the original URL's response and the target URL'sContent-Type
for validation. I've discovered certain servers serve them with different content types, so if at least one response'sContent-Type
satisfies the check, it should be considered valid.