UChicago-Coase-Sandor / pacer_lib

http://pacer-lib.readthedocs.org/

Login System Changed #14

Closed zhangchuck closed 9 years ago

zhangchuck commented 9 years ago

@synsypa

From KW: "I believe PACER changed their system in August, and the login in the scraper file no longer worked for me, perhaps due to the change. It does work when I use a different payload/url in refresh_login, however:

    payload = {'login':'login', 'login:loginName':self.username, 'login:password':self.password, 'login:clientCode':'', 'login:j_idt144':'', 'javax.faces.ViewState':'stateless'}
    login_url = 'https://pacer.login.uscourts.gov/csologin/login.jsf'

"

zhangchuck commented 9 years ago

From KW: "The issue was that when I originally revised the scraper login, I looked at the data being sent to the server from my browser (using Chrome's developer tools) and set the payload in the scraper accordingly. The odd thing is that the data being sent changes periodically. In particular, there is a component called 'login:j_idt*' set to '', where * is a number that varies. Sometimes it is 'login:j_idt145', but sometimes it is 'login:j_idt155', 'login:j_idt185', etc. I still don't know what causes these components to change, but if I look up the payload in a browser first and then update the component, the login works."
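A minimal sketch of the workaround KW describes: keep the payload in one helper and pass in the varying component name, so only one argument changes when PACER regenerates it. The function name `build_login_payload` and its parameters are hypothetical and not part of pacer_lib; the keys are copied from the payload quoted above.

```python
def build_login_payload(username, password, idt_field, client_code=""):
    """Return the form data for the login POST.

    idt_field is the auto-generated component name copied from the browser,
    for example "login:j_idt144" (an example value; it changes over time).
    """
    return {
        "login": "login",
        "login:loginName": username,
        "login:password": password,
        "login:clientCode": client_code,
        idt_field: "",  # the component whose numeric suffix varies
        "javax.faces.ViewState": "stateless",
    }
```

With this, `refresh_login` would presumably post `build_login_payload(self.username, self.password, 'login:j_idt155')` to the login URL, with the third argument updated from whatever the browser currently sends.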

zhangchuck commented 9 years ago

This error may come from the new templating system PACER is using (JavaServer Faces). We probably have to find the correct j_idt* on the login page (it should be in the form tag), since we can't stop the j_idt ids from being auto-generated.

http://h30499.www3.hp.com/t5/LoadRunner-Support-Forum/Issues-with-dynamically-generated-j-idt-id-s-when-testing-a-jsf/td-p/6355445

https://stackoverflow.com/questions/13697312/how-to-get-rid-of-the-auto-generated-j-idt-id-in-composite-component

https://en.wikipedia.org/wiki/JavaServer_Faces
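As a first step toward finding the right j_idt, it may help to list every auto-generated id the login page currently carries. The sketch below is only an inspection aid: the function name `list_j_idt_ids` is hypothetical, and the `login:` prefix in the pattern is an assumption based on the payloads quoted in this thread.

```python
import re

# Matches ids such as "login:j_idt169". The "login:" prefix is assumed from
# the payloads quoted in this thread.
J_IDT_PATTERN = re.compile(r"login:j_idt\d+")


def list_j_idt_ids(page_html):
    """Return each distinct login:j_idtNNN id, in order of first appearance."""
    seen = []
    for match in J_IDT_PATTERN.findall(page_html):
        if match not in seen:
            seen.append(match)
    return seen
```

This does not say which id belongs in the payload; it only shows the candidates present in the fetched HTML.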

zhangchuck commented 9 years ago

Relevant markup from the login page for the username field:

    <td class="ecol1"><label id="login:j_idt169" class="ui-outputlabel ui-widget fbold" for="login:loginName">Username<span class="ui-outputlabel-rfi">*</span></label></td>
    <td class="ecol2"><input id="login:loginName" name="login:loginName" type="text" autocomplete="off" size="40" class="ui-inputfield ui-inputtext ui-widget ui-state-default ui-corner-all" /><script id="login:loginName_s" type="text/javascript">PrimeFaces.cw('InputText','widget_login_loginName',{id:'login:loginName'});</script></td>
    <td class="ecol3"><div id="login:j_idt170" aria-live="polite" class="ui-message"></div></td>

As you can see, we can probably identify the proper j_idt by searching for a "label" whose "for" attribute equals "login:loginName". Once we have this label, we use its "id" attribute as the j_idt entry in the payload.

zhangchuck commented 9 years ago

Pseudo-code (soup is the BeautifulSoup parsed version of the login page):

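    # Label attached to the username input; its id is the auto-generated j_idt
    username_label = soup.find('label', attrs={"for": "login:loginName"})
    username_idt = username_label['id']

    # Same lookup for the label attached to the password input
    password_label = soup.find('label', attrs={"for": "login:password"})
    password_idt = password_label['id']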

(We should probably also implement this for the client code field, "login:clientCode".)
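A runnable sketch of the same lookup, extended to the client code field. It uses the standard library's `html.parser` (Python 3) instead of BeautifulSoup so that it runs without extra dependencies. The names `LabelIdCollector` and `find_label_ids` are hypothetical, and whether the client code label follows the same pattern on the real page is an assumption that has not been checked here.

```python
from html.parser import HTMLParser


class LabelIdCollector(HTMLParser):
    """Collect a mapping from each <label>'s "for" target to the label's "id"."""

    def __init__(self):
        super().__init__()
        self.label_ids = {}

    def handle_starttag(self, tag, attrs):
        if tag != "label":
            return
        attributes = dict(attrs)
        target = attributes.get("for")
        label_id = attributes.get("id")
        if target and label_id:
            self.label_ids[target] = label_id


def find_label_ids(page_html,
                   fields=("login:loginName", "login:password", "login:clientCode")):
    """Return {field name: label id, or None if no matching label was found}."""
    collector = LabelIdCollector()
    collector.feed(page_html)
    return {field: collector.label_ids.get(field) for field in fields}
```

A field that maps to `None` means the page had no label for it, which is a signal to fall back to checking the payload in a browser as KW did.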