UChicago-Coase-Sandor / pacer_lib

http://pacer-lib.readthedocs.org/

docket_parser bugs #1

Closed zhangchuck closed 10 years ago

zhangchuck commented 10 years ago

Files in this directory are dockets that are not being properly processed by pacer_lib.reader. Attempting to process these dockets generates:

Traceback (most recent call last):
  File "parse_docket_0220.py", line 6, in <module>
    a.parse_data("cacdce_2_07-cv-02544")
  File "/usr/lib/python2.7/site-packages/pacer_lib/reader.py", line 89, in parse_data
    docket_entries = docket_table.find_all('tr')
AttributeError: 'NoneType' object has no attribute 'find_all'

BeautifulSoup does not seem to be reading the .html dockets correctly: docket_table is None by the time find_all is called on it, which produces the error above.
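BeautifulSoup's find() returns None when nothing matches, so one way to make this failure easier to diagnose is to check the table before asking it for rows. The following is only a minimal sketch: docket_rows and its docket_name argument are hypothetical names, not part of pacer_lib, and how docket_table is located inside reader.py is not shown here.

```python
def docket_rows(docket_table, docket_name):
    """Return the <tr> rows of a docket table.

    Hypothetical helper, not part of pacer_lib. `docket_table` is assumed
    to be whatever the reader's lookup returned (a tag object or None).
    """
    if docket_table is None:
        # Lookup failed: report which docket is affected instead of
        # crashing later with an AttributeError on NoneType.
        raise ValueError("no docket table found in %s" % docket_name)
    return docket_table.find_all('tr')
```

With a guard like this, the dockets that BeautifulSoup cannot read would be reported by name, which should make it easier to tell whether the HTML itself is malformed or the table lookup is too strict.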

Meta extraction for "candce_5_06-cv-06486", "candce_5_08-cv-00832", "nysdce_1_04-cv-07447" and "nysdce_1_04-cv-09973" fails with the following traceback (or a variant of it):

$ python parse_docket_0220.py
Traceback (most recent call last):
  File "parse_docket_0220.py", line 8, in <module>
    a.parse_dir(False)
  File "/usr/lib/python2.7/site-packages/pacer_lib/reader.py", line 576, in parse_dir
    download_meta, case_meta = self.extract_all_meta(source)
  File "/usr/lib/python2.7/site-packages/pacer_lib/reader.py", line 511, in extract_all_meta
    download_meta = self.extract_download_meta(data)
  File "/usr/lib/python2.7/site-packages/pacer_lib/reader.py", line 192, in extract_download_meta
    temp = eval(r)
  File "<string>", line 1
    ["5:06-cv-06486","candce",""In re Network Appliance Derivative Litigation"","850","10/17/2006","06/25/2007","https://ecf.cand.uscourts.gov/cgi-bin/iqquerymenu.pl?185403"]

SyntaxError: invalid syntax

[10:31:00 AM] temp = eval(r)
[10:31:16 AM] Charles Zhang: evaluates r as a python expression
[10:31:26 AM] Charles Zhang: string --> code
[10:31:46 AM] Charles Zhang: the string should create a list
[10:31:50 AM] Charles Zhang: but you get this: ""In re Network Appliance Derivative Litigation""
[10:32:01 AM] Charles Zhang: which has unescaped quotation marks
[10:32:05 AM] Charles Zhang: so it sees the ""
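One possible way around the eval failure is to split the stored line by hand instead of evaluating it as Python. The sketch below assumes every field in the line is wrapped in double quotes and separated by `","`, as in the line shown in the traceback; parse_meta_line is a hypothetical helper, not something pacer_lib provides.

```python
def parse_meta_line(r):
    """Split a stored download-meta line into a list of strings.

    Hypothetical replacement for eval(r). Assumes the layout seen in the
    traceback: ["field","field",...] with every field double-quoted.
    """
    inner = r.strip()
    # Drop the surrounding list brackets if they are present.
    if inner.startswith('[') and inner.endswith(']'):
        inner = inner[1:-1]
    # Split on the exact delimiter between two quoted fields.
    fields = inner.split('","')
    # Remove the leftover quote characters at the edges of each field,
    # including the extra pair around titles that were already quoted.
    return [field.strip('"') for field in fields]


line = ('["5:06-cv-06486","candce",'
        '""In re Network Appliance Derivative Litigation"",'
        '"850","10/17/2006","06/25/2007",'
        '"https://ecf.cand.uscourts.gov/cgi-bin/iqquerymenu.pl?185403"]')
meta = parse_meta_line(line)
```

This avoids running file contents as code, but it is not a general parser: a case title that itself contains `","` would still be split incorrectly, and quotes at the very start or end of a title are dropped.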

Meta extraction error for "miwdce_1_95-cv-00141" returns the following Traceback:

Traceback (most recent call last):
  File "parse_docket_0220.py", line 8, in <module>
    a.parse_dir(False)
  File "/usr/lib/python2.7/site-packages/pacer_lib/reader.py", line 576, in parse_dir
    download_meta, case_meta = self.extract_all_meta(source)
  File "/usr/lib/python2.7/site-packages/pacer_lib/reader.py", line 519, in extract_all_meta
    lawyer_meta = self.extract_lawyer_meta(data)
  File "/usr/lib/python2.7/site-packages/pacer_lib/reader.py", line 296, in extract_lawyer_meta
    plaintiff_cells = plaintiff_row.find_all('td', {'width':'40%'})
AttributeError: 'str' object has no attribute 'find_all'
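Here plaintiff_row is a plain string rather than a parsed tag, so find_all is not available on it. Why the row ends up as a string for this docket is not clear from the traceback. As a stopgap, the call could be guarded as in the sketch below, where plaintiff_cells_or_empty is a hypothetical helper and not part of pacer_lib.

```python
def plaintiff_cells_or_empty(plaintiff_row):
    """Return the 40%-width <td> cells of a plaintiff row, or [] if the
    row is not a parsed tag.

    Hypothetical helper, not part of pacer_lib.
    """
    if not hasattr(plaintiff_row, 'find_all'):
        # A str (or None) reached this point; there are no cells to read.
        return []
    return plaintiff_row.find_all('td', {'width': '40%'})
```

This only prevents the crash. The lawyer metadata for such a docket would come back empty, so the underlying cause still needs to be tracked down in extract_lawyer_meta.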