arderyp closed this 8 years ago.
Testing with slate. It gets through the broken PDFs, but it seems to affect the URL string parsing/gluing. Tests are failing.
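The gluing problem is that PDF text extraction can split a long URL across a line break, so the pieces have to be rejoined before parsing. A minimal sketch of one such rule follows. It is a hypothetical heuristic, not this project's actual splitting/gluing code: it assumes a line ending in a URL whose last character is one of `/ - _ ~ ? = & # %` is broken, and that a following line with no leading whitespace continues it. `glue_broken_urls` is an illustrative name.

```python
import re

# Hypothetical heuristic, not the project's real rules: a URL that runs to the
# end of a line and ends in a character usually followed by more path/query
# text is treated as split across the line break. "." is deliberately left
# out because a trailing period is more often sentence punctuation.
_BROKEN_URL_TAIL = re.compile(r"https?://\S*[/\-_~?=&#%]$")


def glue_broken_urls(text):
    """Rejoin URLs that PDF text extraction split across lines."""
    glued = []
    for line in text.splitlines():
        continues_url = (
            glued
            and _BROKEN_URL_TAIL.search(glued[-1])
            and line[:1] not in ("", " ", "\t")  # next line starts flush left
        )
        if continues_url:
            # Join with no separator so the URL is reassembled intact.
            glued[-1] += line
        else:
            glued.append(line)
    return "\n".join(glued)
```

Because extractors differ in where they break lines, rules like this one tend to need re-tuning when the extractor changes; for example, this sketch would wrongly glue a complete URL ending in `/` to an unrelated next line.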
The slate method seems to extract text differently, so the splitting/gluing rules will have to be readjusted. That said, this method gets through all PDFs without failing, and it picked up 2 extra citations to boot. There is a new Unicode warning to look into, though:
scotuswebcites.io/citations/models.py:64: UnicodeWarning: Unicode unequal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
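In Python 2, this warning appears when `==` or `!=` compares a byte string containing non-ASCII bytes with a `unicode` string: the implicit ASCII decode fails, so Python warns and treats the two as unequal. Assuming that is what happens at line 64 (the code there is not shown), one common fix is to decode bytes explicitly before comparing. The sketch below assumes the bytes are UTF-8; `to_text` and `same_text` are hypothetical helper names, and the code runs unchanged on Python 2 and 3.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings explicitly.

    Assumes byte strings are UTF-8 encoded; adjust if the source differs.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def same_text(a, b):
    """Compare two strings as text, whatever mix of bytes/unicode they are."""
    return to_text(a) == to_text(b)
```

Decoding at the boundary (where text comes out of the PDF extractor or the database) is usually cleaner than sprinkling conversions at each comparison.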
Tests are passing now. More on slate here: https://github.com/timClicks/slate
Example: FERC v. Electric Power Supply Assn. [REVISION]
Delete it from the database and run discovery to see the error.