funginstitute / patentprocessor

BSD 2-Clause "Simplified" License

Patent application parsing #69

Closed by billyeh 10 years ago

gtfierro commented 10 years ago

I'm having trouble getting the script to parse applications. The following incantation of parse.py seems to start parsing the grant file I have in my directory instead of an application file:

python parse.py -d application

Also, when I specify the file using python parse.py -d application -x ipa130124.xml, I get the following error:

sqlalchemy.exc.OperationalError: (OperationalError) no such column: application.granted u'SELECT application.id AS application_id, application.type AS application_type, application.number AS application_number, application.country AS application_country, application.date AS application_date, application.granted AS application_granted, application.num_claims AS application_num_claims \nFROM application \nWHERE application.id = ?' (u'2013/20130019365',)

Can you look into these?

Thanks!

billyeh commented 10 years ago

Thanks for helping me test the code out. For the first bug, I missed the fact that we should change the default regex depending on doctype, so I've added that in the latest commit. As for the other bug, here's what I did:

But it runs fine on mine. So I'd like to be able to help, but it's hard for me to tell what's wrong without being able to reproduce it. Maybe there's some of the latest code I don't have that involves application.granted.

Edit: Pulled an update from funginstitute's master branch and re-ran with the same results, so I don't think anything in there can be causing it.

Edit 2: I got a fresh copy of the repository, and the error does occur on this one. Interesting. I'll look into it.
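For reference, the doctype-dependent default regex mentioned above could look roughly like the sketch below. This is only an illustration, not the code from the commit: the DEFAULT_PATTERNS table, the select_files helper, and the grant filename ipg130101.xml are hypothetical, and the patterns assume the ipa/ipg filename prefixes seen in this thread.

```python
import re

# Hypothetical defaults; the patterns actually used by parse.py may differ.
DEFAULT_PATTERNS = {
    "grant": r"ipg\d{6}\.xml$",
    "application": r"ipa\d{6}\.xml$",
}


def select_files(filenames, doctype, pattern=None):
    """Return the filenames that match the pattern for the given doctype.

    An explicit pattern overrides the doctype's default.
    """
    regex = re.compile(pattern or DEFAULT_PATTERNS[doctype])
    return [name for name in filenames if regex.match(name)]


# ipa130124.xml comes from the thread; ipg130101.xml is a made-up grant file.
files = ["ipa130124.xml", "ipg130101.xml", "notes.txt"]
print(select_files(files, "application"))
print(select_files(files, "grant"))
```

With a lookup like this, running with -d application would only pick up IPA files even when IPG files sit in the same directory.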

billyeh commented 10 years ago

All right, after finding the same bug in a freshly cloned copy of the fork, I found that running make spotless fixes it. Not sure why, but maybe you could try that out? If it does fix the problem, the only candidates I can see for breaking things are accidentally committed .pyc, .db, or .sqlite3 files.
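One possible explanation, consistent with the guess above about committed .db or .sqlite3 files: a SQLite file created under an older schema would lack the granted column, so any query that selects it fails until the file is deleted and rebuilt. The sketch below reproduces that failure mode with the standard sqlite3 module only. The table layout and the stale.db filename are simplified assumptions, and whether make spotless actually removes such files is not confirmed in this thread.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file standing in for a stale, committed .db file.
db_path = os.path.join(tempfile.mkdtemp(), "stale.db")


def create_db(path, with_granted):
    """Create a simplified application table, with or without 'granted'."""
    columns = "id TEXT PRIMARY KEY, number TEXT"
    if with_granted:
        columns += ", granted BOOLEAN"
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE application (%s)" % columns)
    conn.commit()
    conn.close()


def query_granted(path):
    """Run a query that expects the newer schema; return error text or None."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "SELECT application.granted FROM application "
            "WHERE application.id = ?",
            ("2013/20130019365",),
        )
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()


# Old schema on disk: the query fails because the column is missing.
create_db(db_path, with_granted=False)
stale_error = query_granted(db_path)

# Removing the stale file and recreating the table clears the error.
os.remove(db_path)
create_db(db_path, with_granted=True)
fresh_error = query_granted(db_path)

print(stale_error)
print(fresh_error)
```

If this is what happened, the stale file only needs to be deleted once; keeping generated database files out of version control would prevent it from recurring.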

gtfierro commented 10 years ago

Thanks for your work, Bill! Running make spotless fixes the error for me as well.

I'm still running into a problem if I have both IPG and IPA files in the same directory. If I run python parse.py -d application, it still attempts to parse the IPG file.

gtfierro commented 10 years ago

Wait, no, never mind, it doesn't. Sorry.

gtfierro commented 10 years ago

I'm going to wait until the database is populated and the bridge server isn't as busy, then I'll work to merge this pull request. I'll let you know of any more bugs I come across, but so far everything looks great!

gtfierro commented 10 years ago

GitHub is telling me I can't automatically merge this pull request. Can you fetch the changes from upstream into your local repository, resolve the merges, and then re-issue this pull request?

billyeh commented 10 years ago

Sure. I can do that pretty soon.