BetaNYC / budgetBuddy

An API for more intuitive access to NYC Budget Data, current and historical
http://budgetbuddy.herokuapp.com
MIT License

Adopted Budget error #7

Open newsjunkie247 opened 10 years ago

newsjunkie247 commented 10 years ago

I tried the scraper on text-converted versions of both this year's and last year's adopted budgets, and in each case got this error:

Traceback (most recent call last):
  File "parse.py", line 228, in scrape
    'file_name': f.name
  File "parse.py", line 188, in parse
    data = self.line2dict(line)
  File "parse.py", line 136, in line2dict
    value = self.process_value(value)
  File "parse.py", line 37, in process_value
    return Decimal(v.replace(',', ''))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/decimal.py", line 548, in __new__
    "Invalid literal for Decimal: %r" % value)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/decimal.py", line 3872, in _raise_error
    raise error(explanation)
InvalidOperation: Invalid literal for Decimal: '-'

Traceback (most recent call last):
  File "parse.py", line 236, in <module>
    scrape(open(sys.argv[1], 'r'))
  File "parse.py", line 232, in scrape
    sys.stderr.write(e + u'\n')
TypeError: unsupported operand type(s) for +: 'InvalidOperation' and 'unicode'

Any thoughts? -- Miranda N.
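The first traceback comes from process_value handing a bare '-' to Decimal, which rejects it with InvalidOperation. The converted budget tables appear to use a dash for an empty or zero amount, so one possible fix is to map such placeholders to zero before converting. Below is a minimal Python 3 sketch, assuming the dash means zero; EMPTY_MARKERS and this version of process_value are hypothetical, not the project's actual change.

```python
from decimal import Decimal

# Placeholders treated as "no amount". This set is an assumption:
# extend it after inspecting the actual pdftotext output.
EMPTY_MARKERS = {"-", "--", ""}


def process_value(v):
    """Convert a budget cell such as '1,234,567' to a Decimal.

    Hypothetical stand-in for parse.py's process_value: a lone dash
    (or any other string in EMPTY_MARKERS) becomes Decimal(0) instead of
    being passed to Decimal, which raises InvalidOperation for it.
    Real negative numbers such as '-5' are still converted normally.
    """
    cleaned = v.strip().replace(",", "")
    if cleaned in EMPTY_MARKERS:
        return Decimal(0)
    return Decimal(cleaned)
```

With this version, process_value("1,234,567") gives Decimal('1234567') and process_value("-") gives Decimal('0'). Other non-numeric text still raises, so unexpected layout problems remain visible rather than silently becoming zero.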

fedex1 commented 10 years ago

Hi Miranda,

Please show the command you are using.

For example, if you've cloned the project using:

git clone https://github.com/BetaNYC/budgetBuddy.git

Then you would parse the all.txt file that is already in the project:

./parse.py all.txt

This will output the CSV file.

The next step is then to convert the CSV to sqlite3.
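One way to do the CSV-to-sqlite3 step is with Python's standard csv and sqlite3 modules. This is a sketch, not the project's own conversion script: the function name csv_to_sqlite, the table name budget, and the file names in the usage line are assumptions, and every column is stored as TEXT because the column layout of parse.py's output isn't described here.

```python
import csv
import sqlite3


def csv_to_sqlite(csv_path, db_path, table="budget"):
    """Load a CSV file (header row first) into a SQLite table.

    Every column is stored as TEXT; cast in SQL, e.g. CAST(col AS REAL),
    when numbers are needed. The default table name is a placeholder.
    Returns the number of data rows inserted.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    # Quote column names so headers with spaces or punctuation still work.
    cols = ", ".join('"{}" TEXT'.format(h.replace('"', '""')) for h in header)
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(db_path)
    try:
        # Replace any previous load so the script can be re-run safely.
        conn.execute('DROP TABLE IF EXISTS "{}"'.format(table))
        conn.execute('CREATE TABLE "{}" ({})'.format(table, cols))
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), rows
        )
        conn.commit()
    finally:
        conn.close()
    return len(rows)
```

For example, csv_to_sqlite("all.csv", "budget.db") (hypothetical file names) loads the saved CSV and returns the row count.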

If you are trying to run parse.py against another file, then compare it to all.txt and let us know the differences.
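To compare a newly converted file with all.txt, a quick check is to diff the first few dozen lines of each and look for differences in headers, column spacing, or placeholder characters. A minimal sketch using the standard difflib module; compare_heads and the default of 60 lines are arbitrary, hypothetical choices.

```python
import difflib
from itertools import islice


def compare_heads(reference_path, other_path, n=60):
    """Return a unified diff (as a list of lines) of the first n lines of two files."""
    def head(path):
        # errors="replace" keeps odd bytes from pdftotext from aborting the read.
        with open(path, encoding="utf-8", errors="replace") as f:
            return list(islice(f, n))

    return list(
        difflib.unified_diff(
            head(reference_path),
            head(other_path),
            fromfile=reference_path,
            tofile=other_path,
        )
    )
```

For example, print("".join(compare_heads("all.txt", "ss6_14.txt"))) prints the differences; an empty result means the first 60 lines match exactly.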

Ralph

newsjunkie247 commented 10 years ago

I didn't clone it; I just downloaded the zip. I ran parse.py on all.txt from the command line and it worked. I thought it would run the same way on a .txt version of the adopted budget for both 2013 and 2014 (from http://www.nyc.gov/html/omb/html/publications/reports.shtml?Supporting%20Schedule). I converted those with pdftotext the same way Chris described in the BetaNYC e-mails. Then I ran python parse.py ss6_14.txt (and the same for 2013) and got the error message above. I'd like to work with the adopted budgets for a final assignment in an introductory Python/data analysis class I'm taking at Columbia. For now I just want them in .csv format so I can try some basic Python data analysis, maybe comparing this year's budget with last year's.
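Once both years are in CSV form, a basic year-over-year comparison can be done with the standard library alone. The sketch below assumes each CSV has a header row; key_col and amount_col are placeholders for whatever grouping and amount columns parse.py actually produces, and both helper names are hypothetical.

```python
import csv
from collections import defaultdict
from decimal import Decimal


def totals_by(csv_path, key_col, amount_col):
    """Sum amount_col for each distinct value of key_col in a CSV file.

    key_col and amount_col are placeholders: check the header row of the
    CSV that parse.py produces and pass the real column names.
    """
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Strip thousands separators; treat an empty cell as 0.
            totals[row[key_col]] += Decimal(row[amount_col].replace(",", "") or 0)
    return dict(totals)


def year_over_year(old, new):
    """Return {key: (old_total, new_total, change)} for every key in either year."""
    zero = Decimal(0)
    return {
        k: (old.get(k, zero), new.get(k, zero), new.get(k, zero) - old.get(k, zero))
        for k in sorted(set(old) | set(new))
    }
```

For example, year_over_year(totals_by("ss6_13.csv", "agency", "amount"), totals_by("ss6_14.csv", "agency", "amount")) would pair each key's 2013 and 2014 totals with the change; the file and column names there are hypothetical.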

newsjunkie247 commented 10 years ago

Also, with the 2014 one, the PDF was "secured" so I had to "crack" it first with a web service before converting it to text, but I got the same error message with the 2013 adopted budget.

fedex1 commented 10 years ago

Hi Miranda, and also John,

I modified the parser to handle the case you are reporting.

John, it appears that the traceback message contains Unicode that sys.stderr.write did not like, so in the error case I now ignore that error :)

Hope that is okay.

Ralph
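The second traceback is a TypeError: e is an exception object, and Python cannot add an exception to a string with +. As an alternative to ignoring that error, a sketch like the following (Python 3; report_error is a hypothetical helper) formats the exception as text first and falls back to ASCII escapes if the stream cannot encode it.

```python
import sys


def report_error(exc, stream=None):
    r"""Write an exception to stream (default: sys.stderr) as a single line.

    Hypothetical replacement for `sys.stderr.write(e + u'\n')`: the exception
    is turned into text with str.format instead of being added to a string.
    """
    if stream is None:
        stream = sys.stderr
    message = "{}: {}\n".format(type(exc).__name__, exc)
    try:
        stream.write(message)
    except UnicodeEncodeError:
        # The stream's encoding can't represent some characters: escape them.
        stream.write(message.encode("ascii", "backslashreplace").decode("ascii"))
```

In the parser's except block this would be called as report_error(e), so one bad line is logged and parsing can continue.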

On Sun, Jun 29, 2014 at 12:52 AM, newsjunkie247 notifications@github.com wrote:


— Reply to this email directly or view it on GitHub https://github.com/BetaNYC/budgetBuddy/issues/7#issuecomment-47445877.

fedex1 commented 10 years ago

Miranda, the best thing to do is clone the repository and then do a git pull when there are updates.

Otherwise, you'll have to download the zip file again to get this update.

Ralph

On Sun, Jun 29, 2014 at 2:08 AM, Ralph Yozzo ralph@brooklynmarathon.com wrote:


fedex1 commented 10 years ago

Also, you will see a lot of error messages on stderr. I left them in so John can look at them.

On Sun, Jun 29, 2014 at 2:10 AM, Ralph Yozzo ralph@brooklynmarathon.com wrote:


newsjunkie247 commented 10 years ago

Thanks! I'm still very much a newbie, so I'll probably just download the zip. Once I have those two files, I'm probably just going to try working with them in our class IPython notebooks. I'm not sure yet how adventurous I'll get with the other aspects of this, but we'll see!