Open newsjunkie247 opened 10 years ago
Hi Miranda,
Please show the command you are using.
For example, if you've cloned the project using: git clone https://github.com/BetaNYC/budgetBuddy.git
then you would parse the all.txt file that is already in the project:
./parse.py all.txt
This will output the CSV file.
The next step is then to convert the CSV to sqlite3.
If you are trying to run parse.py against another file, then compare it to all.txt and let us know the differences.
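The CSV-to-sqlite3 step mentioned above could look roughly like the sketch below. It is not the project's own conversion script. It assumes the CSV starts with a header row, and the function name csv_to_sqlite and the table name "budget" are made up for illustration.

```python
import csv
import sqlite3


def csv_to_sqlite(csv_path, db_path, table="budget"):
    """Load a CSV file with a header row into a sqlite3 table; return the row count."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)   # assumption: the first row holds the column names
        rows = list(reader)

    # Quote column names so spaces or punctuation in the header do not break the SQL.
    cols = ", ".join('"{}"'.format(name.replace('"', '""')) for name in header)
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, cols))
            conn.executemany(
                'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), rows
            )
    finally:
        conn.close()
    return len(rows)
```

Every column is stored as text here; converting amounts to numbers would be a separate step.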
Ralph
I didn't clone it, just downloaded the zip. I ran parse.py on all.txt in the command line and it worked. I thought it would run the same on a .txt version of the adopted budget from both 2013 and 2014 (from here http://www.nyc.gov/html/omb/html/publications/reports.shtml?Supporting%20Schedule). I converted those with pdftotext the same way Chris discussed in the BetaNYC e-mails. Then I ran python parse.py ss6_14.txt (and also the 2013 file) and got the error message above. I'm interested in working with the adopted budgets for a final assignment for an introductory Python/data analysis class I'm taking at Columbia. At the moment I'm just interested in having them in .csv format to try some basic Python data analysis (maybe comparing this year's and last year's).
Also, with the 2014 one, the PDF was "secured" so I had to "crack" it first with a web service before converting it to text, but I got the same error message with the 2013 adopted budget.
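For the year-over-year comparison mentioned above, a very rough sketch using only the standard library might look like this. The column names passed in (for example "agency" and "amount") are hypothetical, since the thread does not show what columns parse.py writes, so they would need to be replaced with the real header names.

```python
import csv
from decimal import Decimal, InvalidOperation


def totals_by_key(csv_path, key_col, amount_col):
    """Sum the amount column per key, skipping values that are not numbers (e.g. '-')."""
    totals = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                amount = Decimal(row[amount_col].replace(",", ""))
            except InvalidOperation:
                continue  # placeholder or blank value; ignored in this sketch
            key = row[key_col]
            totals[key] = totals.get(key, Decimal(0)) + amount
    return totals


def compare_years(old, new):
    """Return new minus old for every key that appears in either year."""
    keys = set(old) | set(new)
    return {k: new.get(k, Decimal(0)) - old.get(k, Decimal(0)) for k in keys}
```

Usage would be along the lines of compare_years(totals_by_key("fy2013.csv", "agency", "amount"), totals_by_key("fy2014.csv", "agency", "amount")), where the file names are again only placeholders.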
Hi Miranda, and also John,
I modified the parser to handle the case you are reporting.
John, it appears that the traceback message contains unicode that sys.stderr.write did not like, so I ignore that error in the error case :)
Hope that is okay.
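The exact change made to parse.py is not shown in this thread. As a rough sketch only, here is one way a value converter could tolerate the '-' placeholder reported in the traceback. The name process_value mirrors the traceback, but the body, and the choice to return None rather than zero, are assumptions.

```python
from decimal import Decimal, InvalidOperation


def process_value(v):
    """Convert a figure such as '1,234' to Decimal; return None if it is not a number."""
    cleaned = v.replace(',', '').strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        # A lone '-' (or any other non-numeric text) lands here instead of
        # aborting the whole parse.
        return None
```

Whether a lone '-' in the budget documents means zero or "not applicable" is a data question, so returning None keeps that decision out of the parser.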
Ralph
Miranda, the best thing to do is clone the repository and then do a git pull when there are updates.
Otherwise, you'd have to download the zip file to get this update.
Ralph
Also, you will see a lot of error messages on stderr. I left them in there so John can look at them.
Thanks! I'm still very much a newbie, so I will probably just download the zip. Once I have those two files, I'm probably just going to try working with them in our class IPython notebooks for now. I'm not sure yet how adventurous I'll get with the other aspects of this, but we'll see!
I tried the scraper on converted-to-text versions of both this year's and last year's adopted budgets, and in each case got the error:

Traceback (most recent call last):
  File "parse.py", line 228, in scrape
    'file_name': f.name
  File "parse.py", line 188, in parse
    data = self.line2dict(line)
  File "parse.py", line 136, in line2dict
    value = self.process_value(value)
  File "parse.py", line 37, in process_value
    return Decimal(v.replace(',', ''))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/decimal.py", line 548, in __new__
    "Invalid literal for Decimal: %r" % value)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/decimal.py", line 3872, in _raise_error
    raise error(explanation)
InvalidOperation: Invalid literal for Decimal: '-'

Traceback (most recent call last):
  File "parse.py", line 236, in <module>
    scrape(open(sys.argv[1], 'r'))
  File "parse.py", line 232, in scrape
    sys.stderr.write(e + u'\n')
TypeError: unsupported operand type(s) for +: 'InvalidOperation' and 'unicode'
Any thoughts? -- Miranda N.
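On the second traceback: the TypeError appears to come from the expression e + u'\n', which adds an exception object to a string, and Python will not concatenate the two. A minimal sketch of one way to write the message instead follows. It assumes Python 3 (the thread used 2.7), and the helper name report_error is hypothetical, not something from parse.py.

```python
import sys
from decimal import Decimal, InvalidOperation


def report_error(exc, stream=None):
    """Write an exception to a stream (stderr by default) as one line of text."""
    if stream is None:
        stream = sys.stderr
    # Turn the exception into text first; str.format does the conversion,
    # so nothing tries to add an exception object to a string.
    stream.write("{0}: {1}\n".format(type(exc).__name__, exc))


try:
    Decimal("-")  # the same kind of value that tripped the parser
except InvalidOperation as e:
    report_error(e)
```

With this in place the original InvalidOperation is still reported, but the reporting itself no longer raises a second error that hides the first.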