Closed esonderegger closed 4 years ago
Another idea could be value extraction with literal_eval:
Safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.
This can be used for safely evaluating strings containing Python values from untrusted sources without the need to parse the values oneself. It is not capable of evaluating arbitrarily complex expressions, for example involving operators or indexing.
from ast import literal_eval

tests = ["-13.2", "15.4", "8", "9.0", "10.", "8.22"]
for test in tests:
    val = literal_eval(test)
    print('-----')
    print(val)
    print(type(val))
-----
-13.2
<class 'float'>
-----
15.4
<class 'float'>
-----
8
<class 'int'>
-----
9.0
<class 'float'>
-----
10.0
<class 'float'>
-----
8.22
<class 'float'>
Is there an example file that produces this error? It looks like this commit attempts to implement the solution you mentioned?
I'm sorry that it's taken me so long to reply to you about this! I believe this was the filing that caused me to write the ticket: https://docquery.fec.gov/dcdev/posted/1157513.fec
Thank you for the tip about literal_eval! Unfortunately, in this case I don't think it would help much, because our initial value is a string that is almost never quoted, but every now and then is.
So in this case, the test would be more like:
tests = ['123.45', '"56.78"', '"HDR"', 'HDR']
-----
123.45
<class 'float'>
-----
56.78
<class 'str'>
-----
HDR
<class 'str'>
Traceback (most recent call last):
...
ValueError: malformed node or string: <_ast.Name object at ...>
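To make that concrete: the exception on the unquoted HDR can be caught, but the quoted number still comes back as a str rather than a float. Here is a minimal sketch, where try_literal is a hypothetical helper name used only for illustration and not something in the parser:

```python
from ast import literal_eval


def try_literal(text):
    """Hypothetical helper: return the evaluated literal, or the raw text on failure."""
    try:
        return literal_eval(text)
    except (ValueError, SyntaxError):
        # Unquoted words such as HDR are not Python literals, so keep them as-is.
        return text


for test in ['123.45', '"56.78"', '"HDR"', 'HDR']:
    val = try_literal(test)
    print(repr(val), type(val).__name__)
```

With this, '"56.78"' evaluates to the string '56.78', so a second numeric conversion would still be needed for quoted fields.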
And you are right - it looks like I resolved this with the commit you linked to, so I should have closed it at the time. Sorry for the confusion!
In some filings, fields are enclosed in quotation marks even though they don't need to be. That means the parser sees values like "4247.66" and says "that doesn't look like a number to me". I think if a value that is supposed to be numeric begins and ends with " after we call strip() on it, then we should try again with value[1:-1].
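The retry described above could look roughly like the sketch below. It is only an illustration: parse_numeric is a hypothetical name, converting to float is an assumption (the real parser may use a different numeric type), and the code in the linked commit may differ.

```python
def parse_numeric(value):
    """Hypothetical helper: parse a numeric field, tolerating optional surrounding quotes."""
    text = value.strip()
    try:
        return float(text)
    except ValueError:
        # Retry once if the stripped field begins and ends with a double quote.
        if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
            return float(text[1:-1].strip())
        raise


print(parse_numeric('4247.66'))
print(parse_numeric(' "4247.66" '))
```

Values that are still not numeric after the quotes are removed, such as "HDR", raise ValueError as before, so the caller can keep its existing error handling.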