sometimesabird opened this issue 2 years ago (status: Open)
Sounds more like a bug that has been fixed. You seem to be passing '.' in the strip argument, and that is supposed to strip the decimal points.
Oh, so it's meant to strip any of the characters, not this particular sequence?
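To make the two readings of the strip argument concrete, here is a minimal Python sketch. It is not camelot's code; `strip_any` and `strip_sequence` are hypothetical helpers written only to contrast "remove any of these characters" with "remove this exact sequence".

```python
def strip_any(text: str, chars: str) -> str:
    """Remove every occurrence of each individual character in `chars`."""
    # Map each character to None so str.translate deletes it.
    return text.translate({ord(c): None for c in chars})


def strip_sequence(text: str, seq: str) -> str:
    """Remove only exact occurrences of the whole sequence `seq`."""
    return text.replace(seq, "") if seq else text


cell = "0.5-1.5km"
print(strip_any(cell, ".\n"))       # 05-15km   (every '.' is dropped)
print(strip_sequence(cell, ".\n"))  # 0.5-1.5km (no '.' followed by a newline)
print(strip_any("No.", ".\n"))      # No
```

Under the "any character" reading, the output matches what the newer versions produce in this report, including the "No." header turning into "No".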
Describe the bug
Decimal points are sometimes not read by the program despite being in the PDF text; for example, it reads "1.5" as "15". This is a new bug, as version 0.7.2 worked correctly. Both the current version (0.10.1) and 0.7.3 fail.
Steps to reproduce the bug
pip install camelot-py==0.10.1
camelot -p all -o "test-NEW.csv" -f csv -split -strip ".\n" lattice -scale 100 -copy v "369746.pdf"
pip install camelot-py==0.7.2
camelot -p all -o "test-OLD.csv" -f csv -split -strip ".\n" lattice -scale 100 -copy v "369746.pdf"
Expected behavior
Line 2 of test-OLD.csv is what we should have:
"SMM camera at Donetsk Filtration Station (15km N of Donetsk)","0.5-1.5km","S","Recorded","2","Projectile","From E to W","N/K","31-Jan","19:35"
Line 2 of test-NEW.csv is misread:
"SMM camera at Donetsk Filtration Station (15km N of Donetsk)","05-15km","S","Recorded","1","Projectile","From E to W","N/K","31-Jan","19:34"
(Note that the same thing happens to the column name located in the first row -- "No." is converted into "No".)
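For anyone comparing the two outputs, a rough Python sketch like the following can flag which cells differ only by dropped strip characters. The two rows are copied from the lines quoted above; `parse_row` and `classify_differences` are hypothetical helper names, and the set of stripped characters is assumed to be the ".\n" passed on the command line.

```python
import csv
import io

# Rows as quoted in this report (line 2 of each CSV).
OLD_ROW = '"SMM camera at Donetsk Filtration Station (15km N of Donetsk)","0.5-1.5km","S","Recorded","2","Projectile","From E to W","N/K","31-Jan","19:35"'
NEW_ROW = '"SMM camera at Donetsk Filtration Station (15km N of Donetsk)","05-15km","S","Recorded","1","Projectile","From E to W","N/K","31-Jan","19:34"'


def parse_row(line: str) -> list[str]:
    """Parse a single CSV line into a list of cell strings."""
    return next(csv.reader(io.StringIO(line)))


def classify_differences(old: list[str], new: list[str], stripped: str = ".\n"):
    """Return (index, old, new, kind) for each differing cell.

    kind is "strip" when the new cell equals the old cell with the
    stripped characters removed, and "other" otherwise.
    """
    table = {ord(c): None for c in stripped}
    diffs = []
    for i, (o, n) in enumerate(zip(old, new)):
        if o == n:
            continue
        kind = "strip" if o.translate(table) == n else "other"
        diffs.append((i, o, n, kind))
    return diffs


for diff in classify_differences(parse_row(OLD_ROW), parse_row(NEW_ROW)):
    print(diff)
```

On the rows as pasted, only the distance cell is explained by stripping; the other two differing cells ("2" vs "1" and "19:35" vs "19:34") are reported as "other".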
PDF
Environment