QwertyCoolMT closed this 5 years ago.
`text_strip='\n'`?
Yes, so it's normal that your data is split with `'\n'`. If the question is why every letter is split, maybe you should try playing with the `column_tol` parameter?
The columns are decided by the `columns` list in this one. I will try playing with it anyway.
Also wondering whether it could have to do with pdfminer’s sentence/word detection
Ended up re-opening the file and stripping out the `\n`s myself, as I was unable to find a solution within the library that worked.
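A minimal sketch of that manual workaround, assuming the table has already been pulled out of camelot as a list of rows of strings (for example via `table.df.values.tolist()`). The helper name `strip_newlines` is hypothetical, and removing every `\n` will also glue together words that were legitimately on separate lines.

```python
from typing import List


def strip_newlines(rows: List[List[str]]) -> List[List[str]]:
    """Remove every newline from every cell of an extracted table.

    `strip_newlines` is a hypothetical helper, not part of camelot.
    Note: this also merges words that were genuinely on separate lines.
    """
    return [[cell.replace("\n", "") for cell in row] for row in rows]


# Cells split letter by letter, as described in the issue.
rows = [["I\nT\nE\nM", "1\n0"], ["", "ok"]]
print(strip_newlines(rows))
```

If you are working with the pandas DataFrame directly, `table.df.replace("\n", "", regex=True)` should perform the same substring removal.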
Also wondering whether it could have to do with pdfminer’s sentence/word detection
Yes. The `strip_text` kwarg will only strip characters from the start and end of a string.
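Plain Python shows the distinction this comment draws between stripping characters from the ends of a string and removing them everywhere. This is only an illustration with built-in `str` methods; it is not taken from camelot's source, and the sample cell value is made up.

```python
# A cell whose letters were split by newlines, with extra newlines at both ends.
cell = "\nI\nT\nE\nM\n"

# str.strip only removes the listed characters from both ends.
stripped_ends = cell.strip("\n")
# str.replace removes every occurrence, including the inner ones.
stripped_all = cell.replace("\n", "")

print(repr(stripped_ends))  # 'I\nT\nE\nM'
print(repr(stripped_all))   # 'ITEM'
```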
Hey there, one of the PDFs I'm trying to read is getting a newline between every letter within a given cell. Code used to create this output:
camelot.read_pdf('ITEMS.pdf',pages='1',text_strip='\n', flavor='stream', table_areas=['20,530,600,150'],columns=['30,330,380,410,470,530'])