This happens because the text is scraped column by column and, when converted to a single column, is ordered from left to right. If the abstract section is read as having two columns, with keywords (or whatever) on the left, the abstract is treated as being in column two and ends up sorted into the text below the introduction. It is unclear how often this will occur, but it does seem to happen in ScienceDirect-formatted PDFs. However, ScienceDirect has an XML text mining solution, which might make things a bit easier to deal with.
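The ordering problem can be reproduced with a toy layout. The sketch below is only an illustration, not the scraper's actual code: the `Block` type, the `column_order` function, the coordinates, and the `column_split` value are all hypothetical, and it assumes the body text is laid out in two columns beneath a keywords/abstract header.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """A hypothetical text block with its position on the page."""
    x0: float   # left edge of the block
    top: float  # distance from the top of the page
    text: str


def column_order(blocks, column_split):
    """Flatten blocks column by column: the left column top to bottom,
    then the right column top to bottom."""
    left = sorted((b for b in blocks if b.x0 < column_split), key=lambda b: b.top)
    right = sorted((b for b in blocks if b.x0 >= column_split), key=lambda b: b.top)
    return [b.text for b in left + right]


# Made-up coordinates: keywords sit to the left of the abstract,
# and the two-column body text starts further down the page.
blocks = [
    Block(50, 100, "Keywords"),
    Block(300, 100, "Abstract"),
    Block(50, 300, "Introduction"),
    Block(300, 300, "Introduction (continued)"),
]

print(column_order(blocks, column_split=200))
# ['Keywords', 'Introduction', 'Abstract', 'Introduction (continued)']
```

Because the abstract shares the right-hand column with the second half of the body, it is emitted only after the whole left column, which puts it below the start of the introduction.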