DARPA-ASKEM / knowledge-middleware

TA1 extraction pipelines
Apache License 2.0

Cosmos PDF extraction of left-column content on SIDARTHE interleaves text from the left and right columns #84

Open: orm011 opened this issue 1 year ago

orm011 commented 1 year ago

Example shown below. Notice that the first line of extracted text continues into the corresponding line of the right-hand column, whereas the bounding box includes only the left column. I've noticed this elsewhere in the left-column extractions, but the right-column extractions did not seem to have this problem.

[Screenshot 2023-09-22 at 12 11 29 PM]

[Screenshot 2023-09-22 at 12 10 18 PM]
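For readers without the screenshots, the symptom can be reproduced with a small, self-contained sketch. This is not Cosmos code and not a diagnosis of its internals. The `Word` type, the two helper functions, and the page coordinates are all hypothetical, and the "naive" variant is only one assumed way such interleaving could arise: text is selected by line (vertical position) without clipping to the box's horizontal bounds.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """Hypothetical word record: text plus its bounding box (y grows downward)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def text_by_line(words, box):
    """Naive variant (assumed failure mode): keep words by vertical position only."""
    _, by0, _, by1 = box
    rows = [w for w in words if by0 <= (w.y0 + w.y1) / 2 <= by1]
    # Reading order: top-to-bottom, then left-to-right.
    rows.sort(key=lambda w: (w.y0, w.x0))
    return " ".join(w.text for w in rows)


def text_in_box(words, box):
    """Keep only words whose center lies inside box = (x0, y0, x1, y1)."""
    bx0, by0, bx1, by1 = box
    inside = [
        w for w in words
        if bx0 <= (w.x0 + w.x1) / 2 <= bx1 and by0 <= (w.y0 + w.y1) / 2 <= by1
    ]
    inside.sort(key=lambda w: (w.y0, w.x0))
    return " ".join(w.text for w in inside)


# Hypothetical two-column page: left column spans x in [0, 100], right in [120, 220].
words = [
    Word("left", 10, 0, 40, 10), Word("one", 45, 0, 70, 10),
    Word("right", 130, 0, 170, 10), Word("one", 175, 0, 200, 10),
    Word("left", 10, 12, 40, 22), Word("two", 45, 12, 70, 22),
    Word("right", 130, 12, 170, 22), Word("two", 175, 12, 200, 22),
]
left_box = (0, 0, 100, 22)  # box drawn around the left column only

interleaved = text_by_line(words, left_box)  # mixes both columns line by line
clipped = text_in_box(words, left_box)       # left column only
print(interleaved)
print(clipped)
```

The first result mirrors the reported behavior, where each left-column line runs on into the matching right-column line even though the box covers only the left column. The second shows what the content would look like if it were clipped to the box.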

brandomr commented 1 year ago

@iross @mwestphall maybe this should be an issue on the Cosmos repo? I think this is purely a Cosmos quality issue, if I'm not mistaken.

iross commented 1 year ago

Created https://github.com/UW-COSMOS/Cosmos/issues/200 to track it in the COSMOS repo. We'll take a look at it next week.