Closed: cragwolfe closed this issue 11 months ago.
There are plenty of occurrences of this in the JSON output for this PDF, PLAW-107publ56.pdf:
```
jq '.[] | select(.type == "Title") | .text' PLAW-107publ56.pdf.json | grep -P '^"[a-z]+'
"of Investigation."
"gencies."
"to terrorism."
"to computer fraud and abuse offenses."
"agents of a foreign power."
"limb."
"lance Act."
"trace devices."
"transactions of primary money laundering concern."
"counts."
"banks."
"crimes, and the finances of terrorist groups."
"ment references."
"vestment company study."
"business."
"ports of entry and overseas consular posts."
"view."
"safety officers."
"systems."
"rorism."
"sponse to Government requests."
"ligence under National Security Act of 1947."
"ligence and intelligence"
"related matters."
"eign intelligence."
"for bioterrorism preparedness and response."
...
```
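For anyone who wants to run the same check from Python, here is a minimal sketch that mirrors the jq/grep pipeline above. It assumes the JSON file is a list of element objects with `"type"` and `"text"` keys, as the jq filter implies; the function name `lowercase_titles` and the script name are hypothetical.

```python
import json
import re
import sys

# Same test as grep -P '^"[a-z]+' applied to jq's quoted output:
# the text starts with a lowercase ASCII letter.
LOWERCASE_START = re.compile(r"[a-z]")


def lowercase_titles(elements):
    """Return the text of every Title element that starts with a lowercase letter.

    Assumes each element is a dict with "type" and "text" keys.
    """
    return [
        el.get("text") or ""
        for el in elements
        if el.get("type") == "Title" and LOWERCASE_START.match(el.get("text") or "")
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name):
    #   python lowercase_titles.py PLAW-107publ56.pdf.json
    with open(sys.argv[1], encoding="utf-8") as f:
        for text in lowercase_titles(json.load(f)):
            # Print with surrounding quotes, the way jq prints strings.
            print(json.dumps(text, ensure_ascii=False))
```

`re.match` only matches at the start of the string, so it plays the role of the `^` anchor in the grep pattern.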
Describe the bug
From community Slack:
One major issue I am facing is that in many cases the last sentence of a paragraph is classified as Title instead of NarrativeText or Text.

Examples

Paragraph: So I think your understanding is fine, it is not going to have an impact either the price rise or price drop in any of the intermediate chemicals.
Maybe a check that the text must not start with a lowercase letter in order to be classified as a Title would be helpful.
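A minimal sketch of the suggested check, assuming it runs as a post-processing step over element dicts shaped like the JSON output above. `starts_with_lowercase` and `apply_lowercase_check` are hypothetical names, not part of the unstructured API, and relabelling rejected Titles as NarrativeText is an assumption about the desired fallback type.

```python
def starts_with_lowercase(text):
    """True if the first non-whitespace character is a lowercase letter."""
    # str.islower() is False for punctuation, digits, and the empty string,
    # so "(a) In general." and "" are not flagged.
    return text.lstrip()[:1].islower()


def apply_lowercase_check(elements):
    """Return copies of the elements in which any Title whose text starts with a
    lowercase letter is relabelled as NarrativeText (assumed fallback type)."""
    checked = []
    for el in elements:
        el = dict(el)  # copy so the caller's data is not mutated
        if el.get("type") == "Title" and starts_with_lowercase(el.get("text") or ""):
            el["type"] = "NarrativeText"
        checked.append(el)
    return checked


if __name__ == "__main__":
    sample = [
        {"type": "Title", "text": "TITLE III"},
        {"type": "Title", "text": "of Investigation."},
    ]
    for el in apply_lowercase_check(sample):
        print(el["type"], "|", el["text"])
```

A first-character check would catch fragments such as "of Investigation." and "gencies." from the PDF output, but it would not catch a misclassified last sentence that happens to begin with a capital letter, so at best it is a partial mitigation.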
Additional context
Reported in Slack.