USPTO / PatentPublicData

Utility tools to help download and parse patent data made available to the public

Chemical Formulae and Mathematical Formulae in Abstract/Descriptions #67

Closed: aosingh closed this issue 6 years ago

aosingh commented 6 years ago
{
   "abstract":{
        "raw":"<abstract id=\"abstract\">\n<p id=\"p-0001\" num=\"0000\">Disclosed is a method for the selective catalytic reduction of NO<sub>x </sub>in waste/exhaust gas by using ammonia provides by heating one or more salts of formula M<sub>a</sub>(NH<sub>3</sub>)<sub>n</sub>X<sub>z</sub>, wherein M represents one or more cations selected from alkaline earth metals and transition metals, X represents one or more anions, a represents the number of cations per salt molecule, z represents the number of anions per salt molecule, and n is a number of from 2 to 12, the one or more salts having been compressed to a bulk density above 70% of the skeleton density before use thereof.</p>\n</abstract>",

        "normalized":"\n<p id=\"p-0001\" num=\"0000\" level=\"\">Disclosed is a method for the selective catalytic reduction of NO? in waste/exhaust gas by using ammonia provides by heating one or more salts of formula M?(NH?)?X<sub>z</sub>, wherein M represents one or more cations selected from alkaline earth metals and transition metals, X represents one or more anions, a represents the number of cations per salt molecule, z represents the number of anions per salt molecule, and n is a number of from 2 to 12, the one or more salts having been compressed to a bulk density above 70% of the skeleton density before use thereof.</p>\n",

        "plain":"\nDisclosed is a method for the selective catalytic reduction of NO? in waste/exhaust gas by using ammonia provides by heating one or more salts of formula M?(NH?)?Xz, wherein M represents one or more cations selected from alkaline earth metals and transition metals, X represents one or more anions, a represents the number of cations per salt molecule, z represents the number of anions per salt molecule, and n is a number of from 2 to 12, the one or more salts having been compressed to a bulk density above 70% of the skeleton density before use thereof.\n"
    }
}

The raw text contains the formula with its markup intact: M<sub>a</sub>(NH<sub>3</sub>)<sub>n</sub>X<sub>z</sub>

However, some of this information is lost in the normalized version of the text: M?(NH?)?X<sub>z</sub>
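One possible explanation for this pattern (an inference, not confirmed by the project): `a`, `3`, `n` and `x` all have Unicode subscript forms (ₐ, ₃, ₙ, ₓ), but `z` has none, and `X<sub>z</sub>` is exactly the part that keeps its markup. Below is a minimal Python sketch, assuming the normalizer replaces `<sub>` content with Unicode subscript characters only when every character has one. `normalize_subscripts` and the `SUBSCRIPTS` table are hypothetical, not PatentPublicData code.

```python
import re

# Hypothetical lookup table: characters that have a Unicode subscript form.
# Many letters (for example "z") have no subscript form in Unicode.
SUBSCRIPTS = {str(d): chr(0x2080 + d) for d in range(10)}  # subscript 0-9
SUBSCRIPTS.update({
    "+": "\u208A", "-": "\u208B", "=": "\u208C", "(": "\u208D", ")": "\u208E",
    "a": "\u2090", "e": "\u2091", "o": "\u2092", "x": "\u2093", "h": "\u2095",
    "k": "\u2096", "l": "\u2097", "m": "\u2098", "n": "\u2099", "p": "\u209A",
    "s": "\u209B", "t": "\u209C",
})

SUB_TAG = re.compile(r"<sub>(.*?)</sub>")


def normalize_subscripts(text: str) -> str:
    """Hypothetical normalizer: replace <sub>...</sub> with Unicode subscripts
    when every non-space character has one; otherwise keep the markup."""

    def convert(match: re.Match) -> str:
        inner = match.group(1)
        if inner.strip() and all(ch.isspace() or ch in SUBSCRIPTS for ch in inner):
            # Whitespace passes through unchanged, so "x " becomes "ₓ ".
            return "".join(SUBSCRIPTS.get(ch, ch) for ch in inner)
        return match.group(0)  # e.g. "z" has no subscript form, so keep the tag

    return SUB_TAG.sub(convert, text)


raw = "NO<sub>x </sub>in ... M<sub>a</sub>(NH<sub>3</sub>)<sub>n</sub>X<sub>z</sub>"
result = normalize_subscripts(raw)  # "NOₓ in ... Mₐ(NH₃)ₙX<sub>z</sub>"
```

If the normalizer does work this way, the subscripts are still present in the normalized text, and the `?` marks would appear only when it is displayed or written with an encoding that cannot represent ₐ, ₃, ₙ and ₓ.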

bgfeldm commented 6 years ago

At a quick glance, this looks like it may be a Unicode issue, either in displaying or in printing the text. If it's displaying, it may be the editor you're viewing the file with. But it could also be the code printing the file in ASCII, which would print Unicode characters that have no ASCII equivalent as question marks.
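The encoding explanation is easy to check. Here is a minimal Python sketch, assuming the normalized text really contains Unicode subscript characters. The sample string and the file name `normalized.txt` are made up for illustration. Writing the text out as ASCII with replacement yields the same `M?(NH?)?X<sub>z</sub>` pattern reported in the issue, while UTF-8 preserves the characters.

```python
import os
import tempfile

# Sample string containing Unicode subscripts (hand-written for illustration):
# "Mₐ(NH₃)ₙX<sub>z</sub>"
formula = "M\u2090(NH\u2083)\u2099X<sub>z</sub>"

# Encoding to ASCII with errors="replace" turns every character that has no
# ASCII equivalent into "?".
as_ascii = formula.encode("ascii", errors="replace").decode("ascii")
print(as_ascii)  # M?(NH?)?X<sub>z</sub>

# The same loss happens when a file is written with an ASCII encoding...
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "normalized.txt")
    with open(path, "w", encoding="ascii", errors="replace") as f:
        f.write(formula)
    with open(path, encoding="ascii") as f:
        written_ascii = f.read()

    # ...while UTF-8 round-trips the subscripts unchanged.
    with open(path, "w", encoding="utf-8") as f:
        f.write(formula)
    with open(path, encoding="utf-8") as f:
        written_utf8 = f.read()

print(written_ascii == as_ascii)  # True
print(written_utf8 == formula)    # True
```

Under this assumption, the check is straightforward. If the `?` characters are already in the bytes on disk, the cause is the code that writes the file. If the file holds the subscripts but they still show as `?`, the cause is the viewer.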