HHS / pillbox_docs

Pillbox at the National Library of Medicine
pillbox.nlm.nih.gov

Invalid XML - JSON support? #5

Open KenBerg75 opened 9 years ago

KenBerg75 commented 9 years ago

In my XML queries, I have noticed invalid/special characters in some of the text. Perhaps filtering these special characters, or supporting JSON, would resolve the issue? One example (note the '&' in the RXSTRING):

<pill>
    <SPL_ID>0</SPL_ID>
    <PRODUCT_CODE>0573-0196</PRODUCT_CODE>
    <NDC9>005730196</NDC9>
    <SPLCOLOR>C48324</SPLCOLOR>
    <SPLIMPRINT>Advil;A;CR</SPLIMPRINT>
    <SPLSHAPE>C48345</SPLSHAPE>
    <SPLSIZE>17.00</SPLSIZE>
    <SPLSCORE>1</SPLSCORE>
    <RXCUI>1310509</RXCUI>
    <RXTTY>SY</RXTTY>
    <RXSTRING>Advil Allergy & Congestion Relief (chlorpheniramine maleate 4 MG / ibuprofen 200 MG / phenylephrine hydrochloride 10 MG) Oral Tablet</RXSTRING>
    <INGREDIENTS>Chlorpheniramine; Ibuprofen; Phenylephrine</INGREDIENTS>
    <HAS_IMAGE>0</HAS_IMAGE>
    <image_id></image_id>
    <SETID>a6cc97d8-252a-4527-a470-6d9e356342fd</SETID>
    <DEA_SCHEDULE_CODE></DEA_SCHEDULE_CODE>
    <AUTHOR>Pfizer Consumer Healthcare</AUTHOR>
    <SPL_INACTIVE_ING>ACESULFAME POTASSIUM / CARNAUBA WAX / CROSCARMELLOSE SODIUM / GLYCERIN / LACTIC ACID / MALTODEXTRIN / MEDIUM-CHAIN TRIGLYCERIDES / POLYDEXTROSE / POLYVINYL ALCOHOL / PROPYL GALLATE / SUCRALOSE / TALC / TITANIUM DIOXIDE / TRIACETIN / XANTHAN GUM / SILICON DIOXIDE / STARCH, CORN / GLYCERYL DIBEHENATE / HYPROMELLOSES / EGG PHOSPHOLIPIDS / CELLULOSE, MICROCRYSTALLINE</SPL_INACTIVE_ING>
</pill>

The unescaped '&' within the field makes this XML invalid.
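
For anyone who wants to reproduce the failure, here is a minimal sketch using Python's standard-library parser. The snippet is a shortened version of the record above, not an actual API response, and the exact error text depends on the parser.

```python
import xml.etree.ElementTree as ET

# Shortened version of the record above: the bare '&' is the only problem.
snippet = "<pill><RXSTRING>Advil Allergy & Congestion Relief</RXSTRING></pill>"

try:
    ET.fromstring(snippet)
    parsed = True
except ET.ParseError as err:
    # A conforming XML parser must reject a bare '&' in character data.
    parsed = False
    print("parse failed:", err)
```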

marks commented 9 years ago

@KenBerg75 - while you're right, as one of the early API consumers of Pillbox, I know this has been the situation from the beginning and people have been working around it until v2.

As you can see in https://github.com/kgautreaux/pillboxr, which is linked from the README of https://github.com/HHS/pillbox_docs, the code literally converts ampersands: https://github.com/kgautreaux/pillboxr/blob/master/lib/pillboxr/request.rb

You might want to take this approach for now. Hope this helps!
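
A rough sketch of that kind of workaround in Python (this is not the pillboxr code, and `escape_bare_ampersands` is a hypothetical helper name): escape every `&` that does not already start one of the predefined XML entities or a numeric character reference, then parse as usual. The sample record and its field values are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Match an '&' that is NOT already the start of a predefined XML entity
# (&amp; &lt; &gt; &quot; &apos;) or a numeric character reference.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text: str) -> str:
    """Hypothetical helper: escape bare ampersands, leave existing entities alone."""
    return BARE_AMP.sub("&amp;", xml_text)


# Made-up sample: one field with a bare '&', one with an already-escaped '&amp;'.
raw = (
    "<pill>"
    "<RXSTRING>Advil Allergy & Congestion Relief</RXSTRING>"
    "<SPL_INACTIVE_ING>A &amp; B</SPL_INACTIVE_ING>"
    "</pill>"
)

pill = ET.fromstring(escape_bare_ampersands(raw))
print(pill.findtext("RXSTRING"))
print(pill.findtext("SPL_INACTIVE_ING"))
```

The negative lookahead is there so that text which is already escaped does not get escaped a second time, which a plain replace of `&` with `&amp;` would do. It only deals with ampersands; a stray `<` in a field would still break the parse.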

ghost commented 9 years ago

@KenBerg75 I'm the PM. I'll look into this as we should be escaping special characters. Also, JSON is coming soon. Thanks.

Mauvila commented 8 years ago

I'll second this as an issue. I use JAXB for XML parsing, and the unescaped ampersands in the RXSTRING values are crashing the XML parser. Interestingly enough, the SPL_INACTIVE_ING values have escaped ampersands, but the RXSTRING ones do not.

ghost commented 8 years ago

@Mauvila we're moving the Pillbox API to Socrata's open data API (http://www.socrata.com/products/open-data-api/) in the next few weeks. Among a long list of benefits I'm excited about, it should also resolve this issue. I'll be notifying all the Pillbox API devs once we have the migration plan in place.