NYPL-Simplified / circulation

Circulation manager for Library Simplified

Some descriptions are incorrectly converted from Windows-1252 to UTF-8 #217

Open leonardr opened 8 years ago

leonardr commented 8 years ago

Example:

  <entry schema:additionalType="http://schema.org/Book">
    <id>urn:librarysimplified.org/terms/id/3M%20ID/hgpn7z9</id>
    <title>Wild Cat</title>
    <bibframe:distribution bibframe:ProviderName="3M"/>
    <author>
      <name>Christine Feehan</name>
    </author>
    <summary type="html">&lt;b&gt;In the new Leopard novel by the #1 &lt;i&gt;New York Times&lt;/i&gt; bestselling author of &lt;i&gt;Cat&#226;&#8364;&#8482;s Lair &lt;/i&gt;and &lt;i&gt;Leopard&#226;&#8364;&#8482;s Prey,&lt;/i&gt; passions explode like wildfire when a young woman&#226;&#8364;&#8482;s feral instincts are ignited by a man who&#226;&#8364;&#8482;s too dangerous not to desire&#226;&#8364;&#166;&lt;/b&gt;A simple request for Siena Arnotto: deliver a gift to her grandfather&#226;&#8364;&#8482;s friend. One look at Elijah Lospostos, hard-bodied and stripped to the waist, and Siena succumbs to a feline stirring she never felt before, and to Elijah&#226;&#8364;&#8482;s reckless and pleasurable demands. But when that pulse-throbbing moment ends in the murder of an unexpected intruder, Elijah accuses the shaken and confused Siena of setting him up.Then Siena discovers the truth of her Leopard heritage, of the secrets in her grandfather&#226;&#8364;&#8482;s inner circle, and the sinister plot of revenge that has put her in jeopardy. When Siena&#226;&#8364;&#8482;s grandfather is assassinated, she realizes the only man she can trust is Elijah. Now as her Leopard rises from within, Siena and Elijah share not only an animal instinct for survival&#226;&#8364;&#8221;but a desire so raw and wild it may be the only thing that can save them.</summary>
    <updated>2016-05-12T15:13:50Z</updated>
    <simplified:pwid>d4c5ea55-a6b3-2eb6-592d-a53793a652d4</simplified:pwid>
    <link href="https://d3pqhns20516vc.cloudfront.net/3M/3M%20ID/hgpn7z9/cover.jpg" type="image/jpeg" rel="http://opds-spec.org/image"/>
    <link href="https://d3pqhns20516vc.cloudfront.net/scaled/300/3M/3M%20ID/hgpn7z9/cover.jpg" type="image/jpeg" rel="http://opds-spec.org/image/thumbnail"/>
    <category term="Adult" scheme="http://schema.org/audience" label="Adult"/>
    <category term="18" scheme="http://schema.org/typicalAgeRange" label="18"/>
    <category term="http://librarysimplified.org/terms/fiction/Fiction" scheme="http://librarysimplified.org/terms/fiction/" label="Fiction"/>
    <category term="http://librarysimplified.org/terms/genres/Simplified/Paranormal%20Romance" scheme="http://librarysimplified.org/terms/genres/Simplified/" label="Paranormal Romance"/>
    <dcterms:language>en</dcterms:language>
    <dcterms:publisher>Penguin Publishing Group</dcterms:publisher>
    <published>2015-12-03T16:32:45Z</published>
    <dcterms:created>2015-11-23</dcterms:created>
    <link href="https://circulation.librarysimplified.org/works/3M/3M%20ID/hgpn7z9" type="application/atom+xml;type=entry;profile=opds-catalog" rel="alternate"/>
    <link href="https://circulation.librarysimplified.org/works/3M/3M%20ID/hgpn7z9/report" rel="issues"/>
    <link href="https://circulation.librarysimplified.org/works/3M/3M%20ID/hgpn7z9/borrow" rel="http://opds-spec.org/acquisition/borrow" type="application/atom+xml;type=entry;profile=opds-catalog">
      <opds:indirectAcquisition type="vnd.adobe/adept+xml">
        <opds:indirectAcquisition type="application/epub+zip"/>
      </opds:indirectAcquisition>
      <opds:availability status="available"/>
      <opds:holds total="0"/>
      <opds:copies available="1" total="5"/>
    </link>
    <link href="https://circulation.librarysimplified.org/feed/eng/English%20-%20Best%20Sellers" rel="collection" title="Best Sellers"/>
  </entry>

The description renders as:

In the new Leopard novel by the #1 New York Times bestselling author of Catâ€™s Lair and Leopardâ€™s Prey, passions explode like wildfire when a young womanâ€™s feral instincts are ignited by a man whoâ€™s too dangerous not to desireâ€¦A simple request for Siena Arnotto: deliver a gift to her grandfatherâ€™s friend. One look at Elijah Lospostos, hard-bodied and stripped to the waist, and Siena succumbs to a feline stirring she never felt before, and to Elijahâ€™s reckless and pleasurable demands. But when that pulse-throbbing moment ends in the murder of an unexpected intruder, Elijah accuses the shaken and confused Siena of setting him up.Then Siena discovers the truth of her Leopard heritage, of the secrets in her grandfatherâ€™s inner circle, and the sinister plot of revenge that has put her in jeopardy. When Sienaâ€™s grandfather is assassinated, she realizes the only man she can trust is Elijah. Now as her Leopard rises from within, Siena and Elijah share not only an animal instinct for survivalâ€”but a desire so raw and wild it may be the only thing that can save them.

This could be a problem on our side (we incorrectly parsed a Windows-1252 XML document as UTF-8) or it could be on the 3M side (3M served Windows-1252 in a document labeled as UTF-8).
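
Whichever side is at fault, the sequences in the summary (`&#226;&#8364;&#8482;`, which is `â€™`) are what the UTF-8 bytes of a curly apostrophe look like when they are decoded as Windows-1252. A minimal sketch of that mechanism, with a helper name made up for illustration:

```python
import html


def misread_utf8_as_cp1252(text: str) -> str:
    """Simulate the suspected bug: UTF-8 bytes decoded as Windows-1252."""
    return text.encode("utf-8").decode("windows-1252")


# U+2019 is the right single quotation mark used in "Cat's Lair".
garbled = misread_utf8_as_cp1252("Cat\u2019s Lair")
print(garbled)  # Catâ€™s Lair

# The numeric character references from the feed decode to the same string.
print(html.unescape("Cat&#226;&#8364;&#8482;s Lair") == garbled)  # True
```

This only shows which pair of encodings is involved. It does not say whether the mix-up happened on the provider's side or in our code.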

leonardr commented 8 years ago

I saw this on open ebooks in an Axis 360 book as well:

  <entry schema:additionalType="http://schema.org/Book">
    <id>urn:librarysimplified.org/terms/id/Axis%20360%20ID/0011495759</id>
    <title>Blaze of Silver</title>
    <bibframe:distribution bibframe:ProviderName="Axis 360"/>
    <author>
      <name>K. M Grant</name>
    </author>
    <summary type="html">&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;i&gt;Can a knight sacrifice everything he holds sacred in service to his king?&lt;/i&gt;&lt;br /&gt;&lt;i&gt;&#195;&#8218; &lt;/i&gt;&lt;br /&gt;&#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; King Richard is being held for ransom in Germany, and England is in chaos without a sovereign. The de Granvilles&lt;i&gt;must&lt;/i&gt; deliver the treasure that will win their King's freedom.&lt;div&gt;&lt;br /&gt;&#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; But the Old Man of the Mountain, leader of the Assassins, is ready to take his revenge on Kamil. To do so he will orchestrate a great betrayal, and his tentacles of treachery reach far and deep. Kamil is determined to prevail-the Old Man must not win. But when all trust has been shattered, who can he turn to for help?&lt;div&gt;&lt;br /&gt;&#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; &#195;&#8218; The fate of a king, the bonds of a friendship, and the depth of an enduring love will all be in jeopardy as Will, Ellie, Kamil, and Hosanna face the ultimate loyalty test, and their final battle.&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</summary>
    <updated>2016-04-14T14:05:12Z</updated>
    <simplified:pwid>8c3d8a5a-e14e-c265-5a69-dce116061d7e</simplified:pwid>
    <link href="https://d3pqhns20516vc.cloudfront.net/Content%20Cafe/ISBN/9780802728067/cover.jpg" type="image/jpeg" rel="http://opds-spec.org/image"/>
    <link href="https://d3pqhns20516vc.cloudfront.net/scaled/300/Content%20Cafe/ISBN/9780802728067/cover.jpg" type="image/jpeg" rel="http://opds-spec.org/image/thumbnail"/>
    <category term="Young Adult" scheme="http://schema.org/audience" label="Young Adult"/>
    <category term="12-14" scheme="http://schema.org/typicalAgeRange" label="12-14"/>
    <category term="http://librarysimplified.org/terms/genres/Simplified/Adventure" scheme="http://librarysimplified.org/terms/genres/Simplified/" label="Adventure"/>
    <category term="http://librarysimplified.org/terms/genres/Simplified/Romance" scheme="http://librarysimplified.org/terms/genres/Simplified/" label="Romance"/>
    <dcterms:language>en</dcterms:language>
    <dcterms:publisher>Bloomsbury USA</dcterms:publisher>
    <published>2016-01-15T00:00:48Z</published>
    <dcterms:created>2011-08-01</dcterms:created>
    <link href="http://qa.circulation.openebooks.us/works/Axis%20360/Axis%20360%20ID/0011495759" type="application/atom+xml;type=entry;profile=opds-catalog" rel="alternate"/>
    <link href="http://qa.circulation.openebooks.us/works/Axis%20360/Axis%20360%20ID/0011495759/report" rel="issues"/>
    <link href="http://qa.circulation.openebooks.us/works/Axis%20360/Axis%20360%20ID/0011495759/borrow/2" rel="http://opds-spec.org/acquisition/borrow" type="application/atom+xml;type=entry;profile=opds-catalog">
      <opds:indirectAcquisition type="vnd.adobe/adept+xml">
        <opds:indirectAcquisition type="application/epub+zip"/>
      </opds:indirectAcquisition>
      <opds:availability status="available"/>
      <opds:holds total="0"/>
      <opds:copies available="9995" total="9999"/>
    </link>
    <link href="https://d1eqhiepgvnnew.cloudfront.net/groups/" rel="collection" title="High School"/>
  </entry>

Can a knight sacrifice everything he holds sacred in service to his king? Â Â Â Â Â Â Â Â Â King Richard is being held for ransom in Germany, and England is in chaos without a sovereign. The de Granvillesmust deliver the treasure that will win their King's freedom.

         But the Old Man of the Mountain, leader of the Assassins, is ready to take his revenge on Kamil. To do so he will orchestrate a great betrayal, and his tentacles of treachery reach far and deep. Kamil is determined to prevail-the Old Man must not win. But when all trust has been shattered, who can he turn to for help?

           The fate of a king, the bonds of a friendship, and the depth of an enduring love will all be in jeopardy as Will, Ellie, Kamil, and Hosanna face the ultimate loyalty test, and their final battle.
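
The `&#195;&#8218;` pairs in this summary suggest the text may have been garbled more than once. A minimal sketch, assuming the original characters were non-breaking spaces (an assumption, since the provider's source data isn't shown here) and using a helper name made up for illustration:

```python
import html


def misread_utf8_as_cp1252(text: str) -> str:
    """Simulate one round of UTF-8 bytes being decoded as Windows-1252."""
    return text.encode("utf-8").decode("windows-1252")


nbsp = "\u00a0"                       # assumed original character
once = misread_utf8_as_cp1252(nbsp)   # U+00C2 followed by U+00A0
twice = misread_utf8_as_cp1252(once)  # U+00C3 U+201A U+00C2 U+00A0

print(repr(once))
print(repr(twice))

# The feed's "&#195;&#8218;" is the first two characters of the twice-garbled form.
print(twice.startswith(html.unescape("&#195;&#8218;")))  # True
```

The feed shows only those first two characters followed by whitespace, so the exact history of this text may differ from the two clean rounds simulated here.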

leonardr commented 8 years ago

I also saw this in a Standard Ebooks book on the open-access content server:

  <entry schema:additionalType="http://schema.org/Book">
    <id>https://standardebooks.org/ebooks/anton-chekhov/the-duel/constance-garnett</id>
    <title>The Duel</title>
    <bibframe:distribution bibframe:ProviderName="Standard Ebooks"/>
    <author>
      <name>Anton Chekhov</name>
      <simplified:sort_name>Chekhov, Anton Pavlovich</simplified:sort_name>
    </author>
    <summary type="html">Von Koren, an educated zoologist, finds the slovenly lifestyle of Laevsky, a drinker, to be worthless.  Finally, Laevsky can&#226;&#8364;&#8482;t take it any more.</summary>
    <updated>2016-05-06T15:05:38Z</updated>
    <simplified:pwid>b83d76b1-29be-dc3b-7e73-ea679ed1158b</simplified:pwid>
    <link href="http://book-covers.nypl.org/Standard%20Ebooks/URI/https%3A//standardebooks.org/ebooks/anton-chekhov/the-duel/constance-garnett/cover.svg" type="image/png" rel="http://opds-spec.org/image"/>
    <link href="http://book-covers.nypl.org/scaled/300/Standard%20Ebooks/URI/https%3A//standardebooks.org/ebooks/anton-chekhov/the-duel/constance-garnett/cover.png" type="image/png" rel="http://opds-spec.org/image/thumbnail"/>
    <category term="Adult" scheme="http://schema.org/audience" label="Adult"/>
    <category term="18" scheme="http://schema.org/typicalAgeRange" label="18"/>
    <category schema:ratingValue="1" term="Short stories, Russian -- Translations into English" scheme="http://purl.org/dc/terms/LCSH"/>
    <category schema:ratingValue="1" term="Chekhov, Anton Pavlovich, 1860-1904 -- Translations into English" scheme="http://purl.org/dc/terms/LCSH"/>
    <category schema:ratingValue="1" term="Russia -- Social life and customs -- Fiction" scheme="http://purl.org/dc/terms/LCSH"/>
    <category term="http://librarysimplified.org/terms/fiction/Fiction" scheme="http://librarysimplified.org/terms/fiction/" label="Fiction"/>
    <dcterms:language>en</dcterms:language>
    <dcterms:publisher>Standard Ebooks</dcterms:publisher>
    <published>2016-01-12T21:53:44Z</published>
    <schema:Rating schema:ratingValue="0.8000" schema:additionalType="http://librarysimplified.org/terms/rel/quality"/>
    <link href="http://oabooks.nypl.org/Standard%20Ebooks/URI/https%3A//standardebooks.org/ebooks/anton-chekhov/the-duel/constance-garnett/The%20Duel.azw3" type="application/x-mobipocket-ebook" rel="http://opds-spec.org/acquisition/open-access"/>
    <link href="http://oabooks.nypl.org/Standard%20Ebooks/URI/https%3A//standardebooks.org/ebooks/anton-chekhov/the-duel/constance-garnett/The%20Duel.epub3" type="application/epub+zip" rel="http://opds-spec.org/acquisition/open-access"/>
    <link href="http://oabooks.nypl.org/Standard%20Ebooks/URI/https%3A//standardebooks.org/ebooks/anton-chekhov/the-duel/constance-garnett/The%20Duel.epub" type="application/epub+zip" rel="http://opds-spec.org/acquisition/open-access"/>
  </entry>
leonardr commented 8 years ago

This happens in so many different situations because the problem's not with the original data. It's a stupid error in Representation.unicode_content.
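
For illustration only, and not the actual `Representation.unicode_content` code (which isn't shown in this thread): a hypothetical decoding helper that tries strict UTF-8 first, then the declared charset, and falls back to Windows-1252 only when both fail. The function name and parameters are assumptions.

```python
from typing import Optional


def decode_content(raw: bytes, declared_charset: Optional[str] = None) -> str:
    """Hypothetical sketch: decode bytes without producing mojibake.

    Non-ASCII text that passes a strict UTF-8 decode is almost always
    UTF-8, so that is tried before any legacy single-byte encoding.
    """
    candidates = ["utf-8"]
    if declared_charset:
        candidates.append(declared_charset)
    candidates.append("windows-1252")

    for name in candidates:
        try:
            return raw.decode(name)  # strict: raises on invalid bytes
        except (UnicodeDecodeError, LookupError):
            continue  # wrong guess or unknown charset name; try the next

    # Last resort: keep what is readable and mark the rest with U+FFFD.
    return raw.decode("utf-8", errors="replace")


# UTF-8 bytes mislabeled as Windows-1252 still decode correctly.
print(decode_content("Cat\u2019s Lair".encode("utf-8"), "windows-1252"))
# Windows-1252 bytes mislabeled as UTF-8 are also handled.
print(decode_content(b"Cat\x92s Lair", "utf-8"))
```

Trying UTF-8 first is a design choice rather than a rule. It would be the wrong order for content in a multi-byte legacy encoding that happens to be valid UTF-8, which is rare but possible.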

leonardr commented 6 years ago

I still see this problem occasionally, so there may also be problems with the original data.
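
If some descriptions arrive already garbled, fixing the decoder alone won't help, and the text would have to be repaired after the fact. A minimal sketch of one way to do that, assuming the damage is always UTF-8 read as Windows-1252 (the function name and the `max_rounds` parameter are made up for illustration):

```python
def undo_mojibake(text: str, max_rounds: int = 3) -> str:
    """Reverse UTF-8-read-as-Windows-1252 damage, one layer per round.

    Each round re-encodes the text as Windows-1252 and decodes the bytes
    as strict UTF-8. Clean text normally fails one of the two steps, or
    comes back unchanged, and is then returned as-is.
    """
    for _ in range(max_rounds):
        try:
            candidate = text.encode("windows-1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # not (or no longer) garbled in this particular way
        if candidate == text:
            break  # pure ASCII: nothing to repair
        text = candidate
    return text


print(undo_mojibake("Cat\u00e2\u20ac\u2122s Lair"))  # Cat’s Lair
print(undo_mojibake("caf\u00e9"))                    # café (left alone)
```

This works on the whole string at once, so a description that mixes garbled and correct non-ASCII characters would be left untouched. It is a heuristic and can in principle misfire on legitimate text.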