NYPL-Simplified / circulation

Circulation manager for Library Simplified

Too many cover images are broken #339

Open leonardr opened 8 years ago

leonardr commented 8 years ago

Browsing SimplyE will turn up many examples of works that should have cover images, and do have links to cover images, but those links don't work. Here's an example:

https://circulation.librarysimplified.org/works/Overdrive/Overdrive%20ID/8b3ff2ec-c524-4a61-a43c-583ac505f7fd

This is mainly an NYPL operations issue, but the cover images got screwed up by bad code or a misconfiguration, either of which may still be in effect.
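
One way to measure how many cover links are actually broken is to probe each link and confirm that it answers with an image. The following is only a minimal sketch, not anything taken from the circulation manager: the function name `cover_link_works` and its `opener` parameter are hypothetical, and it assumes the image host answers HEAD requests.

```python
import urllib.error
import urllib.request


def cover_link_works(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if `url` answers a HEAD request with 200 and an image type.

    `opener` is a hypothetical injection point so the check can be
    exercised without touching the network.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            return response.status == 200 and content_type.startswith("image/")
    except OSError:
        # URLError, HTTPError (404, 403, ...) and socket timeouts
        # are all subclasses of OSError.
        return False
```

Calling `cover_link_works(href)` on every image link in a feed would give a rough count of broken covers. A host that rejects HEAD would need a GET request instead.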

leonardr commented 8 years ago

Broken cover images tend to cause client-side problems such as https://github.com/NYPL-Simplified/android/issues/303

aslagle commented 8 years ago

Also https://circulation.librarysimplified.org/works/Overdrive/Overdrive%20ID/379c1aa5-3c1b-4c41-bfc3-000921c40feb, which is one of the most searched-for books in the app.

leonardr commented 8 years ago

I made a big dent in this by redoing the OPDS entries of works affected by a bug that has since been fixed. On Monday I'll check back and see how bad the problem is.

leonardr commented 8 years ago

I haven't used the client to check, but looking at the feeds, I would expect the thumbnail problems to be fixed.

However, I'm seeing a number of books that correctly use the CDN for their thumbnail image but also use it for the full-size image, where they shouldn't. Examples:

  <entry schema:additionalType="http://schema.org/Book">
    <id>urn:librarysimplified.org/terms/id/Overdrive%20ID/2a879db5-6a52-4962-ba9a-b64eada667bf</id>
    <title>The Hungry Tide</title>
    <alternativeHeadline>A Novel</alternativeHeadline>
    <bibframe:distribution bibframe:ProviderName="Overdrive"/>
    <author>
      <name>Amitav Ghosh</name>
    </author>
    <summary type="html">&lt;p&gt;From the author of the international bestseller The Glass Palace, The Hungry Tide is a novel of adventure and romance set in the exotic Sundarbans &amp;#8212; treacherous islands in the Bay of Bengal where isolated inhabitants live in fear of drowning tides and man-eating tigers. A headstrong young American arrives in this lush landscape to study a rare species of river dolphin. She enlists the aid of a local fisherman and a translator, and soon their fates on the waterways will be determined by the forces of nature and human folly.</summary>
    <updated>2016-02-29T10:35:52Z</updated>
    <simplified:pwid>2dbb1d63-0744-70ad-a380-6a816812e4d5</simplified:pwid>
    <link href="https://d3pqhns20516vc.cloudfront.net/ImageType-100/0874-1/%7B2A879DB5-6A52-4962-BA9A-B64EADA667BF%7DImg100.jpg" type="image/jpeg" rel="http://opds-spec.org/image"/>
    <link href="https://d3pqhns20516vc.cloudfront.net/scaled/300/Overdrive/2a879db5-6a52-4962-ba9a-b64eada667bf/cover.jpg" type="image/jpeg" rel="http://opds-spec.org/image/thumbnail"/>
  <entry schema:additionalType="http://schema.org/Book">
    <id>urn:librarysimplified.org/terms/id/Overdrive%20ID/cf03312f-f704-429f-9081-743cdaa756ef</id>
    <title>Lies We Tell Ourselves</title>
    <bibframe:distribution bibframe:ProviderName="Overdrive"/>
    <author>
      <name>Robin Talley</name>
    </author>
    <summary type="html">&lt;p&gt;
&lt;p&gt;In 1959 Virginia, the lives of two girls on opposite sides of the battle for civil rights will be changed forever. &lt;/p&gt;&lt;p&gt;Sarah Dunbar is one of the first black students to attend the previously all-white Jefferson High School. An honors student at her old school, she is put into remedial classes, spit on and tormented daily. &lt;/p&gt;&lt;p&gt;Linda Hairston is the daughter of one of the town's most vocal opponents of school integration. She has been taught all her life that the races should be kept &amp;#34;separate but equal.&amp;#34; &lt;/p&gt;&lt;p&gt;Forced to work together on a school project, Sarah and Linda must confront harsh truths about race, power and how they really feel about one another. &lt;/p&gt;&lt;p&gt;Boldly realistic and emotionally compelling, Lies We Tell Ourselves is a brave and stunning novel about finding truth amid the lies, and finding your voice even when others are determined to silence it.&lt;/p&gt;</summary>
    <updated>2016-02-21T16:04:50Z</updated>
    <simplified:pwid>d6626f11-4b26-e4e8-ac0f-0e8fcb70e564</simplified:pwid>
    <link href="https://d3pqhns20516vc.cloudfront.net/ImageType-100/1071-1/%7BCF03312F-F704-429F-9081-743CDAA756EF%7DImg100.jpg" type="image/jpeg" rel="http://opds-spec.org/image"/>
    <link href="https://d3pqhns20516vc.cloudfront.net/scaled/300/Overdrive/cf03312f-f704-429f-9081-743cdaa756ef/cover.jpg" type="image/jpeg" rel="http://opds-spec.org/image/thumbnail"/>

It looks like the mismatched data is a relic of the era before presentation editions existed. The work's presentation edition is set to the Overdrive edition (rather than a synthetic edition), and the Overdrive edition has a mix of data from the metadata wrangler and from Overdrive.
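
The pattern in the two entries above can also be spotted directly in a feed. The sketch below is an illustration under stated assumptions: it assumes the entries sit in the default Atom namespace (the excerpts omit the namespace declarations), and it treats "full-size link on the thumbnail's host with a path starting with `/ImageType-100/`" as the sign of a mismatch, a heuristic read off the two examples rather than a rule from the codebase. The names `image_links` and `has_mismatched_cover` are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

ATOM = "{http://www.w3.org/2005/Atom}"
IMAGE_REL = "http://opds-spec.org/image"
THUMBNAIL_REL = "http://opds-spec.org/image/thumbnail"


def image_links(entry):
    """Map each OPDS image rel found on an Atom <entry> to its href."""
    return {
        link.get("rel"): link.get("href")
        for link in entry.iter(ATOM + "link")
        if link.get("rel") in (IMAGE_REL, THUMBNAIL_REL)
    }


def has_mismatched_cover(entry):
    """Heuristic (assumption): the thumbnail is a scaled CDN image, and the
    full-size image is an Overdrive-style ImageType-100 path on the same host."""
    links = image_links(entry)
    full, thumb = links.get(IMAGE_REL), links.get(THUMBNAIL_REL)
    if not full or not thumb:
        return False
    full_url, thumb_url = urlparse(full), urlparse(thumb)
    return (
        full_url.netloc == thumb_url.netloc
        and thumb_url.path.startswith("/scaled/")
        and full_url.path.startswith("/ImageType-100/")
    )


# Trimmed copy of the first entry above, with the Atom namespace added.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>The Hungry Tide</title>
  <link href="https://d3pqhns20516vc.cloudfront.net/ImageType-100/0874-1/%7B2A879DB5-6A52-4962-BA9A-B64EADA667BF%7DImg100.jpg" type="image/jpeg" rel="http://opds-spec.org/image"/>
  <link href="https://d3pqhns20516vc.cloudfront.net/scaled/300/Overdrive/2a879db5-6a52-4962-ba9a-b64eada667bf/cover.jpg" type="image/jpeg" rel="http://opds-spec.org/image/thumbnail"/>
</entry>"""

entry = ET.fromstring(SAMPLE)
print(has_mismatched_cover(entry))  # prints True
```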

I don't think this will cause problems for clients, but there are about 65000 books with these bad links in the production database:

select count(w.id)
from works w
join editions e on w.presentation_edition_id = e.id
where cover_full_url like '%contentreserve.com%'
  and cover_thumbnail_url not like '%contentreserve%'
  and w.simple_opds_entry like '%https://d3pqhns20516vc.cloudfront.net/ImageType-100/%';

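
To see what the query's three conditions select, here is a runnable sketch against an in-memory SQLite database. Everything about the toy database is an assumption made for illustration: the two-table schema is cut down to the columns the query names, the cover columns are placed on `editions`, and all rows and URLs (including the `images.contentreserve.com` host) are invented. The production database may also differ from SQLite in details such as the case sensitivity of `LIKE`.

```python
import sqlite3

# Same conditions as the query above; the e. qualifiers assume the cover
# columns live on the editions table.
QUERY = """
SELECT count(w.id)
FROM works w
JOIN editions e ON w.presentation_edition_id = e.id
WHERE e.cover_full_url LIKE '%contentreserve.com%'
  AND e.cover_thumbnail_url NOT LIKE '%contentreserve%'
  AND w.simple_opds_entry LIKE '%https://d3pqhns20516vc.cloudfront.net/ImageType-100/%'
"""

CDN = "https://d3pqhns20516vc.cloudfront.net"
SOURCE = "https://images.contentreserve.com"  # invented host for the toy rows


def build_toy_db():
    """Create a hypothetical, heavily simplified schema with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE editions (
            id INTEGER PRIMARY KEY,
            cover_full_url TEXT,
            cover_thumbnail_url TEXT
        );
        CREATE TABLE works (
            id INTEGER PRIMARY KEY,
            presentation_edition_id INTEGER REFERENCES editions(id),
            simple_opds_entry TEXT
        );
    """)
    conn.executemany("INSERT INTO editions VALUES (?, ?, ?)", [
        # 1: mixed data, thumbnail already points at the scaled CDN copy
        (1, SOURCE + "/ImageType-100/a.jpg", CDN + "/scaled/300/Overdrive/a/cover.jpg"),
        # 2: both URLs still point at the original host
        (2, SOURCE + "/ImageType-100/b.jpg", SOURCE + "/ImageType-200/b.jpg"),
        # 3: mixed data, but the cached OPDS entry does not contain the bad link
        (3, SOURCE + "/ImageType-100/c.jpg", CDN + "/scaled/300/Overdrive/c/cover.jpg"),
    ])
    bad_link = '<link href="' + CDN + '/ImageType-100/a.jpg"/>'
    conn.executemany("INSERT INTO works VALUES (?, ?, ?)", [
        (10, 1, "<entry>" + bad_link + "</entry>"),
        (11, 2, "<entry>" + bad_link + "</entry>"),
        (12, 3, "<entry></entry>"),
    ])
    return conn


def count_mismatched(conn):
    """Run the counting query and return the single integer result."""
    return conn.execute(QUERY).fetchone()[0]


conn = build_toy_db()
print(count_mismatched(conn))  # prints 1: only work 10 meets all three conditions
```
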
leonardr commented 8 years ago

Another 3000 Overdrive books had a full image but no thumbnail. I fixed them. This was a SimplyE-specific problem.

There are also about 1000 Overdrive books that have neither full image nor thumbnail. That's a different problem which I'm investigating.

leonardr commented 8 years ago

https://github.com/NYPL-Simplified/server_core/issues/374 tracks the books with neither full image nor thumbnail.