dwyl / reading-tracker

A metronome for speed reading that emits a discrete reminder to turn the page.
GNU General Public License v2.0

Open Library (Internet Archive) Books API #9

Open nelsonic opened 3 years ago

nelsonic commented 3 years ago

When researching which Book database to use for adding books to our app, we were reminded of https://openlibrary.org via: https://news.ycombinator.com/item?id=25405737

I hadn't used Open Library in a while, so I revived my account and attempted to go through the UX. IMO, the search functionality is not very good ... or at least the sorting is not intuitive. 🔍

For example, if I visit amazon.com and start typing the name of Seth Godin's latest book "The Practice", I get an autosuggestion/autocompletion straight away.

This is the benchmark for Product search. Intuitive and returns the most relevant result immediately.

The book was released on Nov 3rd 2020 (2 months ago today): https://www.amazon.com/Practice-Shipping-Creative-Work/dp/0593328973

It was a best seller and has several hundred reviews on Amazon.

If I attempt to search for "The Practice" on Open Library and sort by newest, the results are useless: https://openlibrary.org/search?q=the+practice&mode=everything&sort=new

Note: the results are just as useless without the sorting; I thought sorting by newest would help, but it didn't. Instead we get an irrelevant article with a publication date of 2104 (83 years in the future!).

If we type the name of the book and the name of the author a result is found: https://openlibrary.org/search?q=the+practice+seth+godin&mode=everything
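The two searches above differ only in their query parameters, so they are easy to reproduce in code. A minimal sketch, assuming we only need the `q`, `mode` and `sort` parameters seen in the URLs above; the `search_url` helper is a hypothetical name, not part of any Open Library client:

```python
from urllib.parse import urlencode

SEARCH_URL = "https://openlibrary.org/search"


def search_url(query, sort=None, mode="everything"):
    """Build an Open Library search URL like the ones linked above."""
    params = {"q": query, "mode": mode}
    if sort:
        # "new" is the value the site uses when sorting by newest.
        params["sort"] = sort
    return f"{SEARCH_URL}?{urlencode(params)}"


# Title only, sorted by newest (the useless results).
print(search_url("the practice", sort="new"))
# Title plus author (the query that actually finds the book).
print(search_url("the practice seth godin"))
```

Adding the author name to `q` is the only change needed to get a relevant hit, which suggests our app should always include the author in the query when we have it.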

The page for the book itself is underwhelming: https://openlibrary.org/works/OL21094745W/The_Practice

It has 1 five-star rating, but we cannot view it. 🤷

Code

The code for the site is Open Source (obviously): https://github.com/internetarchive/openlibrary

It's Python but doesn't appear to use Django or any other recognisable framework according to setup.py or requirements.txt ... but it appears to use Vue.js, Core.js, jQuery and Chart.js according to the package.json. Not much info on BuiltWith, sadly ... https://builtwith.com/detailed/openlibrary.org Just Nginx and Google Analytics. 🤷

The search appears to be using Lucene from this dependency: https://github.com/internetarchive/openlibrary/blob/2ba6765c99c5f7d54b40821ee73feadffa2f5b0c/package.json#L51

API

The main API we're interested in is the Books one: https://openlibrary.org/dev/docs/api/books

If we attempt to view the aforementioned book via the API: https://openlibrary.org/books/OL29509316M.json


Note: Firefox automatically displays JSON content formatted. Use Firefox, it's better! 😉 Get it: mozilla.org/firefox

curl 'https://openlibrary.org/books/OL29509316M.json'

We get the following result:

{
   "publishers":[
      "Penguin Books, Limited"
   ],
   "languages":[
      {
         "key":"/languages/eng"
      }
   ],
   "source_records":[
      "bwb:9780241470046"
   ],
   "title":"Practice",
   "number_of_pages":208,
   "last_modified":{
      "type":"/type/datetime",
      "value":"2020-09-16T05:01:18.543431"
   },
   "created":{
      "type":"/type/datetime",
      "value":"2020-08-26T13:56:48.732622"
   },
   "isbn_13":[
      "9780241470046"
   ],
   "full_title":"Practice",
   "lc_classifications":[
      "",
      "HF5386"
   ],
   "publish_date":"2020",
   "key":"/books/OL29509316M",
   "authors":[
      {
         "key":"/authors/OL6847264A"
      }
   ],
   "latest_revision":2,
   "works":[
      {
         "key":"/works/OL21094745W"
      }
   ],
   "type":{
      "key":"/type/edition"
   },
   "revision":2
}

This is kinda useless as it lists the full_title as "Practice", which it is not. Rather, the full title is "The Practice: Shipping Creative Work".
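To make the record easier to work with, here is a rough sketch of pulling out the fields our app would care about. It assumes that appending `.json` to an Open Library key (as in the edition URL above) also works for the author and work keys; `key_to_json_url` and `summarise_edition` are hypothetical helper names, and the sample is a trimmed copy of the response above rather than a live fetch:

```python
import json

BASE_URL = "https://openlibrary.org"


def key_to_json_url(key):
    """Map an Open Library key (e.g. /books/OL29509316M) to its .json URL."""
    return f"{BASE_URL}{key}.json"


def summarise_edition(record):
    """Pick out the fields a reading app would most likely need."""
    return {
        "title": record.get("full_title") or record.get("title"),
        "isbn_13": record.get("isbn_13", []),
        "publish_date": record.get("publish_date"),
        "author_urls": [key_to_json_url(a["key"]) for a in record.get("authors", [])],
        "work_urls": [key_to_json_url(w["key"]) for w in record.get("works", [])],
    }


# Trimmed copy of the response shown above (not fetched live).
SAMPLE = json.loads("""
{
  "title": "Practice",
  "full_title": "Practice",
  "isbn_13": ["9780241470046"],
  "publish_date": "2020",
  "key": "/books/OL29509316M",
  "authors": [{"key": "/authors/OL6847264A"}],
  "works": [{"key": "/works/OL21094745W"}]
}
""")

summary = summarise_edition(SAMPLE)
print(summary["title"])
print(summary["author_urls"][0])
```

Note that the author name is not in the edition record at all, only a key, so showing "Seth Godin" next to the title would take a second request.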

Given that Open Library can be edited by anyone, I updated the record.

But I don't see other people doing this ... most people just consume Wiki content, they don't create/improve it. So I think our primary search still needs to be the Google Books API: https://github.com/dwyl/library/issues/1
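One way this could look in practice is Google Books first, with Open Library as a fallback. This is only a sketch under assumptions: it looks books up by ISBN using the public Google Books volumes endpoint and the Open Library `/isbn/` endpoint, and `lookup` and `fetch` are hypothetical names. `fetch` is supplied by the caller and is expected to return `None` when a source has no match (for Google Books that would mean treating an empty result set as `None`):

```python
from urllib.parse import urlencode


def google_books_url(isbn):
    """Google Books volumes search restricted to a single ISBN."""
    query = urlencode({"q": f"isbn:{isbn}"})
    return f"https://www.googleapis.com/books/v1/volumes?{query}"


def open_library_url(isbn):
    """Open Library edition lookup by ISBN."""
    return f"https://openlibrary.org/isbn/{isbn}.json"


def lookup(isbn, fetch):
    """Try each source in priority order and return the first hit.

    `fetch` takes a URL and returns parsed JSON, or None if nothing was found.
    """
    sources = [
        ("google_books", google_books_url(isbn)),
        ("open_library", open_library_url(isbn)),
    ]
    for name, url in sources:
        result = fetch(url)
        if result:
            return name, result
    return None


# Stub fetcher so the sketch runs offline: pretend Google Books has no match.
def fake_fetch(url):
    if "openlibrary.org" in url:
        return {"title": "Practice"}
    return None


print(lookup("9780241470046", fake_fetch))
```

Keeping the HTTP call behind `fetch` means the priority order can be tested without hitting either API.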