bookwyrm-social / bookwyrm

Social reading and reviewing, decentralized with ActivityPub
http://joinbookwyrm.com/

data import from goodreads #48

Closed mouse-reeve closed 4 years ago

dana-ross commented 4 years ago

:+1: to this. A basic CSV import of books should be straightforward, but I imagine mapping data like shelves and reviews to Fedireads' data might not be so easy.

mouse-reeve commented 4 years ago

Yeah, it should be straightforward to upload the goodreads csv and parse out the relevant fields.
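A minimal sketch of that parsing step, assuming the export uses column headers such as `Title`, `Author`, `ISBN`, `ISBN13`, and `Exclusive Shelf`, and that ISBN cells are wrapped as `="..."`. Both are assumptions about the export format, and `parse_goodreads_csv` is a hypothetical helper, not existing fedireads code:

```python
import csv
import io


def _clean_isbn(value):
    # Assumption: ISBN cells look like ="0441013597"; drop the wrapper.
    return (value or "").strip().strip('="')


def parse_goodreads_csv(text):
    """Return one dict per CSV row, keeping only the fields an import needs."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "title": (row.get("Title") or "").strip(),
            "author": (row.get("Author") or "").strip(),
            # Prefer ISBN13, fall back to ISBN when it is empty.
            "isbn": _clean_isbn(row.get("ISBN13")) or _clean_isbn(row.get("ISBN")),
            "shelf": (row.get("Exclusive Shelf") or "").strip(),
        })
    return rows
```

Any column that is missing from the file simply comes back as an empty string, so a partial export would not raise.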

Then it's a matter of matching goodreads to-read/reading/read shelves with fedireads shelves (also straightforward), and creating any shelves that exist in goodreads but not fedireads. Maybe there should be a field on the shelf table to map a goodreads shelf identifier 🤔
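The shelf matching could be sketched roughly like this. The identifiers in `DEFAULT_SHELVES` (in particular `currently-reading` on the goodreads side) and the `plan_shelves` helper are assumptions for illustration, not the actual schema:

```python
# Assumed mapping from goodreads shelf names to fedireads shelf identifiers.
DEFAULT_SHELVES = {
    "to-read": "to-read",
    "currently-reading": "reading",
    "read": "read",
}


def plan_shelves(goodreads_shelves, existing_shelves):
    """Map each goodreads shelf to a target shelf and list the ones to create."""
    mapping = {}
    to_create = []
    for name in goodreads_shelves:
        # Custom shelves keep their own name as the target identifier.
        target = DEFAULT_SHELVES.get(name, name)
        mapping[name] = target
        if target not in existing_shelves and target not in to_create:
            to_create.append(target)
    return mapping, to_create
```

A stored goodreads identifier on the shelf table would replace the hard-coded dictionary here.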

For books, openlibrary can resolve identifiers with isbn or title/author search. There will inevitably be some books that it can't match in some uploads.
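As a rough illustration of the lookup order (isbn first, then title/author search), here is a sketch that only builds the request URL and makes no network call. `openlibrary_lookup_url` is a hypothetical helper, and the URL shapes are assumed from Open Library's public ISBN and search endpoints:

```python
from urllib.parse import urlencode


def openlibrary_lookup_url(isbn="", title="", author=""):
    """Return the URL to query for a book, or None if there is nothing to search on."""
    if isbn:
        # Direct lookup by ISBN.
        return f"https://openlibrary.org/isbn/{isbn}.json"
    # Fall back to a title/author search, skipping empty fields.
    params = {key: value for key, value in (("title", title), ("author", author)) if value}
    if not params:
        return None
    return "https://openlibrary.org/search.json?" + urlencode(params)
```

Rows where neither lookup returns a result would end up in the unmatched list.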

I think the best approach, at least as a first pass, is to show the user the proposed new shelves, books successfully matched, and books it can't match, and then have them approve the upload, and just discard the unmatched books.
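That preview-then-approve flow might look something like the sketch below. The row shape (a dict with a `shelf` key), the `resolve` callback, and both function names are hypothetical:

```python
def build_preview(rows, resolve, existing_shelves):
    """Split parsed rows into matched/unmatched and collect shelves to create.

    `resolve` is any callable that returns a book for a row, or None if no match.
    """
    matched, unmatched, new_shelves = [], [], []
    for row in rows:
        if resolve(row) is not None:
            matched.append(row)
        else:
            unmatched.append(row)
        shelf = row.get("shelf")
        if shelf and shelf not in existing_shelves and shelf not in new_shelves:
            new_shelves.append(shelf)
    return {"new_shelves": new_shelves, "matched": matched, "unmatched": unmatched}


def approve(preview):
    """On approval, keep only the matched rows; unmatched rows are discarded."""
    return preview["matched"]
```

Keeping the unmatched rows in the preview makes it possible to offer "create a new book" for them later without changing the shape of the data.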

A potential future task would be to allow users to create new books for the ones it can't match (#54 is relevant here)

seniorm0ment commented 3 years ago

Unrelated, but I wanted to avoid opening a new issue for such a minor question. Does Goodreads not allow scraping book data (similar to Google Maps not allowing its data to be scraped into OpenStreetMap)? Or has book import functionality from Goodreads just not been added yet? Because Goodreads is so big, it already has a massive amount of information associated with it, so it would be cool if we could scrape from it.

For clarity, I do not mean CSV imports, but book data pages (similar to how you import books from OpenLibrary and Inventaire).

mouse-reeve commented 3 years ago

It would be cool but yeah, their API and terms of service are not at all permissive, so they don't work as a datasource. Same for LibraryThing, unfortunately.