Open tofuwabohu opened 3 years ago
This is an interesting one! I think there are two things that are valuable about this dataset: 1) It has the full text of everything in the index (I think? are there cases where TAL has an entry but not the entire piece?) 2) It has a bunch of material that isn't in other repositories (including a lot of short-form and essays, which is interesting from a metadata perspective)
For works that are available via other connectors, the metadata is comparatively poor. Looking at Anarchy in Action, we have multiple editions with covers and identifiers via OL (https://bookwyrm.social/book/106708), whereas TAL provides one edition with much less comprehensive metadata (https://theanarchistlibrary.org/library/colin-ward-anarchy-in-action). In this example, there's an ISBN in the "notes" section, but it isn't exposed in the search JSON and I can't find any way to access it programmatically (it looks like it's part of a free text field, so even if "notes" were available, it wouldn't be viable to find meaningful data in it). This will make it impossible to deduplicate an edition that came in from TAL against the same edition loaded via a different connector.
However, this could relate to #97 and #693 in productive ways.
It looks like the search endpoint is usable for both search and for loading metadata. I can search "anarchy in action colin ward" and get a list of works: https://theanarchistlibrary.org/search?query=anarchy+in+action+colin+ward And search by uri to get just the work that's of interest: https://theanarchistlibrary.org/search?query=uri%3Acolin-ward-anarchy-in-action&fmt=json.
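As a rough sketch of the two URL shapes described above, the helpers below build a free-text search URL and a single-work lookup URL. The function names and the `base` parameter are made up for illustration; only the `/search` route, the `query` and `fmt=json` parameters, and the `uri:` prefix come from the example URLs in this comment.

```python
from urllib.parse import urlencode

# Base URL taken from the examples above. Passing a different base for
# other TAL instances is an assumption, not something verified here.
TAL_BASE = "https://theanarchistlibrary.org"


def build_search_url(query, base=TAL_BASE):
    """Return a JSON search URL for a free-text query."""
    return f"{base}/search?" + urlencode({"query": query, "fmt": "json"})


def build_lookup_url(slug, base=TAL_BASE):
    """Return a JSON search URL that targets one work via the uri: prefix."""
    return build_search_url(f"uri:{slug}", base=base)
```

Calling `build_lookup_url("colin-ward-anarchy-in-action")` reproduces the lookup URL quoted above, so a connector could use the same request path for both search and metadata loading.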
One thing that is challenging is that the search is full text, so it's going to produce a lot of results that aren't what you'd expect based on the other connectors. For example, a free text search of "The Lord of the Rings" produces a set of search results wholly unrelated to J.R.R. Tolkien. With the code as it is, this would break Goodreads and LibraryThing data imports, so while that's an eminently solvable problem, it's important to remember to solve it.
Long story short, it's totally doable, and while the metadata will be lacking, it will be an asset to have a resource with full ebook versions. At a quick look, it should be possible to use the same code for any of the different language-specific versions of TAL.
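One possible way to guard against unrelated full-text hits is to keep only results whose title or author actually contains every word of the query. This is just a sketch of that idea, not how any existing connector works, and the `title` and `author` keys are assumed names for fields in a TAL result rather than confirmed ones.

```python
import re


def _tokens(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"\w+", text.lower()))


def is_plausible_match(query, result):
    """True if every query word occurs in the result's title or author.

    "title" and "author" are assumed key names, not confirmed ones.
    """
    haystack = _tokens(result.get("title", "")) | _tokens(result.get("author", ""))
    return _tokens(query) <= haystack


def filter_results(query, results):
    """Drop results that only matched the query somewhere in their body text."""
    return [r for r in results if is_plausible_match(query, r)]
```

A filter this strict would also reject legitimate matches with spelling variants or subtitles, so an import flow might prefer a similarity threshold instead of an all-words rule.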
Is your feature request related to a problem? Please describe. It would be cool to get search results from the Anarchist Library next to OL and Inventaire.
API Here is the calibre plugin, which is also written in Python: https://gitea.multiname.org/ibu/calibre-tal/src/branch/master/theanarchistlibrary_store/theanarchistlibrary_plugin.py An example search string would be
https://theanarchistlibrary.org/search?fmt=json&page=0&query=Bookchin
The results in JSON contain a field text_qualification, which is "book" for books (next to "article" and others), so they can be distinguished from the many other texts over there. Example result (shortened to 1 entry):
I couldn't find any documentation on the API at first glance. There are up to 9 results per page; there is probably a way to find out how many pages there are in total.
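A minimal sketch of how the text_qualification field and the page parameter might be combined: fetch page after page and keep only the entries marked as books. Since the total page count isn't known, this assumes an empty page signals the end, and it assumes the response is a list of result objects. `fetch_page` is a hypothetical callable standing in for the actual HTTP request, and the starting page of 0 simply follows the example URL.

```python
def only_books(results):
    """Keep entries whose text_qualification is "book"."""
    return [r for r in results if r.get("text_qualification") == "book"]


def iter_books(fetch_page, first_page=0, max_pages=50):
    """Yield book entries page by page until an empty page comes back.

    fetch_page is a hypothetical callable: page number -> list of result dicts.
    max_pages is a safety cap so a misbehaving endpoint cannot loop forever.
    """
    for page in range(first_page, first_page + max_pages):
        results = fetch_page(page)
        if not results:
            break
        yield from only_books(results)
```

Passing the fetcher in as an argument keeps the filtering logic testable without network access.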
I think the AL doesn't store existing editions but builds the ebooks itself from its HTML, so maybe that interferes a bit with how editions are currently kept.
Additional notes There are some region-specific Anarchist libraries that are independent but look pretty similar, so maybe they could be included. I'm not sure how, but it'd be pretty cool to have them too. https://www.anarchistlibraries.net/libraries