Open bandleader opened 3 years ago
As an example, the URL https://www.sefaria.org/api/texts/Ecclesiastes.5 (with no commentaries) currently returns 54 keys:
ref
heRef
isComplex
text
he
versions
textDepth
sectionNames
addressTypes
lengths
length
heTitle
titleVariants
heTitleVariants
type
primary_category
book
categories
order
sections
toSections
isDependant
indexTitle
heIndexTitle
sectionRef
firstAvailableSectionRef
heSectionRef
isSpanning
versionTitle
versionTitleInHebrew
versionSource
versionStatus
license
versionNotes
extendedNotes
extendedNotesHebrew
versionNotesInHebrew
digitizedBySefaria
heVersionTitle
heVersionTitleInHebrew
heVersionSource
heVersionStatus
heLicense
heVersionNotes
heExtendedNotes
heExtendedNotesHebrew
heVersionNotesInHebrew
heDigitizedBySefaria
alts
next
prev
commentary
sheets
layer
As a demo, I produced a nearly fully-functional GraphQL version of the Text API.
(Currently it is a caching proxy to the official REST API, but the goal would be for you to take the GraphQL schema I wrote and use it in your Python backend.)
You can try it out here: https://enez2.sse.codesandbox.io/
Start typing book on a blank line. Notice the completion and the API docs context help.
query {
  text(ref: "Kohelet 5") {
    # Start typing 'book' on the line below, and see what happens
    title {
      en
    }
    textLines {
      he
    }
  }
}
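For anyone who would rather run a query like this from code than from the playground, here is a minimal Python sketch. It assumes the demo follows the usual GraphQL-over-HTTP convention (a POST with a JSON body containing a `query` key) at its root URL; the exact path is not confirmed anywhere in this thread, and `build_request` / `run_query` are just illustrative names.

```python
import json
import urllib.request

# Assumption: the demo serves GraphQL over POST at its root URL.
ENDPOINT = "https://enez2.sse.codesandbox.io/"

QUERY = """
query {
  text(ref: "Kohelet 5") {
    title { en }
    textLines { he }
  }
}
"""


def build_request(query, endpoint=ENDPOINT):
    """Wrap a GraphQL query in a JSON POST request (nothing is sent yet)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the decoded JSON response."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Usage (performs a network call): print(run_query(QUERY))
```

Only the fields named in the query come back, so the response should stay small no matter how many keys the underlying REST endpoint returns.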
@M-Zuber Use this to get Mikra, Rashi and Targum in Hebrew and nothing else (as requested in #601)
query {
  text(ref: "Vayikra 1") {
    title {
      en
    }
    textLines {
      he
    }
    commentaries(filterByTitles: ["Rashi", "אונקלוס"]) {
      title {
        en
      }
      textLines {
        he
      }
    }
  }
}
@bandleader thanks for the suggestion. I have to admit that I don't yet have any experience with GraphQL, outside of trivial integration with Facebook. I'll aim to study up. Who else in the community has experience here? Feel free to sound off on this thread.
@EliezerIsrael GraphQL has great docs, and I'm happy to help if I can. You're also welcome to use the GraphQL schema I designed (click the 'Schema' tab here), so some of the work is already done.
Also, I added the Sefaria Calendar API to my demo Sefaria GraphQL API. I mention it because it's another case where a graph API shines: you can not only get the level of detail you want for each calendar item, you can even ask for the actual text in the same request. (And optionally other fields, versions, commentaries, translations, filtering, etc.: all the regular options I already provide for texts.)
# Try this at https://enez2.sse.codesandbox.io
query {
  calendarSections {
    items {
      type { en }
      value { en }
      text {
        textLinesJoined { he }
      }
    }
  }
}
For Shnayim Mikra, just apply filtering -- both on the calendar sections and on the commentaries:
# Try this at https://enez2.sse.codesandbox.io
query {
  calendarSections(filterByTypes: ["Parashat Hashavua"]) {
    items {
      text {
        textLinesJoined { he }
        commentaries(
          filterByTitles: ["Rashi", "אונקלוס"],
          # We don't need Rashi on Judges, for instance,
          # which has the category "Quoting Commentary"
          filterByCategories: ["Commentary", "Targum"]
        ) {
          title { en }
          textLinesJoined { he }
        }
      }
    }
  }
}
Bonus: since GraphQL is composable, it makes it easy to add parameters like stripTrop, stripNikud, and stripHtmlTags, and they'll work wherever { en } and { he } do. I've already implemented them in my demo; try them out:
# Try this at https://enez2.sse.codesandbox.io
query {
  text(ref: "Kohelet 5") {
    textLinesJoined {
      en(stripHtmlTags: true)
      he(stripTrop: true, stripNikud: true)
    }
  }
}
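For the curious, here is roughly what such parameters could do on the server side. This is a sketch in Python, not the demo's actual implementation (which isn't shown in this thread): the function name `clean` and its keyword arguments are made up for illustration, and the character ranges are taken from the Unicode Hebrew block.

```python
import re

# Hebrew cantillation marks (trop / te'amim): U+0591 to U+05AF.
TROP = re.compile("[\u0591-\u05AF]")
# Hebrew vowel points (nikud). Punctuation in the same Unicode block,
# such as maqaf (U+05BE) and sof pasuq (U+05C3), is deliberately kept.
NIKUD = re.compile("[\u05B0-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]")
# Naive tag matcher; adequate for simple inline markup only.
HTML_TAG = re.compile(r"<[^>]+>")


def clean(text, strip_trop=False, strip_nikud=False, strip_html_tags=False):
    """Apply the requested clean-ups; with no flags set, return text unchanged."""
    if strip_html_tags:
        text = HTML_TAG.sub("", text)
    if strip_trop:
        text = TROP.sub("", text)
    if strip_nikud:
        text = NIKUD.sub("", text)
    return text
```

Because each flag is independent, a field resolver could pass its arguments straight through, which is what lets the same options work on both { en } and { he }.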
@monove Works for me, try again?
Working now. I was getting a 503 before. This is so fast and amazing! @EliezerIsrael: this would seem like a win-win for both those using the API and Sefaria, as this will lower everyone's bandwidth costs significantly and reduce response times and load by an incredible amount, no?
@EliezerIsrael I have experience with GraphQL and I was actually going to suggest it to Sefaria, but didn't know whether my suggestion would be appreciated. I think this is a great idea and I second everything @bandleader said. I would also add that it can especially help for mobile since this was part of the reason why Facebook created GraphQL. I highly recommend this talk to understand the costs and benefits of GraphQL: https://youtu.be/djKPtyXhaNE (Also shameless plug for a blog post I wrote on the topic: https://medium.com/geekculture/graphql-the-good-the-bad-and-the-bottomline-623de7dbcffb )
@JonMosenkis has prepped a proof of concept on PR #741
One of the concerns that I have is how to avoid pathological queries. For example, if one can query sources linked to a source, what's to prevent a user from querying a search space that would cause a killing load on the webserver?
My inclination is to keep this as a separate branch and deploy it against its own DB instance, read only, until we can get a good picture of the load it causes, and how to put guardrails on it.
@EliezerIsrael That is actually a concern with GraphQL I think. In one of the videos I linked to in my blog post, the person mentions it. There are ways of configuring the GraphQL server to only accept certain queries, but it is not easy and can become complicated quickly. (I have not personally done this, but this is what I found out in my research.)
This is actually mentioned in the GraphQL docs. In a nutshell: there isn't anything a GQL request can make your server do that it can't already do through REST. It's just that if you were previously throttling based on the raw number of HTTP requests, with GQL you have to take into account that a single request can contain multiple queries, and that queries can be nested within other queries (like my query above that gets today's parsha from the Calendar API and then gets Chumash and Targum for it).
You can limit query depth, time out queries after n seconds, or even throttle access based on the combined number of seconds the user's requests have run in a given time span. Your Python GQL lib of choice may have built-in support for at least timeouts and query depth.
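To make the query-depth idea concrete, here is a rough, library-free Python sketch. It only counts brace nesting (skipping string literals and # comments), so it is an approximation: it does not expand fragments, and it would also count braces in input-object arguments. The names `max_query_depth` and `is_within_depth` are hypothetical; a real deployment would more likely use whatever depth limiting the chosen GraphQL library provides.

```python
def max_query_depth(query):
    """Return the deepest level of { } nesting in a GraphQL query string."""
    depth = 0
    deepest = 0
    in_string = False
    i = 0
    while i < len(query):
        ch = query[i]
        if in_string:
            if ch == "\\":
                i += 2  # skip the escaped character inside a string
                continue
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "#":
            # Comments run to the end of the line; ignore any braces in them.
            newline = query.find("\n", i)
            if newline == -1:
                break
            i = newline
        elif ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
        i += 1
    return deepest


def is_within_depth(query, limit):
    """Reject queries nested more deeply than the configured limit."""
    return max_query_depth(query) <= limit
```

A server could run a check like this before executing anything and return an error for queries over the limit, alongside a per-request timeout.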
Hello, I'm new as a Sefaria API dev and hope I manage to help out the project a bit as I benefit from it! Thank you to all the devs, contributors, management and backing for an amazing project, literally one of the most important projects in the world.
IMHO, Sefaria would benefit greatly from having a GraphQL API. Of all my suggestions, I think this would be the one with the most wide-ranging benefits for the least cost, as well as arguably the most obvious.
Example
Just as an example, asking for commentaries on Genesis 1 returns a 7.5 MB JSON document and takes (for me) 20.5 seconds. This is (a) not really an acceptable UX for most applications, and (b) hard even DX-wise, as Chrome's devtools chokes on the JSON and I can't inspect it properly.
If all an app wanted was the Hebrew text of the Mikra, Rashi and Onkelos (as in #601), that would likely be under 20 KB (1/384th of 7.5 MB). The app would have no need for titles, section names, English translation, Ramban, and the other 383/384 of the JSON returned (and the server would be spared the cost of producing it).
Way Forward
Although I'm not good with Python, I understand that Python works very well with GraphQL and it's quite easy to write servers/resolvers. I won't really be useful writing code, but if necessary I can בל"נ write GraphQL schemas as well as do testing and perhaps even documentation (which can be mostly generated automatically, and even experimented with live using GraphiQL).
If people are not familiar with GraphQL and the DX advantages, I could perhaps create a model schema and deploy a simple API as a sort of live mockup, so you can see what a difference it would make.
I'd also like to say that GraphQL resolvers generally resemble objects/classes with properties/function calls, so it might be very feasible to implement this on top of the existing Python classes, or as a thin layer on top of them. As well, it is entirely feasible for REST APIs to use GQL resolvers to generate their data, reducing code duplication.
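To illustrate the "resolvers resemble objects with properties" point without committing to any particular library, here is a toy Python sketch. Everything in it is hypothetical: `TextNode` and `LocalizedText` stand in for whatever classes the backend already has, and `resolve` is a stripped-down imitation of what a GraphQL executor does with a selection, not a real GraphQL implementation.

```python
class LocalizedText:
    """A bilingual value; each attribute acts as the resolver for one field."""

    def __init__(self, en, he):
        self.en = en
        self.he = he


class TextNode:
    """Hypothetical thin wrapper over an existing backend object."""

    def __init__(self, ref, title, lines):
        self.ref = ref
        self._title = title
        self._lines = lines

    @property
    def title(self):
        return self._title

    def textLines(self):
        # Methods work as resolvers too; a real one might load data lazily.
        return self._lines


def resolve(obj, selection):
    """Return only the fields named in selection ({field: sub-selection or None})."""
    if isinstance(obj, list):
        return [resolve(item, selection) for item in obj]
    result = {}
    for field, sub in selection.items():
        value = getattr(obj, field)
        if callable(value):
            value = value()
        result[field] = value if sub is None else resolve(value, sub)
    return result


# A REST endpoint could reuse the same resolvers with a fixed selection.
REST_SELECTION = {"ref": None, "title": {"en": None, "he": None}}


def rest_view(node):
    return resolve(node, REST_SELECTION)
```

The point of the design is that fields nobody asked for (here, textLines in the REST view) are never computed, and both API styles share one set of resolvers.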
EDIT: demo available, see below