Lyrics / lyrics-api

Dynamic website for the lyrics database
GNU Affero General Public License v3.0

Design APIs #4

Open defanor opened 5 years ago

defanor commented 5 years ago

Basic search is implemented now, but we need stable and specified APIs (DB schema, HTTP requests + {XML/XHTML, RDF, text} ones).

This issue continues https://github.com/Lyrics/lyrics.github.io/issues/15.

defanor commented 5 years ago

Currently search.pl handles 3 parameters: artist, album, and title. Additional parameters could be used to affect the following:

Preferably they should be orthogonal.
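
For illustration, a request using the three existing parameters could be assembled as below. This is only a sketch: the base URL is a made-up placeholder, and omitting empty parameters is an assumption about how a client would call search.pl, not something the thread specifies.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the real deployment address is not given in this thread.
BASE_URL = "https://lyrics.example/search.pl"


def build_search_url(artist=None, album=None, title=None, base_url=BASE_URL):
    """Build a query URL from the three parameters search.pl currently handles."""
    params = {"artist": artist, "album": album, "title": title}
    # Assumption: parameters that are not supplied are simply left out.
    query = urlencode({key: value for key, value in params.items() if value})
    return f"{base_url}?{query}" if query else base_url


print(build_search_url(artist="boa", title="duvet"))
```

Keeping each parameter independent of the others in a builder like this is one way to read the "orthogonal" requirement: adding a new parameter should not change how the existing ones are encoded.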

C0rn3j commented 5 years ago

Just to clarify, current output is XHTML and lyrics.wikia.com uses XML?

Would explain why I am getting the middle finger from a plugin that used lyrics.wikia.com when trying to make it work with lyrics-api.

xmlpp::TextReader reader{api_url} >> Cannot instantiate underlying libxml2 structure

I guess if I wanted to use it in the current state with XHTML only I'd have to somehow clean up the HTML crust to just have the XML file?

Looks like I should be using something like Tidy before processing it on the HTML.

So yeah, I practically answered my own questions as I wrote this comment, but if I got something wrong, please do correct me.

EDIT: Looks like Tidy is used to clean up non-compliant XHTML, which this project hopefully isn't, but I still can't figure out how to feed XHTML to the parser >.>

defanor commented 5 years ago

To be precise, search.pl currently serves HTML 5 with XHTML concrete syntax. XHTML is basically HTML serialized as XML, which should be usable by both regular web browsers and XML tools (and, once we add RDFa, also by RDF tools, covering 3 ways to access it with a single format). HTML 5 also requires <!DOCTYPE html>; not sure whether that may be an issue for some XML readers. And maybe we should set the encoding in the XML declaration.

Can you please experiment with the used reader (by editing the document and trying to parse it), maybe get more error information out of it?

C0rn3j commented 5 years ago

It seems to work if I save the response into a file.xml and load that file instead of the URL; the file content is identical to what the URL serves.

Think I should be using DomParser instead of TextReader somehow but I either suck at googling this or the info is sparse.

defanor commented 5 years ago

If an XML reader retrieves the file on its own, maybe it expects the application/xml content type HTTP header, rather than application/xhtml+xml. But it also should be trivial to serve simplified XML once we add format selection: documents get composed as plain XML now, and then the XSLT stylesheet gets applied to turn it into (X)HTML. So the difference between the two will be just in a header, and whether to apply that stylesheet/transform/template or not.
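
The split described above (one header plus one optional transform) can be sketched as a small decision function. The format value "xml" and the function and type names are assumptions made for illustration; the actual search.pl logic is not shown in this thread.

```python
from typing import NamedTuple


class ResponsePlan(NamedTuple):
    content_type: str       # value for the Content-Type header
    apply_stylesheet: bool  # whether to run the XSLT over the composed XML


def plan_response(fmt):
    """Pick the header and transform step for a requested format.

    Assumption: "xml" requests the plain composed document, and anything
    else falls back to the default (X)HTML rendering.
    """
    if fmt == "xml":
        return ResponsePlan("application/xml", False)
    return ResponsePlan("application/xhtml+xml", True)


print(plan_response("xml"))
print(plan_response(None))
```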

C0rn3j commented 5 years ago

I tried the headers hack but that wasn't it. I'll keep trying for a bit but I guess in the end I'll just wait till there's XML support in lyrics-api

C0rn3j commented 5 years ago

I nailed it down to HTTP vs HTTPS.

Works:

xmlpp::TextReader reader{"http://lyrics.wikia.com/api.php?action=lyrics&fmt=xml&artist=boa&song=duvet"};

Doesn't:

xmlpp::TextReader reader{"https://lyrics.wikia.com/api.php?action=lyrics&fmt=xml&artist=boa&song=duvet"};

I'll discuss this on xmlpp mailing list I guess.
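
A possible workaround, sketched in Python rather than libxml++: as far as I know, libxml2's built-in URL fetcher only speaks plain HTTP, so an https:// URL has to be downloaded by a TLS-capable client first and then parsed from memory. The sample document below is a made-up stand-in, not the real response format of either service.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only to exercise the parser offline.
SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b"<result><artist>boa</artist><song>duvet</song></result>"
)


def parse_document(data: bytes) -> ET.Element:
    """Parse an XML document that is already in memory."""
    return ET.fromstring(data)


def fetch_and_parse(url: str, timeout: float = 10.0) -> ET.Element:
    """Download over HTTP or HTTPS, then hand the bytes to the parser."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_document(response.read())


root = parse_document(SAMPLE)
print(root.tag, root.findtext("artist"), root.findtext("song"))
```

The same separation should carry over to C++: fetch the bytes with an HTTP library that supports TLS, then give the buffer to the XML parser instead of the URL.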

defanor commented 5 years ago

I just pushed a commit adding format parameter handling: now plain xml can be requested. XML document schema is not stable yet, and an "experimental" namespace URI is used (urn:x-lyrics); we probably should define and host the schema somewhere, using that URI instead, but it's not critical for development.

defanor commented 5 years ago

Since FTS was replaced with exact matching on preprocessed text (and likely aliases in the future), we won't need the first group of parameters (controlling how to match, since it's more straightforward now). As for the third group, the output can mostly depend on query results (a listing when there are multiple matches, "no results" when there are none, showing lyrics when there's exactly one), but we'll probably still need a parameter to request returning 404 if nothing is found, as described in #5. Maybe it should also limit query results to a single one, or return an error if there are multiple ones. Not sure yet.

I'm going to check how other lyrics search APIs work, maybe that'd give a clearer idea: some common parameters can be identified and then reused to mimic those APIs, as well as to use for regular queries.
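
The result-dependent output described above could look roughly like this. The flag name not_found_as_404 and the view labels are hypothetical; the thread only says that such a parameter is probably needed.

```python
def choose_response(match_count, not_found_as_404=False):
    """Return (HTTP status, view name) for a given number of matches.

    Assumption: the 404 switch only changes the status code for the
    "nothing found" case and leaves the other cases untouched.
    """
    if match_count < 0:
        raise ValueError("match_count cannot be negative")
    if match_count == 0:
        return (404 if not_found_as_404 else 200, "no results")
    if match_count == 1:
        return (200, "lyrics")
    return (200, "listing")


for count in (0, 1, 3):
    print(count, choose_response(count), choose_response(count, not_found_as_404=True))
```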

defanor commented 5 years ago

Some websites/APIs return 404, some serve regular documents with "not found" messages in place of lyrics or elsewhere. So, to match their output, we'll need 2 parameters: a stylesheet/template to use, and whether to return 404 if nothing is found. While input can be handled with nginx (for instance) rewrites, content types can also be adjusted there when needed. Then we could both mimic other services, and tweak the parameters to alter the regular API behaviour. With tweakable XSLTs it won't be necessary to introduce a parameter governing whether to list matches or to show lyrics.

Update: Actually we already have the format parameter, could just use that for template selection. Then we'll need just one additional parameter, with 5 parameters total.

defanor commented 5 years ago

Added the errors parameter and adjusted the format one (so that stylesheets/templates can be selected with it). It should be sufficient for input part of the API for now.

As for output, there are templates to design/write/adjust, including ones for mimicking other services' APIs.

defanor commented 5 years ago

The url element is added into XML now, providing a relative reference. It's used by the default stylesheet for listings, but maybe it should be split into separate components (for more flexible links), and there's currently no guarantee that those links will be unambiguous. In practice they should be, and we can set a UNIQUE constraint on (search_artist, search_album, search_title), but then we will have to reconsider it once there are aliases.
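
A minimal sketch of that UNIQUE constraint, using an in-memory SQLite database. The table name and the body column are invented for the example, and the project's actual database engine and schema may differ; only the three search_* column names come from the comment above.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical lyrics table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE lyrics (
            search_artist TEXT NOT NULL,
            search_album  TEXT NOT NULL,
            search_title  TEXT NOT NULL,
            body          TEXT NOT NULL,
            UNIQUE (search_artist, search_album, search_title)
        )
        """
    )
    return conn


def add_lyrics(conn, artist, album, title, body):
    """Insert a row; return False if the (artist, album, title) key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO lyrics VALUES (?, ?, ?, ?)",
                (artist, album, title, body),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
print(add_lyrics(conn, "artist a", "album a", "title a", "first text"))
print(add_lyrics(conn, "artist a", "album a", "title a", "second text"))
```

With the constraint in place, a relative URL built from the three columns identifies at most one row, which is what the listing links rely on.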

defanor commented 5 years ago

Regarding RDF embedding: we're focusing on lyrics (and there's mo:Lyrics), and have at least song title, album name, and artist name. Lyrics can be associated with a mo:MusicalWork, a subclass of frbr:Work. It seems to be distinct from mo:Track, which is used by both MO's XSPF RDFizer and xiph's/XSPF's XSPF.xsl, which encode similar data, except for lyrics. They also employ FOAF, which has some generic properties and can indeed easily be attached/used, though they use strings (names) in place of foaf:Agent. It doesn't seem right, even though it gets used that way from time to time.

By the way, MO examples use MusicBrainz to link the artists, but MusicBrainz only embeds some metadata in ld+json, which doesn't seem to be widely supported (not supported by librdf in particular, and AFAIK it's merely RDF-compatible, not quite one of the serialization formats). Perhaps it wouldn't harm to link them as an alternative, if we'll be fetching links to them in the future, but they are not very usable or easy to link right now.

I think we'll need to properly attach artist/album/song names to lyrics, possibly in different ways (using different ontologies/relations, that is), and perhaps will have to introduce separate artist and album IRIs that would be consistent across lyrics pages/search results (so, not just #artist).

Perhaps better to focus on other interfaces for now, since it's tricky and not immediately useful.

defanor commented 5 years ago

Managed to mimic lyrics.wikia.com for Clementine with it, it's pretty easy. Maybe will prepare such XSLTs for a few more websites, and push them along with nginx configs, but mimicking other services' interfaces can be counted as ready.

Going to add a textual interface next, and then we could bikeshed XML and XHTML structures, add some light styling to the web interface, etc.

defanor commented 5 years ago

format=text gets handled now, similarly to other templates (using an XSLT). Further API adjustments shouldn't require search.pl changes, and should be achievable by tweaking the format/*.xsl files and httpd configs.