FCR001 / cantata

Automatically exported from code.google.com/p/cantata
GNU General Public License v3.0

Lyric search fails on special characters #339

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
As the summary/title says: searching for lyrics for a song whose name contains 
a special character such as "//", "?", "&", etc. fails, so the lyric page is 
never found.
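
For illustration only, here is why such characters tend to break a search: "/", "?" and "&" all have a reserved meaning inside a URL, so they need to be percent-encoded before they are placed in one. This is a minimal Python sketch and not Cantata's actual code; the base URL, the parameter names and the function `build_search_url` are made up for the example.

```python
from urllib.parse import quote

def build_search_url(base, artist, title):
    # safe="" makes quote() percent-encode every reserved character,
    # including "/", "?" and "&", so they cannot be mistaken for URL syntax.
    return f"{base}?artist={quote(artist, safe='')}&title={quote(title, safe='')}"

# Hypothetical provider URL, used only for this example.
url = build_search_url("http://lyrics.example/search", "AC/DC", "Who Made Who?")
print(url)
```

With the encoding in place, the only "?" and "&" left in the URL are the ones that separate the query parameters.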

Original issue reported on code.google.com by fearedta...@gmail.com on 26 Nov 2013 at 12:55

GoogleCodeExporter commented 9 years ago
Hmmm... Some are handled, but not all, it seems. Anyway, I've made some changes 
that seem to fix this. Can you check out the trunk version of Cantata, and 
confirm the fix?

If it is not fixed, which lyrics are not being found? I can then test by doing 
a manual search.

Original comment by craig.p....@gmail.com on 26 Nov 2013 at 8:03

GoogleCodeExporter commented 9 years ago
I've just checked out the trunk, compiled it, and run it. Help -> About Cantata 
reports version 1.1.51.

For example, no lyrics are picked up for 3 A.M. Spiritual by Smith Westerns, 
from the album Soft Will. I've tried searching for "3am spiritual", "3 a m 
spiritual", and "3 am spiritual", but none of them is found. The lyrics exist 
on azlyrics, songlyrics, and rapgenius. Honestly, though, I have to wonder 
whether this is a "special character" issue or whether the other lyric search 
options simply don't work. I've just noticed that if the lyrics don't exist on 
lyrics.wikia.com they are not found at all, even if they exist on the other 
websites listed in the program. Nothing at all is found for the album Soft 
Will. This is not the only artist/album I am having this problem with, though. 
Even with more popular artists, such as Lady GaGa's new album ARTPOP, I had 
problems getting the lyrics for "Manicure".

I appreciate the response.

Original comment by fearedta...@gmail.com on 26 Nov 2013 at 11:28

GoogleCodeExporter commented 9 years ago
I must admit that I actually "borrowed" the lyrics code from Clementine. 
Looking at the code, it looks as if the parsing of the azlyrics.com response is 
no longer valid - I've now updated this (and songlyrics). Probably more sites 
need updating...

Please update, and try again :-)

Also, to see what URLs Cantata is sending, if you start Cantata from the 
command line as follows:

 CANTATA_DEBUG=-4096 cantata

...then Cantata will display all HTTP URLs, and their response codes. (If 4096 
is used, then the output is logged to ~/.cache/cantata/cantata.log)

If you see a response such as:

"http://www.songlyrics.com/katy-perry/roar-lyrics/" 0 "OK"

...then the lyrics were found at the site. If the lyrics are still not 
displayed, then it is the 'scraping' of the response that is failing - which 
was the case for azlyrics and songlyrics.
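
When testing many sites, it may be easier to filter the debug output with a small script than to read it by eye. The following is a rough Python sketch that assumes every request line has the same shape as the sample above (quoted URL, numeric code, quoted status text). That shape is inferred from the one sample line only; the regular expression and the name `parse_request_line` are made up for the example, and any line that does not match is skipped.

```python
import re

# Pattern inferred from the single sample line in this thread.
REQUEST_LINE = re.compile(
    r'^"(?P<url>[^"]+)"\s+(?P<code>-?\d+)\s+"(?P<status>[^"]*)"$'
)

def parse_request_line(line):
    """Return (url, code, status) for a matching line, else None."""
    match = REQUEST_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("url"), int(match.group("code")), match.group("status")

sample = '"http://www.songlyrics.com/katy-perry/roar-lyrics/" 0 "OK"'
print(parse_request_line(sample))
```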

If you want to help with fixing these, then the 
'context/ultimate_providers.xml' file contains details of the tags that are 
looked for to mark the start and end of the lyrics in the response. This file 
is XML, hence certain characters are encoded - e.g. "&lt;" instead of "<"

If you type the OK URL above into a browser, you will see the site's response. 
Pressing Ctrl-U will show the HTML code - and should give you an idea of what 
start/end strings to look for.
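
To try out a candidate pair of start/end strings before editing the XML file, something like the following Python sketch can help. It is not Cantata's scraper: the page snippet, the marker strings and the function `extract_between` are invented for the example, and the actual layout of 'context/ultimate_providers.xml' is not shown here. The last lines show how a marker would need to be escaped before being pasted into XML text.

```python
from html import unescape
from xml.sax.saxutils import escape

def extract_between(page, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    begin = page.find(start)
    if begin == -1:
        return None
    begin += len(start)
    finish = page.find(end, begin)
    if finish == -1:
        return None
    return page[begin:finish].strip()

# Invented page and markers, for illustration only.
page = '<html><body><div id="lyrics"> La la la </div></body></html>'
start_marker = '<div id="lyrics">'
end_marker = "</div>"

lyrics = extract_between(page, start_marker, end_marker)
print(lyrics)

# "<", ">" and "&" must be escaped when the marker is stored as XML text.
escaped_marker = escape(start_marker)
print(escaped_marker)
```

If `extract_between` returns None for a page that the browser shows correctly, the start or end string probably no longer matches the site's HTML.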

...if the above is too much, or you don't understand it, then it'd still help if 
you could test all sites - and let me know which ones are failing.

Original comment by craig.p....@gmail.com on 27 Nov 2013 at 12:36

GoogleCodeExporter commented 9 years ago
Your latest update fixes some of the lyric searches. Much appreciated. After 
running Cantata from the command line with your arguments I can definitely see 
a few servers that are not being reached. I'll take you up on your offer and 
have a stab at fixing a few of these.

Thanks for the help and fixes.

Original comment by fearedta...@gmail.com on 27 Nov 2013 at 12:59

GoogleCodeExporter commented 9 years ago
I've added fixes for darklyrics and directlyrics (but some characters in the 
response still need fixing).

I've also added more debug output. Now if you start Cantata as:

    CANTATA_DEBUG=-12288 ./cantata

It will log network access and lyrics parsing. Use -8192 for just the lyrics 
side.
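
The numbers quoted in this thread look like bit flags: 12288 is 4096 + 8192, so the combined value switches on both areas. A small Python sketch of that arithmetic follows. The constant and function names are hypothetical, and the reading of the sign (positive writes to the log file, negative does not) is an assumption based on the earlier comment about 4096, not something taken from Cantata's source.

```python
# Values quoted in this thread; the names are hypothetical.
DEBUG_NETWORK = 4096  # HTTP URLs and their response codes
DEBUG_LYRICS = 8192   # lyrics parsing

def cantata_debug_value(*areas, to_log_file=False):
    """Combine debug areas into a value for CANTATA_DEBUG.

    Assumption: a positive value logs to ~/.cache/cantata/cantata.log,
    a negative value does not.
    """
    mask = 0
    for area in areas:
        mask |= area  # set the bit for each requested area
    return mask if to_log_file else -mask

print(cantata_debug_value(DEBUG_NETWORK, DEBUG_LYRICS))  # -12288
print(cantata_debug_value(DEBUG_LYRICS))                 # -8192
```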

Original comment by craig.p....@gmail.com on 27 Nov 2013 at 9:08

GoogleCodeExporter commented 9 years ago
I've committed more changes that should fix *most* of the providers.

lyricsbay.com, lyricsdownload.com, lyricsmania.com, and teksty.org still have 
issues. But I'm not sure I can do much about these...

Original comment by craig.p....@gmail.com on 28 Nov 2013 at 7:59

GoogleCodeExporter commented 9 years ago
I've fixed the directlyrics issue - encoding was set to iso8859-1, but utf-8 is 
better.
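
As an illustration of what the wrong encoding does to a response, the Python sketch below decodes the same UTF-8 bytes both ways. The sample text is made up, and this is not the code Cantata uses; it only shows the general effect of reading UTF-8 data as ISO 8859-1.

```python
# Made-up sample text containing accented characters.
text = "Beyoncé: Déjà Vu"
raw = text.encode("utf-8")  # bytes as a UTF-8 site would send them

wrong = raw.decode("iso-8859-1")  # each byte becomes one character
right = raw.decode("utf-8")       # multi-byte sequences are recombined

print(wrong)
print(right)
```

Decoding as ISO 8859-1 never raises an error, because every byte maps to some character, which is why the symptom is garbled accents rather than a failed request.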

Anyway, I'm marking this as fixed - as most sites seem to work for me now.

Original comment by craig.p....@gmail.com on 1 Dec 2013 at 6:05