Open nikita-volkov opened 11 years ago
Just noticed that it isn't reliably reproducible.
Can you give me an example where this issue occurred? With my test file in tests/test.html, the selector works just fine.
Here's an example of multiple correct selectors which produce no results:
main = do
  print $ runLA (hread >>> tags //> getText) $ html
  where
    tags = css "div .fl_r" <+> css ".info .audio_add_wrap" <+> css ".fl_l .fl_l" <+> css "div.title_wrap"
html = "<div class=\"audio fl_l\" id=\"audio126257070_150084772\" onmouseover=\"addClass(this, 'over');\" onmouseout=\"removeClass(this, 'over');\"> <a name=\"126257070_150084772\"></a> <div class=\"area clear_fix\" onclick=\"if (cur.cancelClick){ cur.cancelClick = false; return false;} playAudioNew('126257070_150084772')\"> <div class=\"play_btn fl_l\"> <div class=\"play_btn_wrap\"><div class=\"play_new\" id=\"play126257070_150084772\"></div></div> <input type=\"hidden\" id=\"audio_info126257070_150084772\" value=\"http://cs1-4.userapi.com/d33/859470469ef948.mp3,221\" /> </div> <div class=\"info fl_l\"> <div class=\"title_wrap fl_l\" onmouseover=\"setTitle(this);\"><b><a href=\"/search?c[q]=Michel%20Tel%5C%F3&c[section]=audio\" onclick=\"if (checkEvent(event)) { event.cancelBubble = true; return}; Audio.selectPerformer(event, 'Michel Tel\ó'); return false\">Michel Teló</a></b> – <span class=\"title\"><a href=\"\" onclick=\"Audio.showLyrics('126257070_150084772',24208242,1); return cancelEvent(event);\">Bara Bará Bere Berê</a> </span><span class=\"user\" onclick=\"event.cancelBubble = true;\"></span></div> <div class=\"actions\"> <div class=\"audio_add_wrap fl_r\" onmouseover=\"Audio.rowActive(this, 'Добавить в мои аудиозаписи', [9, 5, 0]);\" onmouseout=\"Audio.rowInactive(this);\" onclick=\"Audio.addShareAudio(this, 150084772, 126257070, 'a74352b70e39439b99', 0, 1); return cancelEvent(event);\"> <div class=\"audio_add\"></div></div> </div> <div class=\"duration fl_r\">3:41</div> </div> </div> <div id=\"lyrics126257070_150084772\" class=\"lyrics\" nosorthandle=\"1\"></div></div>"
Here's what I get:
[" ","3:41"," ","3:41"," ","3:41"," "," "," "," "," "," ","Michel Tel"," "," ","Bara Bar"," Bere Ber"," "," "," "," "," "," ","3:41"," ","Michel Tel"," "," ","Bara Bar"," Bere Ber"," ","Michel Tel"," "," ","Bara Bar"," Bere Ber"," ","Michel Tel"," "," ","Bara Bar"," Bere Ber"," "]
What versions of GHC and HXT do you have, and what platform are you on? Does upgrading ghc / hxt fix the issue?
Strange. It is reliably reproducible on both of my machines: OS X 10.8.2 with GHC 7.4.2 and HXT 9.3.0.1 (since upgraded to 9.3.1.1), and Ubuntu 12.10 (64-bit, AMD processor) with GHC 7.4.2 and HXT 9.3.1.1.
Just in case, the import statements are:
import Text.XML.HXT.Core
import Text.HandsomeSoup
Could it be an issue related to locale, UTF encoding, or special symbols?
I must note that most other selectors work fine.
Does it work for you with the special symbols removed? Or does the following work for you?
runX $ parseHtml html >>> multi (hasAttrValue "class" (elem "info" . words)) //> getText
That's the equivalent translation to pure HXT.
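For reference, the class test inside that translation can be looked at on its own. The following is a minimal sketch using only base; hasClass is a hypothetical helper name introduced here for illustration, not a function from HXT or HandsomeSoup. It treats the class attribute as a whitespace-separated list of tokens, which is what elem "info" . words does above.

```haskell
-- Hypothetical helper (not part of HXT or HandsomeSoup): does the raw value
-- of a class attribute contain the given class as a whole token?
hasClass :: String -> String -> Bool
hasClass name attrValue = name `elem` words attrValue

-- Sample class attribute values taken from the HTML in this thread.
sampleMatches :: [Bool]
sampleMatches =
  [ hasClass "info" "info fl_l"                     -- one token among several
  , hasClass "audio_add_wrap" "audio_add_wrap fl_r"
  , hasClass "item-result" "item-result"            -- hyphens are ordinary characters here
  , hasClass "fl" "fl_l"                            -- a prefix of a token is not a match
  ]
```

Under this sketch, hasAttrValue "class" (hasClass "info") would be the same filter as the one in the translation above.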
And what happens when you add a multi in front of tags?
print $ runLA (hread >>> multi tags //> getText) $ html
I would expect this to give you duplicated text.
Adding multi in front of tags still produces an empty list.
Removing the ampersands from the HTML does not help.
Concerning your second question, I think you've presented a selector for css ".info", which works fine for me. The one not working is css ".info .audio_add_wrap". I've tried the following HXT translation of it, and it worked fine:
runX $ parseHtml html >>> multi (hasAttrValue "class" (elem "info" . words)) >>> multi (hasAttrValue "class" (elem "audio_add_wrap" . words)) //> getText
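To spell out what a descendant selector such as ".info .audio_add_wrap" is expected to return, here is a small model on a plain rose tree from Data.Tree (containers). It is only a sketch of the CSS semantics under stated assumptions, not how HandsomeSoup or HXT implement it: Elem, hasClass, subtrees, descendantSel and sampleDoc are all hypothetical names, and the sample tree is a hand-trimmed outline of the HTML above.

```haskell
import Data.Tree (Tree (..))

-- Hypothetical toy element: a tag name paired with its raw class attribute.
type Elem = (String, String)

-- True when the node's class attribute contains the class as a whole token.
hasClass :: String -> Tree Elem -> Bool
hasClass c (Node (_, cls) _) = c `elem` words cls

-- The node itself followed by all of its descendants, in document order.
subtrees :: Tree a -> [Tree a]
subtrees t = t : concatMap subtrees (subForest t)

-- Model of the selector ".a .b": every proper descendant with class b
-- of every node with class a.
descendantSel :: String -> String -> Tree Elem -> [Tree Elem]
descendantSel a b doc =
  [ d
  | anc   <- subtrees doc, hasClass a anc
  , child <- subForest anc
  , d     <- subtrees child, hasClass b d
  ]

-- Trimmed outline of the markup from this issue (assumed structure).
sampleDoc :: Tree Elem
sampleDoc =
  Node ("div", "audio fl_l")
    [ Node ("div", "info fl_l")
        [ Node ("div", "actions")
            [ Node ("div", "audio_add_wrap fl_r") [] ]
        ]
    , Node ("div", "lyrics") []
    ]
```

With this model, map (snd . rootLabel) (descendantSel "info" "audio_add_wrap" sampleDoc) yields the single class string "audio_add_wrap fl_r", so an empty result for that selector on the real document would be unexpected.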
So I guess the problem is still somewhere in HandsomeSoup.
I think I'm having similar problems, but with hyphens.
This doesn't work:
links <- runX $ doc >>> css ".item-result"
but this works:
links <- runX $ doc >>> css "div" >>> hasAttrValue "class" (=="item-result")
And this is the html source I'm parsing: http://pastebin.com/HNmRvFC6
I'll take a look.
@pooster: are you running version 0.3.2, or 0.3.1, the latest on Hackage? Version 0.3.2 fixed a few bugs that look similar to yours.
@pooster your example works for me with version 0.3.2 (it doesn't work with version 0.3.1).
@nikita-volkov it's been a while, but could you verify that you're still having this issue? I don't see it.
In a couple of days. Yes.
Queries like css ".some_class" produce no results when in fact there are elements with that class.