cheeriojs / cheerio

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.
https://cheerio.js.org
MIT License

fails to parse input file with same name #729

Closed: k1ng440 closed this issue 3 years ago

k1ng440 commented 9 years ago

I am having an issue with this page because it has multiple buttons named commit, some of them inside <script> tags:

$("[name=commit]")

<!DOCTYPE html>

<html lang="en">
<head>
    <meta charset="utf-8">
    <meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible">

    <title>Supreme: ”cherry” - DVD</title>
    <meta content="Supreme. The official website of Supreme. EST 1994. NYC." name="description">
    <meta content="telephone=no" name="format-detection">
    <meta content="on" http-equiv="cleartype">
    <meta content="app-id=664573705" name="apple-itunes-app">
    <link href="//www.google-analytics.com" rel="dns-prefetch">
    <link href="//ssl.google-analytics.com" rel="dns-prefetch">
    <link href="//d2flb1n945r21v.cloudfront.net" rel="dns-prefetch">
    <link href="//d17ol771963kd3.cloudfront.net/assets/application-693f5e35d53be857e7f0e147a14a8476.css" media="all" rel="stylesheet" type="text/css">
    <meta content="authenticity_token" name="csrf-param">
    <meta content="V5aOYu1mUMJnA5lku2ucfM1AGsh1i8UD2wFpmEaNf4A=" name="csrf-token">
    <script type="text/javascript">
document.write('<link href="//d17ol771963kd3.cloudfront.net/assets/styles-js-b76a6528e47ae2339c3dbf6b1f8556b5.css" rel="stylesheet" type="text/css" />');var realNycOffset = -14400;
    </script>
    <script src="//d17ol771963kd3.cloudfront.net/assets/application-f35f0273dcd7aff006d05a543511d402.js" type="text/javascript"></script>
</head>

<body class=" products show japan">
    <header id="header">
        <hgroup>
            <h1 class="logo"><a href="/">Supreme</a></h1><time data-timezone-offset="32400"><b>06/30/2015 07:27am</b> <span id="time-zone-name">TYO</span></time>
        </hgroup>
    </header>

    <div id="wrap">
        <div id="container">
            <div class="sidebar">
                <noscript>
                <div id="cart-view">
                    <a class="button edit" href="/shop/cart">カート内容のご確認</a>
                </div></noscript>

                <div class="hidden" id="cart">
                    <ul>
                        <li class="num"><i id="items-count"></i> <span class="in_cart_text">in cart</span></li>

                        <li class="subtotal-container"><b>小計</b>&nbsp;<i id="subtotal"></i></li>
                    </ul><a class="button edit" href="/shop/cart">カート内容のご確認</a><a class="button checkout" href="https://www.supremenewyork.com/checkout">ご注文手続きへ</a>
                </div>
            </div>

            <article>
                <figure>
                    <img alt="”cherry”" id="img-main" src="//d17ol771963kd3.cloudfront.net/90308/ma/5ntFf9b3yC8.jpg">

                    <div id="zoom-lens"></div>

                    <div data-background-image="//d17ol771963kd3.cloudfront.net/90308/zo/5ntFf9b3yC8.jpg" id="zoom-holder" style="background-image: url('//d17ol771963kd3.cloudfront.net/90308/zo/5ntFf9b3yC8.jpg'); background-position: 0 0; background-repeat: no-repeat;"></div>
                </figure>
            </article>

            <div data-style-limited="true" data-style-limited-with-count="0" id="details">
                <h1>”cherry”</h1>

                <p class="style">DVD</p>

                <p class="description">"cherry" is directed by New York-based videographer, William Strobeck. The video features Tyshawn Jones, Sage Elsesser, Sean Pablo, Na-kel Smith, Kevin Bradley, Aidan Mackey, Paulo Diaz, Mark Gonzales, Dylan Rieder, Alex Olson and Jason Dill. "cherry" runs 38 minutes. Photo book included.</p>

                <ul class="styles">
                    <li>
                        <a class="selected" data-images="{&quot;detail_url&quot;:&quot;//d17ol771963kd3.cloudfront.net/90308/ma/5ntFf9b3yC8.jpg&quot;,&quot;zoomed_url&quot;:&quot;//d17ol771963kd3.cloudfront.net/90308/zo/5ntFf9b3yC8.jpg&quot;}" data-sold-out="false" data-style-id="8651" data-style-name="DVD" href="/shop/accessories/cherry/dvd"><img alt="DVD" height="32" src="//d17ol771963kd3.cloudfront.net/90308/sw/5ntFf9b3yC8.jpg" width="32"></a><a class="" data-images="{&quot;detail_url&quot;:&quot;//d17ol771963kd3.cloudfront.net/90311/ma/dCWEzgwWXDc.jpg&quot;,&quot;zoomed_url&quot;:&quot;//d17ol771963kd3.cloudfront.net/90311/zo/dCWEzgwWXDc.jpg&quot;}" data-sold-out="false" data-style-id="8651" data-style-name="DVD" href="/shop/accessories/cherry/dvd?alt=0"><img alt="DVD" height="32" src="//d17ol771963kd3.cloudfront.net/90311/sw/dCWEzgwWXDc.jpg" width="32"></a>
                    </li>
                </ul>

                <p class="price"><span>¥3,240</span></p>

                <div id="cart-controls">
                    <form accept-charset="UTF-8" action="/shop/2471/add" class="add" data-remote="true" id="cart-addf" method="post" name="cart-addf">
                        <div style="margin:0;padding:0;display:inline">
                            <input name="utf8" type="hidden" value="&#x2713;"><input name="authenticity_token" type="hidden" value="V5aOYu1mUMJnA5lku2ucfM1AGsh1i8UD2wFpmEaNf4A=">
                        </div>

                        <fieldset>
                            <input id="size" name="size" type="hidden" value="20218"><a class="next" href="/shop/accessories/supreme-braun-travel-alarm-clock">next accessory &gt;</a>
                        </fieldset>

                        <fieldset id="add-remove-buttons">
                            <input class="button" name="commit" type="submit" value="カートに入れる"><a class="button continue" href="/shop">買い物を続ける</a>
                        </fieldset>
                    </form>
                </div><script id="cart-controls-add" type="text/x-nano-tmpl">
<form accept-charset="UTF-8" action="/shop/2471/add" class="add" data-remote="true" id="cart-addf" method="post"><div style="margin:0;padding:0;display:inline"><input name="utf8" type="hidden" value="&#x2713;" /><input name="authenticity_token" type="hidden" value="V5aOYu1mUMJnA5lku2ucfM1AGsh1i8UD2wFpmEaNf4A=" /></div><fieldset><input id="size" name="size" type="hidden" value="20218" /><a href="/shop/accessories/supreme-braun-travel-alarm-clock" class="next">next accessory &gt;</a></fieldset><fieldset id="add-remove-buttons"><input class="button" name="commit" type="submit" value="カートに入れる" /><a href="/shop" class="button continue">買い物を続ける</a></fieldset></form>
                </script><script id="cart-controls-remove" type="text/x-nano-tmpl">
<form accept-charset="UTF-8" action="/shop/2471/remove" class="delete" data-remote="true" id="cart-remove" method="post"><div style="margin:0;padding:0;display:inline"><input name="utf8" type="hidden" value="&#x2713;" /><input name="_method" type="hidden" value="delete" /><input name="authenticity_token" type="hidden" value="V5aOYu1mUMJnA5lku2ucfM1AGsh1i8UD2wFpmEaNf4A=" /></div><input id="size" name="size" type="hidden" value="{size_id}" /><fieldset><b class="button in-cart">in cart</b><a href="/shop/accessories/supreme-braun-travel-alarm-clock" class="next">next accessory &gt;</a></fieldset><fieldset id="add-remove-buttons"><input class="button remove" name="commit" type="submit" value="削除" /><a href="/shop" class="button continue">買い物を続ける</a></fieldset></form>
                </script><script id="cart-controls-sold-out" type="text/x-nano-tmpl">
<form><fieldset><a href="/shop/accessories/supreme-braun-travel-alarm-clock" class="next">next accessory &gt;</a></fieldset><fieldset id="add-remove-buttons"><b class="button sold-out">sold out</b><a href="/shop" class="button continue">買い物を続ける</a></fieldset></form>
                </script><script id="cart-controls-limited" type="text/x-nano-tmpl">
<form><fieldset><b class="warning">お一人様一点とさせていただきます。</b></fieldset><fieldset><b class="button disabled">カートに入れる</b><a href="/shop" class="button continue">買い物を続ける</a></fieldset></form>
                </script>
            </div>
        </div>
    </div>

    <footer id="nav">
        <nav>
            <ul id="nav-exit">
                <li>
                    <a href="http://www.supremenewyork.com">home</a>
                </li>

                <li>&nbsp;&gt; <a href="http://www.supremenewyork.com/shop">shop</a>
                </li>
            </ul>

            <ul id="nav-store">
                <li>
                    <a href="http://www.supremenewyork.com/shop/all">すべて表示</a>
                </li>

                <li>
                    <a href="http://www.supremenewyork.com/shop/sizing">サイズ/製品寸法</a>
                </li>

                <li>
                    <a href="http://www.supremenewyork.com/shop/terms">利用規約</a>
                </li>

                <li>
                    <a href="http://www.supremenewyork.com/shop/policy">特定商取引法表示</a>
                </li>
            </ul>
        </nav>
    </footer><script type="text/javascript">
if (typeof(fb_param) == "undefined") {
    var fb_param = {};
    fb_param.pixel_id = '6011891039171';
    fb_param.value = '0.00';
    fb_param.currency = 'USD';
    (function(){
    var fpw = document.createElement('script');
    fpw.async = true;
    fpw.src = '//connect.facebook.net/en_US/fp.js';
    var ref = document.getElementsByTagName('script')[0];
    ref.parentNode.insertBefore(fpw, ref);
    })();
    }
    </script><script type="text/javascript">
if (!window._gaq) {
    var _gaq = _gaq || [];
    _gaq.push(['_setAccount', "UA-104557-13"]);
    _gaq.push(['_trackPageview']);

    (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    if($("script[src='"+ga.src+"']").size() == 0){
      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
    }
    })();
    } else {
    ga_track('pageview');
    }
    </script>
</body>
</html>
pyhedgehog commented 9 years ago

What result did you get, and what result did you expect?

k1ng440 commented 9 years ago

I expect $("[name=commit]").length to be 1, but I get 0.

pyhedgehog commented 9 years ago
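var cheerio = require('cheerio');
// Load the HTML from the issue, saved locally as cheerio_issue729.html
var $ = cheerio.load(require('fs').readFileSync('cheerio_issue729.html','utf-8'));
// Elements whose name attribute is "commit"
console.log($("[name=commit]").length);
// Links to the "next accessory" page, as a second sanity check
console.log($("[href='/shop/accessories/supreme-braun-travel-alarm-clock']").length);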

Here cheerio_issue729.html contains your HTML. Output:

1
1

I have cheerio@0.19.0. Which version are you testing with?

k1ng440 commented 9 years ago

Strange, I am also using the same version:

"cheerio": "^0.19.0"

pyhedgehog commented 9 years ago

Re-check your data. Maybe the HTML download in your script is failing?

jugglinmike commented 9 years ago

The best thing you can do is to create a standalone JavaScript file that demonstrates this problem, HTML and all. It's a little tedious to write markup in a string literal in JavaScript, but it's the most reliable way to ensure we're not dealing with data issues like the one @pyhedgehog has mentioned. If, in the process, you can eliminate any markup that is not necessary to trigger the bug, that would help us, too!