ayushjainrksh / conactivity

A tool built with Puppeteer that parses the LinkedIn profiles of a company's employees and returns the list of active employees.
MIT License

Scraping error due to query selectors #20

Closed by rajkumaar23 4 years ago

rajkumaar23 commented 4 years ago

Describe the bug

Active users on page 0:  []
Active users on page 1:  []
Oops! An error occured.
Error: No node found for selector: .artdeco-pagination__button.artdeco-pagination__button--next
    at Object.exports.assert (/home/rajkumar/Documents/projects/linkedin-scraper/node_modules/puppeteer/lib/cjs/puppeteer/common/assert.js:26:15)
    at DOMWorld.click (/home/rajkumar/Documents/projects/linkedin-scraper/node_modules/puppeteer/lib/cjs/puppeteer/common/DOMWorld.js:273:21)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at async scrapeLinkedIn (/home/rajkumar/Documents/projects/linkedin-scraper/scrape.js:149:7)

To Reproduce

Steps to reproduce the behavior:

  1. Insert google in place of COMPANY in .env
  2. Run npm start
  3. Expect the error above

Expected behavior

It should work as expected.

Screenshots

NA

ayushjainrksh commented 4 years ago

Thanks for filing the issue. I think recent UI changes in LinkedIn are causing the script to fail for some users (it works fine for me). Using dark mode might also be a problem. I'm looking into it, but if you have any lead on why this is happening, please go ahead. It would be great if you could test whether this PR works in your case.
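If LinkedIn's class names did change for some users, one way to make the script less brittle is to try several selectors and use the first one that matches. The following is only a sketch, assuming a Puppeteer `page` object. `firstMatchingSelector` and `NEXT_BUTTON_CANDIDATES` are hypothetical names and are not part of conactivity. The second candidate relies on the `aria-label="Next"` attribute visible in the pagination markup pasted later in this thread.

```javascript
// Hypothetical helper: returns the first selector in `candidates` that matches
// an element on the page, or null if none of them match.
async function firstMatchingSelector(page, candidates) {
  for (const selector of candidates) {
    // page.$ resolves to an ElementHandle, or null when nothing matches.
    const handle = await page.$(selector);
    if (handle !== null) {
      return selector;
    }
  }
  return null;
}

// Candidate selectors for the "Next" pagination button. The first is the one
// from the error message; the second is an assumed fallback based on aria-label.
const NEXT_BUTTON_CANDIDATES = [
  ".artdeco-pagination__button.artdeco-pagination__button--next",
  'button[aria-label="Next"]',
];
```

A caller could treat a `null` result as "no next page" instead of letting `page.click` throw.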

ayushjainrksh commented 4 years ago

Would you mind adding a screenshot of the page where it fails (along with inspect element)? I want to look at the page you get after clicking on all employees, in particular the navigation with the Next button at the bottom.
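The screenshot and markup can also be captured from the script itself at the point of failure. This is a rough sketch, assuming a Puppeteer `page` object and a Node.js version that provides `fs.promises`. `dumpPageState`, `outDir`, and `label` are hypothetical names and are not part of conactivity.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: saves a full-page screenshot and the rendered HTML
// so the state of a failing page can be attached to a bug report.
async function dumpPageState(page, outDir, label) {
  await fs.promises.mkdir(outDir, { recursive: true });
  const screenshotPath = path.join(outDir, `${label}.png`);
  const htmlPath = path.join(outDir, `${label}.html`);
  // page.screenshot writes the image to `path`; fullPage captures below the fold.
  await page.screenshot({ path: screenshotPath, fullPage: true });
  // page.content() resolves to the page's current HTML as a string.
  await fs.promises.writeFile(htmlPath, await page.content(), "utf8");
  return { screenshotPath, htmlPath };
}
```

Calling it from the `catch` block around the failing click would record what the page looked like when the selector was not found.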

rajkumaar23 commented 4 years ago

@ayushjainrksh Sure, will do that in the morning.

rajkumaar23 commented 4 years ago

@ayushjainrksh No, #15 doesn't solve my issue. And I'm not on dark mode either. However, I'll paste the elements below.

Company : unacademy

All employees div (have pasted only one li under the ul)

<div class="pv2 artdeco-card ph0 mb2">
<!---->                <ul class="reusable-search__entity-results-list list-style-none">
    <li id="ember360" class="reusable-search__result-container  ember-view">  

                  <div id="ember361" class="entity-result ember-view">      
                    <div id="ember362" class="ember-view"><div class="entity-result__item">
  <div class="entity-result__image">
    <div class="display-flex align-items-center">

      <div id="ember364" class="scale-down ember-view"><a data-entity-action-source="actor" data-entity-action-type="VIEW_ENTITY" href="https://www.linkedin.com/in/gauravmunjal8" id="ember365" class="app-aware-link ember-view">  
  <div id="ember366" class="ivm-image-view-model ember-view">  <div id="ember367" class="display-flex ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag ember-view"><!---->    <img width="48" loading="" height="48" alt="No alt text provided for this image" id="ember368" class="ivm-view-attr__img--centered EntityPhoto-circle-3  lazy-image ember-view" src="https://media-exp1.licdn.com/dms/image/C4D03AQGL06n61po19A/profile-displayphoto-shrink_100_100/0?e=1606953600&amp;v=beta&amp;t=CQobH1aOwxTyfIm8JT3YkB9dLeHrD1oPqnPaxuFUXLk">
</div>
</div>

</a></div>
    </div>
  </div>
  <div class="entity-result__content entity-result__divider pt3 pb3 t-12 t-black--light">
    <div class="mb1">
      <div class="linked-area cursor-pointer">

  <div class="t-roman t-sans">
    <span class="entity-result__title" data-entity-action-type="VIEW_ENTITY">
      <div class="display-flex">
  <span class="entity-result__title-line flex-shrink-1 entity-result__title-text--black ">
    <span class="entity-result__title-text  t-16">
      <a data-entity-action-source="actor" data-entity-action-type="VIEW_ENTITY" href="https://www.linkedin.com/in/gauravmunjal8" id="ember369" class="app-aware-link ember-view">  
        <span dir="ltr"><span aria-hidden="true"><!---->Gaurav Munjal<!----></span><span class="visually-hidden"><!---->View Gaurav Munjal’s profile<!----></span></span>

</a>
        <span class="entity-result__badge t-14 t-normal t-black--light">
          <div id="ember379" class="display-flex flex-row-reverse align-items-baseline ember-view">  <div id="ember380" class="flex-shrink-zero align-self-center mr2 entity-result__badge-icon ml1 ivm-image-view-model ember-view">  <div id="ember381" class="display-flex ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag ember-view">  <li-icon aria-hidden="true" type="linkedin-bug" size="14dp" color="premium"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 14 14" data-supported-dps="14x14" fill="currentColor" class="mercado-match" width="14" height="14" focusable="false">
  <g>
    <path class="background-mercado" d="M14 1v12a1 1 0 01-1 1H1a1 1 0 01-1-1V1a1 1 0 011-1h12a1 1 0 011 1zM4 5H2v7h2zm.25-2A1.27 1.27 0 003 1.8 1.27 1.27 0 001.75 3 1.27 1.27 0 003 4.2 1.27 1.27 0 004.25 3zM12 8.29c0-2.2-.73-3.49-2.86-3.49A2.71 2.71 0 006.89 6V5H5v7h2V8.73A1.74 1.74 0 018.66 6.8C9.82 6.8 10 7.94 10 8.73V12h2z"></path>
  </g>
</svg></li-icon>
</div>
</div>
  <span class="image-text-lockup__text entity-result__badge-text">
    <!---->• 2nd<!---->
  </span>
</div>
        </span>
    </span>
  </span>
    <span aria-hidden="true" class="entity-result__badge-overflow align-self-flex-end t-14 t-normal t-black--light flex-shrink-zero ">
      <div id="ember385" class="display-flex flex-row-reverse align-items-baseline ember-view">  <div id="ember386" class="flex-shrink-zero align-self-center mr2 entity-result__badge-icon ml1 ivm-image-view-model ember-view">  <div id="ember387" class="display-flex ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag ember-view">  <li-icon aria-hidden="true" type="linkedin-bug" size="14dp" color="premium"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 14 14" data-supported-dps="14x14" fill="currentColor" class="mercado-match" width="14" height="14" focusable="false">
  <g>
    <path class="background-mercado" d="M14 1v12a1 1 0 01-1 1H1a1 1 0 01-1-1V1a1 1 0 011-1h12a1 1 0 011 1zM4 5H2v7h2zm.25-2A1.27 1.27 0 003 1.8 1.27 1.27 0 001.75 3 1.27 1.27 0 003 4.2 1.27 1.27 0 004.25 3zM12 8.29c0-2.2-.73-3.49-2.86-3.49A2.71 2.71 0 006.89 6V5H5v7h2V8.73A1.74 1.74 0 018.66 6.8C9.82 6.8 10 7.94 10 8.73V12h2z"></path>
  </g>
</svg></li-icon>
</div>
</div>
  <span class="image-text-lockup__text entity-result__badge-text">
    <!---->• 2nd<!---->
  </span>
</div>
    </span>
</div>
    </span>
  </div>

    <div>
      <div class="entity-result__primary-subtitle t-14 t-black">
        <!---->Co-Founder and CEO at Unacademy<!---->
      </div>
        <div class="entity-result__secondary-subtitle t-14">
          <!---->Bengaluru<!---->
        </div>
    </div>

</div>

    </div>

<!---->
      <div class="entity-result__insights t-12">

    <div data-control-name="entity_result_insight1" data-control-id="An/LSn/2QsS/3ovPRY+5yw==" data-entity-action-type="SEE_MUTUAL_CONNECTIONS" data-entity-action-source="insight" class="entity-result__simple-insight " data-ember-action="" data-ember-action-399="399">
        <div id="ember401" class="ivm-image-view-model ember-view entity-result__simple-insight-image flex-shrink-zero mr1">  <div id="ember402" class="display-flex ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag ember-view">  <li-icon aria-hidden="true" type="people-icon" size="small"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" data-supported-dps="16x16" fill="currentColor" class="mercado-match" width="16" height="16" focusable="false">
  <path d="M14 11.75V15H9v-3.25A1.75 1.75 0 0110.75 10h1.5A1.75 1.75 0 0114 11.75zM11.5 9A2.5 2.5 0 109 6.5 2.5 2.5 0 0011.5 9zM5 1a3 3 0 103 3 3 3 0 00-3-3zm.75 7h-1.5A2.25 2.25 0 002 10.25V15h6v-4.75A2.25 2.25 0 005.75 8z"></path>
</svg></li-icon>
</div>
</div>

      <div class="entity-result__simple-insight-text-container">
        <span class="entity-result__simple-insight-text">
          <a target="_self" href="https://www.linkedin.com/in/ACoAAAE_IMABCxvXmyezwLinNyvlx_WKa8azRys" id="ember406" class="app-aware-link ember-view"><strong><!---->Tojo Chacko<!----></strong></a><!---->,<span class="white-space-pre"> </span><a target="_self" href="https://www.linkedin.com/in/ACoAAAGa7kcB-lTUWXi9Tph7FJJVN26gv95UhNw" id="ember415" class="app-aware-link ember-view"><strong><!---->Anup Kalbalia<!----></strong></a><!---->, and<span class="white-space-pre"> </span><a target="_self" href="https://www.linkedin.com/search/results/people/?facetNetwork=%5B%22F%22%5D&amp;facetConnectionOf=%5B%22ACoAAANvJDcBW9XtB9vMkBnHzUNUi_HS0CnmMEQ%22%5D&amp;origin=SHARED_CONNECTIONS_CANNED_SEARCH" id="ember424" class="app-aware-link ember-view"><strong><!---->1 other shared connection<!----></strong></a>
        </span>
<!---->      </div>
</div>

<!---->

<!---->
      </div>
  </div>
  <div class="entity-result__actions entity-result__divider">
<!---->    <div id="ember427" class="ember-view">            <button disabled="" id="ember431" class="artdeco-button artdeco-button--muted artdeco-button--2 artdeco-button--secondary artdeco-button--disabled ember-view"><!---->
<span class="artdeco-button__text">
    Pending
</span></button>

<!---->

</div>
</div>
</div>
</div>       
</div>
</li>
</ul>
</div>

Pagination div

<div class="artdeco-card pv0 mb6">
              <div id="ember520" class="ember-view">  
      <div id="ember1020" class="artdeco-pagination ember-view pv5 ph2">  <button disabled="" aria-label="Previous" id="ember1021" class="artdeco-pagination__button artdeco-pagination__button--previous artdeco-button artdeco-button--muted artdeco-button--1 artdeco-button--tertiary artdeco-button--disabled ember-view">  <li-icon aria-hidden="true" type="chevron-left-icon" class="artdeco-button__icon" size="small"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" data-supported-dps="16x16" fill="currentColor" class="mercado-match" width="16" height="16" focusable="false">
  <path d="M11 1L6.39 8 11 15H8.61L4 8l4.61-7z"></path>
</svg></li-icon>

<span class="artdeco-button__text">
    Previous
</span></button>

  <ul class="artdeco-pagination__pages artdeco-pagination__pages--number">
        <li data-test-pagination-page-btn="1" id="ember1023" class="artdeco-pagination__indicator artdeco-pagination__indicator--number active selected ember-view">  <button aria-current="true" aria-label="Page 1, current page">
    <span>1</span>
    <span class="a11y-text">Current page</span>
  </button>
</li>
        <li data-test-pagination-page-btn="2" id="ember1025" class="artdeco-pagination__indicator artdeco-pagination__indicator--number ember-view">  <button aria-label="Page 2" data-ember-action="" data-ember-action-1026="1026">
    <span>2</span>
  </button>
</li>
        <li data-test-pagination-page-btn="3" id="ember1028" class="artdeco-pagination__indicator artdeco-pagination__indicator--number ember-view">  <button aria-label="Page 3" data-ember-action="" data-ember-action-1029="1029">
    <span>3</span>
  </button>
</li>
        <li data-test-pagination-page-btn="4" id="ember1031" class="artdeco-pagination__indicator artdeco-pagination__indicator--number ember-view">  <button aria-label="Page 4" data-ember-action="" data-ember-action-1032="1032">
    <span>4</span>
  </button>
</li>
        <li data-test-pagination-page-btn="5" id="ember1034" class="artdeco-pagination__indicator artdeco-pagination__indicator--number ember-view">  <button aria-label="Page 5" data-ember-action="" data-ember-action-1035="1035">
    <span>5</span>
  </button>
</li>
        <li data-test-pagination-page-btn="6" id="ember1037" class="artdeco-pagination__indicator artdeco-pagination__indicator--number ember-view">  <button aria-label="Page 6" data-ember-action="" data-ember-action-1038="1038">
    <span>6</span>
  </button>
</li>
        <li data-test-pagination-page-btn="7" id="ember1040" class="artdeco-pagination__indicator artdeco-pagination__indicator--number ember-view">  <button aria-label="Page 7" data-ember-action="" data-ember-action-1041="1041">
    <span>7</span>
  </button>
</li>
        <li data-test-pagination-page-btn="8" id="ember1043" class="artdeco-pagination__indicator artdeco-pagination__indicator--number ember-view">  <button aria-label="Page 8" data-ember-action="" data-ember-action-1044="1044">
    <span>8</span>
  </button>
</li>
        <li id="ember1046" class="artdeco-pagination__indicator artdeco-pagination__indicator--number ember-view"><button aria-label="Page 9" data-ember-action="" data-ember-action-1047="1047">
  <span>…</span>
</button>
</li>
        <li data-test-pagination-page-btn="100" id="ember1049" class="artdeco-pagination__indicator artdeco-pagination__indicator--number ember-view">  <button aria-label="Page 100" data-ember-action="" data-ember-action-1050="1050">
    <span>100</span>
  </button>
</li>
  </ul>

  <button aria-label="Next" id="ember1051" class="artdeco-pagination__button artdeco-pagination__button--next artdeco-button artdeco-button--muted artdeco-button--icon-right artdeco-button--1 artdeco-button--tertiary ember-view">  <li-icon aria-hidden="true" type="chevron-right-icon" class="artdeco-button__icon" size="small"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" data-supported-dps="16x16" fill="currentColor" class="mercado-match" width="16" height="16" focusable="false">
  <path d="M5 15l4.61-7L5 1h2.39L12 8l-4.61 7z"></path>
</svg></li-icon>

<span class="artdeco-button__text">
    Next
</span></button>
</div>

</div>
            </div>

ayushjainrksh commented 4 years ago

Thanks, I'll try to figure out what's going on here. If you can come up with a suggestion, that would be of great help. Feel free to work on this issue.

rajkumaar23 commented 4 years ago

Sure @ayushjainrksh, I’ll take a look at your code sometime tomorrow.

ayushjainrksh commented 4 years ago

Your HTML dump matches the HTML I see for LinkedIn in inspect element, and everything runs fine on my system. My understanding is that this could be caused by a slow internet connection, which I don't want to leave unfixed: the goal is for the script to run even at slow speeds. To rule out this possibility, you can try replacing...

https://github.com/ayushjainrksh/conactivity/blob/5faf168eb22b5a02a4cda4e40f8fec6a5c5cf3dc/scrape.js#L136

with

        await page.goto(profileLink + "detail/recent-activity",{
          waitUntil: ["domcontentloaded"],
        });
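
If slow loading does turn out to be the cause, another option is to wait explicitly for the pagination button before clicking it, rather than clicking as soon as navigation returns. This is a minimal sketch, assuming a Puppeteer `page` object. `clickNextWhenReady` and its 30-second default timeout are hypothetical and not taken from scrape.js.

```javascript
// Hypothetical helper: waits for the "Next" button to render before clicking.
// Resolves to true if it clicked, false if the button never appeared.
async function clickNextWhenReady(page, selector, timeoutMs = 30000) {
  try {
    // waitForSelector rejects if the element does not appear within the timeout.
    await page.waitForSelector(selector, { timeout: timeoutMs });
  } catch (err) {
    // Any failure to find the button is treated here as "no next page";
    // a stricter version could rethrow errors that are not timeouts.
    return false;
  }
  await page.click(selector);
  return true;
}
```

With this in place, a missing button ends pagination quietly instead of raising "No node found for selector".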