extractus / article-extractor

Extract the main article from a given URL with Node.js
https://extractor-demos.pages.dev/article-extractor
MIT License

Could not extract data on a page with text content longer than threshold #399

Closed: JohnCido closed this issue 1 month ago.

JohnCido commented 2 months ago

The page is an IEEE Xplore document page.

We apply a special filter so that only text-heavy elements are passed to the parser, but extractFromHtml still returns undefined.

Any idea what might cause this?
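A text-heavy filter like the one mentioned above might look roughly like the following. This is not the reporter's actual filter, which is not shown in the thread. The helper names `visibleText` and `keepTextHeavy` and the `minChars` cutoff are hypothetical. The tag stripping uses regular expressions, so it is only a rough approximation and not a real HTML parser.

```javascript
// Hypothetical helper: approximate the visible text of an HTML fragment.
// Regex-based, so it is only a rough approximation of what a browser would render.
function visibleText(html) {
  return html
    .replace(/<!--[\s\S]*?-->/g, ' ')                  // drop comments, incl. Angular's <!---->
    .replace(/<(script|style)\b[\s\S]*?<\/\1>/gi, ' ') // drop script/style bodies
    .replace(/<[^>]+>/g, ' ')                          // drop remaining tags
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&')                            // decode &amp; last to avoid double-decoding
    .replace(/\s+/g, ' ')                              // collapse whitespace
    .trim();
}

// Hypothetical filter: keep only fragments whose visible text has at least minChars characters.
function keepTextHeavy(fragments, minChars) {
  return fragments.filter((fragment) => visibleText(fragment).length >= minChars);
}

// Usage: a breadcrumb-like fragment is dropped, and an abstract-like paragraph is kept.
const fragments = [
  '<nav><a>Journals &amp; Magazines</a></nav>',
  '<p>Detection of moving objects in outdoor environments is an extremely researched topic.</p>',
];
console.log(keepTextHeavy(fragments, 40).length); // 1
```

Measuring the visible text length of the filtered HTML before calling the extractor shows whether the fragment plausibly contains enough text to pass a minimum-content check.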

ndaidong commented 2 months ago

@JohnCido, did you pass protected content to that method? If possible, please share the HTML you passed into extractFromHtml.
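To gather the details requested here, a small wrapper can record the input size and the result of a single extraction call. In the sketch below, the extraction function is passed in as a parameter, for example `extractFromHtml` from `@extractus/article-extractor`, which is assumed to accept `(html, url)`. This means the wrapper can also be exercised with a stub. The name `diagnoseExtraction` and the fields of its report are hypothetical, and the text-length measure is a crude tag-stripping approximation.

```javascript
// Hypothetical diagnostic wrapper. `extractFn` is injected (e.g. extractFromHtml)
// and may return either a plain value or a Promise.
async function diagnoseExtraction(extractFn, html, url) {
  // Crude visible-text length: strip comments and tags, then collapse whitespace.
  const text = html
    .replace(/<!--[\s\S]*?-->/g, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();

  let result;
  let error = null;
  try {
    result = await extractFn(html, url);
  } catch (err) {
    error = err; // record the failure instead of losing it
  }

  return {
    htmlLength: html.length,
    textLength: text.length,
    extracted: result != null, // false for both null and undefined
    error,
    result,
  };
}

// Usage with a stub that "fails" on short input, mimicking an undefined result.
const stub = async (html) => (html.length > 20 ? { content: html } : undefined);
diagnoseExtraction(stub, '<p>short</p>', 'https://example.com')
  .then((report) => console.log(report.extracted, report.textLength)); // false 5
```

If the report shows a very short `textLength` for a page that should contain an article, the problem is more likely in the HTML passed in than in the extractor.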

JohnCido commented 2 months ago

For some pages, we have specific rules that pass only content-related HTML into the extractor. This is the HTML we passed in for this page:

<head></head>

<body class="body-resp cmpl_embed_complete cmpl_embed_complete_v2">
    <main _ngcontent-dql-c470="" id="xplMainContentLandmark">
        <div _ngcontent-dql-c470="" class="global-ng-wrapper"><router-outlet
                _ngcontent-dql-c470=""></router-outlet><xpl-document-details _nghost-dql-c215="">
                <div _ngcontent-dql-c215="" class="row g-0 document ng-document stats-document g-0">
                    <div _ngcontent-dql-c215="" class="document-main global-content-width-w-rr">
                        <section _ngcontent-dql-c215="" class="document-main-leaderboard-ad col-12"><xpl-leaderboard-ad
                                _ngcontent-dql-c215="" class="hide-desktop" _nghost-dql-c91="">
                                <div _ngcontent-dql-c91="" class="Ads-leaderboard ad-panel">
                                    <div _ngcontent-dql-c91="" class="row g-0 u-flex-wrap-nowrap">
                                        <div _ngcontent-dql-c91="" class="ads-close-container"><i _ngcontent-dql-c91="" aria-hidden="true"
                                                class="ads-close-button"></i></div><!---->
                                    </div>
                                    <div _ngcontent-dql-c91="" class="ad-leaderboard-ad-container"><!---->
                                        <div _ngcontent-dql-c91="" xplgoogleadmigr="" class="Ads-leaderBoardTablet">
                                            <div id="div-gpt-ad-1606861783216-0" style="width:576px;
            height:71px; display:block; margin: 0 auto; padding-bottom: 0.5em;"></div>
                                        </div><!---->
                                        <div _ngcontent-dql-c91="" xplgoogleadmigr="" class="Ads-leaderBoardMobile">
                                            <div id="div-gpt-ad-1606861783316-0" style="width:320px;
            height:50px; display:block; margin: 0 auto; padding-bottom: 0.5em;"></div>
                                        </div><!---->
                                    </div>
                                </div><!---->
                            </xpl-leaderboard-ad><!----></section><!---->
                        <section _ngcontent-dql-c215="" class="document-main-header row g-0">
                            <div _ngcontent-dql-c215="" class="col-12"><xpl-document-header _ngcontent-dql-c215=""
                                    _nghost-dql-c166="">
                                    <section _ngcontent-dql-c166="" class="document-header row g-0">
                                        <div _ngcontent-dql-c166="" class="document-header-breadcrumbs-container col-12">
                                            <div _ngcontent-dql-c166="" class="breadcrumbs col text-sm-md-lh"><span _ngcontent-dql-c166=""><a
                                                        _ngcontent-dql-c166="">Journals &amp; Magazines</a><!----><!----><span
                                                        _ngcontent-dql-c166="" class="breadcrumbs-separator"> &gt;</span></span><span
                                                    _ngcontent-dql-c166=""><a _ngcontent-dql-c166="">IEEE Transactions on
                                                        Circuits...</a><!----><!----><span _ngcontent-dql-c166="" class="breadcrumbs-separator">
                                                        &gt;</span></span><span _ngcontent-dql-c166=""><a _ngcontent-dql-c166="">Volume: 31 Issue:
                                                        3</a><!----><!----><span _ngcontent-dql-c166=""
                                                        class="breadcrumbs-separator"></span></span><!----><xpl-help-link _ngcontent-dql-c166=""
                                                    id="help" tooltiptype="breadcrumb" _nghost-dql-c60=""><a _ngcontent-dql-c60="" target="_blank"
                                                        tooltipclass="helplink-tooltip" triggers="hover" class="icon-size-md u-flex-display-inline"
                                                        aria-label="Help using Journal &amp; Magazine documents"><i _ngcontent-dql-c60=""
                                                            class="fa fa-question-circle help-link breadcrumb-help-link-icon"></i></a><!----><!----></xpl-help-link>
                                            </div>
                                        </div><!---->
                                        <div _ngcontent-dql-c166="" class="document-header-inner-container row g-0"><!---->
                                            <div _ngcontent-dql-c166="" class="col-12">
                                                <div _ngcontent-dql-c166="" class="row g-0 stats-document-header">
                                                    <div _ngcontent-dql-c166="" class="row g-0 document-title-fix"><!---->
                                                        <div _ngcontent-dql-c166="" class="document-header-title-container col">
                                                            <div _ngcontent-dql-c166="" class="left-container w-100">
                                                                <h1 _ngcontent-dql-c166="" class="document-title text-2xl-md-lh"><span
                                                                        _ngcontent-dql-c166="">Annotation and Benchmarking of a Video Dataset under Degraded
                                                                        Complex Atmospheric Conditions and Its Visibility Enhancement Analysis for Moving
                                                                        Object Detection</span></h1>
                                                                <div _ngcontent-dql-c166="" class="u-mb-1 u-mt-05 btn-container">
                                                                    <div _ngcontent-dql-c166="" class="publisher-title-tooltip"><xpl-publisher
                                                                            _ngcontent-dql-c166="" _nghost-dql-c135=""><span _ngcontent-dql-c135=""
                                                                                class="text-base-md-lh publisher-info-container black-tooltip"><span
                                                                                    _ngcontent-dql-c135="" xplhighlight=""><span _ngcontent-dql-c135=""><span
                                                                                            _ngcontent-dql-c135="" class="title">Publisher: </span><!----><span
                                                                                            _ngcontent-dql-c135="">IEEE</span></span></span><!----><!----></span><!----></xpl-publisher>
                                                                    </div><!---->
                                                                    <div _ngcontent-dql-c166=""><xpl-cite-this-modal _ngcontent-dql-c166=""
                                                                            _nghost-dql-c151=""><!---->
                                                                            <div _ngcontent-dql-c166=""></div>
                                                                        </xpl-cite-this-modal><!----></div>
                                                                    <div _ngcontent-dql-c166="" class="black-tooltip tool-tip-pdf-button">
                                                                        <div _ngcontent-dql-c166="" class="pdf-btn-container hide-mobile">
                                                                            <!----><xpl-login-modal-trigger _ngcontent-dql-c166=""
                                                                                _nghost-dql-c154=""><!----><a _ngcontent-dql-c166=""
                                                                                    title="You do not have access to this PDF"
                                                                                    class="xpl-btn-pdf doc-actions-link stats-document-lh-action-downloadPdf_3"><i
                                                                                        _ngcontent-dql-c166="" class="icon fas fa-file-pdf"></i><span
                                                                                        _ngcontent-dql-c166="">PDF</span></a><!----></xpl-login-modal-trigger><!----><!----><!----><!----><!----><!----><!---->
                                                                        </div><!---->
                                                                    </div><!---->
                                                                    <div _ngcontent-dql-c166="" class="d-flex flex-wrap"><!----><!----></div>
                                                                </div><!----><!----><!----><!----><!----><!---->
                                                            </div>
                                                            <div _ngcontent-dql-c166="" class="right-container"><!----></div>
                                                        </div>
                                                    </div><!----><!---->
                                                    <div _ngcontent-dql-c166="" class="document-main-subheader">
                                                        <div _ngcontent-dql-c166="" class="document-main-author-banner">
                                                            <div _ngcontent-dql-c166="" class="document-authors-banner stats-document-authors-banner">
                                                                <div _ngcontent-dql-c166=""
                                                                    class="d-flex authors-banner-row u-flex-align-items-center u-flex-wrap-nowrap">
                                                                    <xpl-author-banner _ngcontent-dql-c166="" class="authors-banner-row-middle"
                                                                        _nghost-dql-c159="">
                                                                        <div _ngcontent-dql-c159=""
                                                                            class="document-authors-banner stats-document-authors-banner">
                                                                            <div _ngcontent-dql-c159="" class="authors-banner-row d-flex u-flex-wrap-nowrap">
                                                                                <div _ngcontent-dql-c159="" class="authors-banner-row-middle u-pr-0">
                                                                                    <div _ngcontent-dql-c159=""
                                                                                        class="authors-container stats-document-authors-banner-authorsContainer">
                                                                                        <div _ngcontent-dql-c159=""
                                                                                            class="authors-info-container overflow-ellipsis text-base-md-lh authors-minimized"
                                                                                            id="indexTerms-container-1727233377520-0"><!----><span
                                                                                                _ngcontent-dql-c159="" class="authors-info"><span _ngcontent-dql-c159=""
                                                                                                    class="blue-tooltip"><a _ngcontent-dql-c159="" triggers="hover"><span
                                                                                                            _ngcontent-dql-c159="">Sourav Dey
                                                                                                            Roy</span><!----></a><!----><!----><!----><!----></span><!----><!----><span
                                                                                                    _ngcontent-dql-c159="" class="u-px-02"><xpl-orcid
                                                                                                        _ngcontent-dql-c159="" _nghost-dql-c59=""><a _ngcontent-dql-c59=""
                                                                                                            target="_blank"
                                                                                                            aria-label="Open Researcher and Contributor Identifier (ORCID)"><i
                                                                                                                _ngcontent-dql-c59=""
                                                                                                                class="icon icon-orcid"></i></a><!----></xpl-orcid></span><!----><span
                                                                                                    _ngcontent-dql-c159="">; </span><!----></span><span
                                                                                                _ngcontent-dql-c159="" class="authors-info"><span _ngcontent-dql-c159=""
                                                                                                    class="blue-tooltip"><a _ngcontent-dql-c159="" triggers="hover"><span
                                                                                                            _ngcontent-dql-c159="">Mrinal Kanti
                                                                                                            Bhowmik</span><!----></a><!----><!----><!----><!----></span><!----><!----><span
                                                                                                    _ngcontent-dql-c159="" class="u-px-02"><xpl-orcid
                                                                                                        _ngcontent-dql-c159="" _nghost-dql-c59=""><a _ngcontent-dql-c59=""
                                                                                                            target="_blank"
                                                                                                            aria-label="Open Researcher and Contributor Identifier (ORCID)"><i
                                                                                                                _ngcontent-dql-c59=""
                                                                                                                class="icon icon-orcid"></i></a><!----></xpl-orcid></span><!----><span
                                                                                                    _ngcontent-dql-c159=""></span><!----></span><!----></div>
                                                                                    </div>
                                                                                </div><!---->
                                                                            </div>
                                                                        </div><!---->
                                                                    </xpl-author-banner>
                                                                    <div _ngcontent-dql-c166=""
                                                                        class="u-flex-display-flex u-flex-align-items-center nowrap text-base-md-lh">
                                                                        <div _ngcontent-dql-c166="" class="authors-view-all-link-container hide-mobile"><a
                                                                                _ngcontent-dql-c166="" class="text-base-md-lh">All Authors</a></div>
                                                                        <div _ngcontent-dql-c166=""
                                                                            class="authors-mobile-view-all-container blue-tooltip hide-desktop"><a
                                                                                _ngcontent-dql-c166="" triggers="click:click" class="authors-viewall-link"><i
                                                                                    _ngcontent-dql-c166="" class="fa fa-ellipsis-h"></i></a><!----></div><!---->
                                                                    </div>
                                                                </div><!----><!---->
                                                            </div><!---->
                                                        </div>
                                                        <div _ngcontent-dql-c166="" class="document-header-metrics-banner d-flex flex-wrap">
                                                            <div _ngcontent-dql-c166="" class="document-banner col stats-document-banner">
                                                                <xpl-login-modal-trigger _ngcontent-dql-c166=""
                                                                    _nghost-dql-c154=""><!----><!----></xpl-login-modal-trigger><!----><!----><!----><!----><!----><!----><!----><!----><!---->
                                                                <div _ngcontent-dql-c166="" class="document-banner-metric-container d-flex">
                                                                    <!----><!----><!----></div><!---->
                                                                <div _ngcontent-dql-c166="" class="document-banner-access d-flex"><!----><!----></div>
                                                            </div>
                                                            <div _ngcontent-dql-c166="" class="hide-desktop d-flex"><!----><!----></div>
                                                            <div _ngcontent-dql-c166="" class="col-7-24 black-tooltip hide-mobile">
                                                                <div _ngcontent-dql-c166="" class="d-flex justify-content-end text-normal-md">
                                                                    <!----><!----><!----></div><xpl-document-toolbar _ngcontent-dql-c166=""
                                                                    _nghost-dql-c165="">
                                                                    <div _ngcontent-dql-c165=""
                                                                        class="col-actions stats-document-container-lh u-printing-invisible-ie u-printing-invisible-ff">
                                                                        <div _ngcontent-dql-c165="" class="action-item-container">
                                                                            <ul _ngcontent-dql-c165=""
                                                                                class="icon-size-md doc-actions py-3 doc-toolbar stats-document-lh-actions black-tooltip">
                                                                                <li _ngcontent-dql-c165="" class="doc-actions-item"><a _ngcontent-dql-c165=""
                                                                                        aria-label="Download References"
                                                                                        class="doc-actions-link stats_ReferencesView_Doc_Details_9082053"><i
                                                                                            _ngcontent-dql-c165=""
                                                                                            class="icon-size-md color-xplore-blue fas fa-registered"></i></a></li>
                                                                                <!---->
                                                                                <li _ngcontent-dql-c165=""
                                                                                    class="doc-actions-item white-blue-border-tooltip social-media">
                                                                                    <xpl-document-social-media _ngcontent-dql-c165=""
                                                                                        _nghost-dql-c162=""><!----><!----></xpl-document-social-media></li><!---->
                                                                                <li _ngcontent-dql-c165="" class="stats-permission doc-actions-item"><a
                                                                                        _ngcontent-dql-c165="" aria-label="Copyright request permission for reuse"
                                                                                        class="doc-actions-link stats_Doc_Details_Copyright_9082053"><i
                                                                                            _ngcontent-dql-c165=""
                                                                                            class="color-xplore-blue icon-size-md far fa-copyright"></i></a><!----><!---->
                                                                                </li><!---->
                                                                                <li _ngcontent-dql-c165=""
                                                                                    class="doc-actions-item white-blue-border-tooltip save-to disabled-look enable-hover">
                                                                                    <a _ngcontent-dql-c165="" triggers="click" class="doc-save-tool"
                                                                                        aria-label="Sign In with personal account required for save to project"><i
                                                                                            _ngcontent-dql-c165=""
                                                                                            class="icon-size-md color-xplore-blue fas fa-folder-open"></i></a><!---->
                                                                                </li><!---->
                                                                                <li _ngcontent-dql-c165="" class="doc-actions-item"><xpl-manage-alerts
                                                                                        _ngcontent-dql-c165="" class="white-blue-border-tooltip alerts-popover"
                                                                                        _nghost-dql-c164=""><!----><a _ngcontent-dql-c164="" triggers="click:click"
                                                                                            aria-label="Set Search Alert"
                                                                                            class="doc-actions-link stats-document-lh-action-alerts hide-mobile"><i
                                                                                                _ngcontent-dql-c164=""
                                                                                                class="icon-size-md color-xplore-blue fas fa-bell"></i><span
                                                                                                _ngcontent-dql-c164="" class="doc-actions-text">Alerts</span></a><!---->
                                                                                        <div _ngcontent-dql-c164=""
                                                                                            class="manage-alerts-popover-content hide-desktop">
                                                                                            <h1 _ngcontent-dql-c164="" class="header">Alerts</h1>
                                                                                            <div _ngcontent-dql-c164="" class="manage-alerts-link"><a
                                                                                                    _ngcontent-dql-c164="" aria-label="Manage Content Alerts"> Manage
                                                                                                    Content Alerts <i _ngcontent-dql-c164=""
                                                                                                        class="icon icon-courses-chevron-blue"></i></a></div>
                                                                                            <div _ngcontent-dql-c164="" class="manage-alerts-link"><a
                                                                                                    _ngcontent-dql-c164="" aria-label="Add to Citation Alerts"> Add to
                                                                                                    Citation Alerts <i _ngcontent-dql-c164=""
                                                                                                        class="icon icon-courses-chevron-blue"></i></a></div>
                                                                                        </div>
                                                                                    </xpl-manage-alerts></li><!---->
                                                                            </ul>
                                                                        </div>
                                                                    </div>
                                                                </xpl-document-toolbar><!---->
                                                            </div><!---->
                                                        </div><!---->
                                                    </div>
                                                </div><!---->
                                            </div>
                                        </div>
                                        <hr _ngcontent-dql-c166="" class="px-3 mt-2">
                                    </section><!----><!---->
                                </xpl-document-header></div><!---->
                        </section>
                        <div _ngcontent-dql-c215="" class="row g-0 document-main-body">
                            <div _ngcontent-dql-c215="" class="document-main-left-trail col-5-24">
                                <div _ngcontent-dql-c215="" class="col-24-24">
                                    <div _ngcontent-dql-c215="" class="row g-0"></div>
                                </div><!---->
                            </div>
                            <div _ngcontent-dql-c215="" class="document-main-content-container col-19-24"><xpl-left-side-bar
                                    _ngcontent-dql-c215="" _nghost-dql-c169="">
                                    <div _ngcontent-dql-c169="" xplscrollsnapmigr="" scrollreset="true" offsetfrom="100"
                                        fromelementid="mobile-tab-pane" tillelementid="full-text-footer" offsetto="-800"
                                        cssclasstostick="document-mobile-leftrail-stick"
                                        class="col-2 col-actions ng-col-actions hide-desktop stats-document-container-lh u-printing-invisible-ie u-printing-invisible-ff col-actions-mobile-closed ng-col-actions-mobile-closed">
                                        <div _ngcontent-dql-c169="" id="left-rail-container">
                                            <div _ngcontent-dql-c169="" class="doc-actions-mobile-expand-button"></div><!---->
                                            <ul _ngcontent-dql-c169="" class="doc-actions stats-document-lh-actions">
                                                <li _ngcontent-dql-c169="" class="doc-actions-item black-tooltip py-3 disabled-look">
                                                    <!----><xpl-login-modal-trigger _ngcontent-dql-c169="" _nghost-dql-c154=""><!----><a
                                                            _ngcontent-dql-c169="" title="You do not have access to this PDF"
                                                            class="doc-actions-link stats-document-lh-action-downloadPdf_3"><i _ngcontent-dql-c169=""
                                                                class="icon xpl-pdf-icon fas fa-file-pdf"></i> Download PDF
                                                        </a><!----></xpl-login-modal-trigger><!----><!----><!----></li><!---->
                                                <li _ngcontent-dql-c169="" class="doc-actions-item py-3"><a _ngcontent-dql-c169=""
                                                        class="doc-actions-link stats_ReferencesView_Doc_Details_9082053"><i _ngcontent-dql-c169=""
                                                            class="icon-size-md color-xplore-blue fas fa-registered"></i> Download References
                                                    </a><!----><!----></li>
                                                <li _ngcontent-dql-c169="" class="doc-actions-item py-3 white-blue-border-tooltip social-media">
                                                    <a _ngcontent-dql-c169="" class="doc-actions-link"><xpl-document-social-media
                                                            _ngcontent-dql-c169="" placement="document-page-mobile"
                                                            _nghost-dql-c162=""><!----><!----></xpl-document-social-media></a><!----></li>
                                                <li _ngcontent-dql-c169="" class="stats-permission doc-actions-item py-3"><a
                                                        _ngcontent-dql-c169="" title="Request permission for reuse."
                                                        aria-label="Copyright request permission for reuse"
                                                        class="doc-actions-link stats_Doc_Details_Copyright_9082053"><i _ngcontent-dql-c169=""
                                                            class="copyright-icon far fa-copyright"></i> Request Permissions </a><!----><!----></li>
                                                <li _ngcontent-dql-c169="" class="doc-actions-item py-3 disabled-look black-tooltip"><!----><a
                                                        _ngcontent-dql-c169="" triggers="click" autoclose="outside"
                                                        class="doc-actions-link stats-document-lh-action-downloadPdf_3"><i _ngcontent-dql-c169=""
                                                            class="icon-size-md color-xplore-blue fas fa-folder-open"></i> Save to
                                                    </a><!----><!----><!----></li>
                                                <li _ngcontent-dql-c169="" class="doc-actions-item py-3"><a _ngcontent-dql-c169=""
                                                        class="doc-actions-link stats-document-lh-action-alerts"><i _ngcontent-dql-c169=""
                                                            class="icon-size-md color-xplore-blue fas fa-bell"></i> Alerts </a><!----><!----><!---->
                                                </li><!---->
                                            </ul>
                                        </div>
                                    </div><!---->
                                </xpl-left-side-bar><!---->
                                <section _ngcontent-dql-c215=""
                                    class="tab-pane col-24-24 u-printing-display-inline-ie u-printing-display-inline-ff">
                                    <div _ngcontent-dql-c215="" id="mobile-tab-pane"></div><!---->
                                    <div _ngcontent-dql-c215="" class="document-main-left-trail-content"><!---->
                                        <div _ngcontent-dql-c215="" id=""><router-outlet
                                                _ngcontent-dql-c215=""></router-outlet><xpl-document-abstract _nghost-dql-c173="">
                                                <section _ngcontent-dql-c173="" class="document-abstract document-tab"><!---->
                                                    <div _ngcontent-dql-c173="" class="abstract-mobile-div hide-desktop"><!---->
                                                        <div _ngcontent-dql-c173="" class="row g-0">
                                                            <div _ngcontent-dql-c173="" class="mobile-col-12"><!----><!---->
                                                                <div _ngcontent-dql-c173="" class="u-pb-1">
                                                                    <h2 _ngcontent-dql-c173=""> Abstract:</h2><span _ngcontent-dql-c173=""
                                                                        xplmathjax="">Detection of moving objects in outdoor environments is an extremely
                                                                        researched topic. However, studies on moving object detection in complex
                                                                        atmospheric/weather condition...</span><span _ngcontent-dql-c173=""><a
                                                                            _ngcontent-dql-c173="" class="mobile-toggle-btn">View more</a></span><!---->
                                                                </div><!----><!----><!---->
                                                            </div><!---->
                                                        </div><!----><!----><!---->
                                                        <div _ngcontent-dql-c173="" tabindex="0" role="button"
                                                            class="metadata-toggle-btn mobile-content">
                                                            <h2 _ngcontent-dql-c173="" class="u-flex-display-flex"><i _ngcontent-dql-c173=""
                                                                    class="color-xplore-blue far icon-size-md u-pr-02 fa-angle-down"></i> Metadata </h2>
                                                        </div><!----><!---->
                                                    </div>
                                                    <div _ngcontent-dql-c173="" class="abstract-desktop-div hide-mobile text-base-md-lh"><!---->
                                                        <div _ngcontent-dql-c173="" class="abstract-text row g-0"><!---->
                                                            <div _ngcontent-dql-c173="" class="col-12"><!----><!---->
                                                                <div _ngcontent-dql-c173="" class="u-mb-1">
                                                                    <h2 _ngcontent-dql-c173=""> Abstract:</h2>
                                                                    <div _ngcontent-dql-c173="" xplmathjax="">Detection of moving objects in outdoor
                                                                        environments is an extremely researched topic. However, studies on moving object
                                                                        detection in complex atmospheric/weather conditions are limited, primarily because
                                                                        of the absence of any relevant benchmark dataset. To address this disparity, we
                                                                        introduce a novel benchmark video dataset entitled “Extended Tripura University
                                                                        Video Dataset (E-TUVD)” which is a diverse dataset of complex atmospheric/weather
                                                                        conditions. Currently, E-TUVD is the largest video dataset for moving object
                                                                        detection under degraded atmospheric/weather conditions. The dataset comprises 147
                                                                        video clips spanning 1-5 minutes in duration of each video clips. Because of the
                                                                        requirement of evaluating any object detection model, this study emphasizes on
                                                                        generation of ground-truth images of salient moving objects on E-TUVD. Using this
                                                                        dataset, we assessed the performance of several state-of-the-art algorithms,
                                                                        considering both the ability to detect moving objects and visibility enhancement
                                                                        under such complex conditions. The method with the best performance was used to
                                                                        investigate the effectiveness of visibility enhancement of atmospheric/weather
                                                                        degraded image sequences for accurate moving object detection. Results and analysis
                                                                        reveal that effective enhancement can significantly improve the ability of detection
                                                                        algorithms under degraded atmospheric/weather conditions to resemble the true
                                                                        properties of moving objects in terms of pixel oriented binary masks.</div>
                                                                </div><!----><!----><!---->
                                                            </div><!---->
                                                        </div><!----><!----><!----><!---->
                                                        <div _ngcontent-dql-c173="" data-tealium_data="{&quot;docType&quot;: &quot;Journal&quot;}"
                                                            class="u-pb-1 stats-document-abstract-publishedIn"><strong
                                                                _ngcontent-dql-c173="">Published in: </strong><a _ngcontent-dql-c173=""
                                                                class="stats-document-abstract-publishedIn">IEEE Transactions on Circuits and Systems
                                                                for Video Technology</a><!----><!----><span _ngcontent-dql-c173=""> ( <span
                                                                    _ngcontent-dql-c173="">Volume: 31</span><!----><span _ngcontent-dql-c173="">, <a
                                                                        _ngcontent-dql-c173="" class="stats-document-abstract-publishedIn-issue">Issue:
                                                                        3</a><!----><!----><span _ngcontent-dql-c173="">, March
                                                                        2021</span><!----></span><!---->) </span><!----><!----><!----></div>
                                                        <!----><!----><!---->
                                                        <div _ngcontent-dql-c173="" class="row g-0 u-pt-1">
                                                            <div _ngcontent-dql-c173="" class="col-6"><!---->
                                                                <div _ngcontent-dql-c173="" class="u-pb-1"><strong _ngcontent-dql-c173="">Page(s):
                                                                    </strong> 844 <span _ngcontent-dql-c173="">- 862</span><!----></div><!----><!---->
                                                                <div _ngcontent-dql-c173="" class="u-pb-1 doc-abstract-pubdate"><strong
                                                                        _ngcontent-dql-c173="">Date of Publication:</strong> 29 April 2020 <xpl-help-link
                                                                        _ngcontent-dql-c173="" arialabel="Get help with using Publication Dates"
                                                                        helplinktext="Help with using Publication Dates"
                                                                        helplink="http://ieeexplore.ieee.org/Xplorehelp/Help_Pubdates.html"
                                                                        _nghost-dql-c60=""><a _ngcontent-dql-c60="" target="_blank"
                                                                            tooltipclass="helplink-tooltip" triggers="hover"
                                                                            class="icon-size-md u-flex-display-inline"
                                                                            aria-label="Get help with using Publication Dates"><i _ngcontent-dql-c60=""
                                                                                class="fa fa-question-circle help-link help-link-icon"></i></a><!----><!----></xpl-help-link>
                                                                </div><!----><!----><!----><!----><!----><!----><!---->
                                                                <div _ngcontent-dql-c173="" class="u-pb-1">
                                                                    <div _ngcontent-dql-c173="" tabindex="0" role="button">
                                                                        <h2 _ngcontent-dql-c173="" class="u-flex-display-flex"><i _ngcontent-dql-c173=""
                                                                                class="color-xplore-blue far icon-size-md u-pr-02 fa-angle-down"></i> ISSN
                                                                            Information: </h2>
                                                                    </div><!----><!---->
                                                                </div><!----><!---->
                                                            </div>
                                                            <div _ngcontent-dql-c173="" class="col-6"><!---->
                                                                <div _ngcontent-dql-c173="" class="u-pb-1 stats-document-abstract-doi"><strong
                                                                        _ngcontent-dql-c173="">DOI: </strong><a _ngcontent-dql-c173=""
                                                                        append-to-href="?src=document"
                                                                        target="_blank">10.1109/TCSVT.2020.2991191</a><!----><!----></div>
                                                                <!----><!----><!----><!---->
                                                                <div _ngcontent-dql-c173="" class="u-pb-1 doc-abstract-publisher"><xpl-publisher
                                                                        _ngcontent-dql-c173="" _nghost-dql-c135=""><span _ngcontent-dql-c135=""
                                                                            class="text-base-md-lh publisher-info-container black-tooltip"><span
                                                                                _ngcontent-dql-c135="" xplhighlight=""><span _ngcontent-dql-c135=""><span
                                                                                        _ngcontent-dql-c135="" class="title">Publisher: </span><!----><span
                                                                                        _ngcontent-dql-c135="">IEEE</span></span></span><!----><!----></span><!----></xpl-publisher>
                                                                </div><!----><!----><!----><!---->
                                                            </div>
                                                            <div _ngcontent-dql-c173="" class="col-12 u-pb-1 stats-document-abstract-fundedBy">
                                                                <div _ngcontent-dql-c173="" tabindex="0" role="button">
                                                                    <h2 _ngcontent-dql-c173="" class="u-flex-display-flex"><i _ngcontent-dql-c173=""
                                                                            class="color-xplore-blue far icon-size-md u-pr-02 fa-angle-down"></i>Funding
                                                                        Agency: </h2>
                                                                </div><!---->
                                                            </div><!---->
                                                        </div><!----><!----><!---->
                                                    </div>
                                                </section>
                                            </xpl-document-abstract><!----></div><xpl-leaderboard-middle-ad _ngcontent-dql-c215=""
                                            class="hide-desktop" _nghost-dql-c174="">
                                            <div _ngcontent-dql-c174="" class="Ads-leaderboard ad-panel">
                                                <div _ngcontent-dql-c174="" class="row g-0 u-flex-wrap-nowrap">
                                                    <div _ngcontent-dql-c174="" class="ads-close-container"><i _ngcontent-dql-c174=""
                                                            aria-hidden="true" class="ads-close-button"></i></div><!---->
                                                </div>
                                                <div _ngcontent-dql-c174="" class="ad-leaderboard-ad-container"><!---->
                                                    <div _ngcontent-dql-c174="" xplgoogleadmigr="" class="Ads-leaderBoardMiddleTablet">
                                                        <div id="div-gpt-ad-1606861708257-0" style="width:576px;
            height:71px; display:block; margin: 0 auto; padding-bottom: 0.5em;"></div>
                                                    </div><!---->
                                                    <div _ngcontent-dql-c174="" xplgoogleadmigr="" class="Ads-leaderBoardMiddleMobile">
                                                        <div id="div-gpt-ad-1606861708357-0" style="width:320px;
            height:50px; display:block; margin: 0 auto; padding-bottom: 0.5em;"></div>
                                                    </div><!---->
                                                </div>
                                            </div><!---->
                                        </xpl-leaderboard-middle-ad><!----><xpl-document-full-text _ngcontent-dql-c215=""
                                            _nghost-dql-c185="">
                                            <section _ngcontent-dql-c185=""><!---->
                                                <div _ngcontent-dql-c185="" id="toc-wrapper" class="row g-0 full-text-toc-wrapper">
                                                    <div _ngcontent-dql-c185="" xplscrollsnapmigr="" cssclasstostick="document-toc-stick"
                                                        fromelementid="toc-wrapper" tillelementid="full-text-footer" offsetfrom="150"
                                                        offsetto="-800" scrollreset="true"
                                                        class="col-12 u-align-center ft-toc previous-next-nav-ctrl hide-desktop">
                                                        <div _ngcontent-dql-c185="" class="toc-container hide-desktop"><!----><a
                                                                _ngcontent-dql-c185="" ngclass="{'disabled': !toc}"
                                                                class="toc-link {'disabled': !toc}"><img _ngcontent-dql-c185=""
                                                                    src="/assets/img/document/toc-icon.png"> Contents </a></div><!---->
                                                    </div>
                                                </div><!---->
                                                <hr _ngcontent-dql-c185="">
                                                <div _ngcontent-dql-c185="" class="row g-0 document-full-text-content">
                                                    <div _ngcontent-dql-c185="" id="full-text-section"
                                                        class="position-relative col col-text stats-document-container-fullTextSection u-printing-display-inline-ie u-printing-display-inline-ff"
                                                        style="font-size: 15px;"><span _ngcontent-dql-c185="" id="full-text-header"></span><!---->
                                                        <div _ngcontent-dql-c185=""><!----><!----><!---->
                                                            <div _ngcontent-dql-c185="" xplmathjax="" xplfulltextdomhandler=""
                                                                parentid="full-text-section"
                                                                class="document-text hide-full-text ng-non-bindable stats-document-dynamicFullTextOrSnippet-container show-full-text snippet-text">
                                                                <div>
                                                                    <h3>I. Introduction</h3>
                                                                    <p>Moving object detection has been an active and mature research area in numerous
                                                                        computer vision applications because of the increasing demand of video surveillance
                                                                        for security applications. Fundamentally, it is often considered to be a
                                                                        pre-processing step and a low level task in computer vision applications, which is
                                                                        interconnected with high level inference tasks such as object localization,
                                                                        tracking, and classification. Its importance can be anticipated by visualizing the
                                                                        numerous articles published till date on this topic. Each moving object detection
                                                                        algorithm is designed to competently address the inherent real-world challenges of
                                                                        indoor/outdoor scenes, including illumination changes, dynamic backgrounds, ghosting
                                                                        artifacts, shadows, camouflage effects, etc. [1]–[3]. However, because outdoor
                                                                        scenes can be degraded by different complex atmospheric/weather conditions, moving
                                                                        object detection is more complicated under such conditions. Generally, the
                                                                        North-Eastern (NE) states, along with other states of India, share multiple
                                                                        international borders, because of which, security plays a vital role in such states.
                                                                        Under extreme atmospheric/weather conditions, outdoor scenes undergo from
                                                                        degradation, and suspicious intruders may not be detected by unaided human vision
                                                                        because of the high loss in contrast. Consequently, electronic surveillance plays an
                                                                        important role in detecting illegal threats to the state and for real-time detection
                                                                        of suspicious activities.</p>
                                                                </div>
                                                            </div><!----><xpl-reference-pop-up _ngcontent-dql-c185="" parentid="full-text-section"
                                                                _nghost-dql-c179=""><!----></xpl-reference-pop-up>
                                                            <div _ngcontent-dql-c185=""
                                                                class="read-more-link-container stats-document-snippetText-readMoreLink-container">
                                                                <!----><xpl-login-modal-trigger _ngcontent-dql-c185="" modalsource="document"
                                                                    _nghost-dql-c154=""><!---->
                                                                    <div _ngcontent-dql-c185="" class="button layout-btn-blue"><a _ngcontent-dql-c185=""
                                                                            attr.data-tealium_data="{&quot;readMoreLinkTo&quot;: &quot;signInPurchaseModal&quot;}"
                                                                            class="read-more-link"> Sign in to Continue Reading </a></div><!---->
                                                                </xpl-login-modal-trigger><!----><!----><!----></div><!----><span _ngcontent-dql-c185=""
                                                                id="full-text-footer"></span>
                                                        </div><!---->
                                                    </div><!----><!---->
                                                </div><!---->
                                            </section>
                                        </xpl-document-full-text><xpl-accordian-section _ngcontent-dql-c215="" _nghost-dql-c211="">
                                            <div _ngcontent-dql-c211="" role="tablist"
                                                class="document-accordion-section-container hide-mobile"><xpl-document-accordion
                                                    _ngcontent-dql-c211="" class="accordion-panel-container" _nghost-dql-c190="">
                                                    <div _ngcontent-dql-c190="" class="accordion-item">
                                                        <div _ngcontent-dql-c190="" role="tab" class="accordion-header accordion-button"
                                                            id="authors-header" aria-expanded="false" aria-disabled="false"><!---->
                                                            <div _ngcontent-dql-c190="" class="accordion-chevron ms-auto"><i _ngcontent-dql-c190=""
                                                                    class="fa fa-angle-down"></i></div><!---->
                                                        </div><!---->
                                                    </div>
                                                    <div _ngcontent-dql-c190="" class="accordion-item">
                                                        <div _ngcontent-dql-c190="" role="tab" class="accordion-header accordion-button"
                                                            id="figures-header" aria-expanded="false" aria-disabled="false"><!---->
                                                            <div _ngcontent-dql-c190="" class="accordion-chevron ms-auto"><i _ngcontent-dql-c190=""
                                                                    class="fa fa-angle-down"></i></div><!---->
                                                        </div><!---->
                                                    </div>
                                                    <div _ngcontent-dql-c190="" class="accordion-item">
                                                        <div _ngcontent-dql-c190="" role="tab" class="accordion-header accordion-button"
                                                            id="references-header" aria-expanded="false" aria-disabled="false"><!---->
                                                            <div _ngcontent-dql-c190="" class="accordion-chevron ms-auto"><i _ngcontent-dql-c190=""
                                                                    class="fa fa-angle-down"></i></div><!---->
                                                        </div><!---->
                                                    </div>
                                                    <div _ngcontent-dql-c190="" class="accordion-item">
                                                        <div _ngcontent-dql-c190="" role="tab" class="accordion-header accordion-button"
                                                            id="citations-header" aria-expanded="false" aria-disabled="false"><!---->
                                                            <div _ngcontent-dql-c190="" class="accordion-chevron ms-auto"><i _ngcontent-dql-c190=""
                                                                    class="fa fa-angle-down"></i></div><!---->
                                                        </div><!---->
                                                    </div>
                                                    <div _ngcontent-dql-c190="" class="accordion-item">
                                                        <div _ngcontent-dql-c190="" role="tab" class="accordion-header accordion-button"
                                                            id="keywords-header" aria-expanded="false" aria-disabled="false"><!---->
                                                            <div _ngcontent-dql-c190="" class="accordion-chevron ms-auto"><i _ngcontent-dql-c190=""
                                                                    class="fa fa-angle-down"></i></div><!---->
                                                        </div><!---->
                                                    </div>
                                                    <div _ngcontent-dql-c190="" class="accordion-item">
                                                        <div _ngcontent-dql-c190="" role="tab" class="accordion-header accordion-button"
                                                            id="metrics-header" aria-expanded="false" aria-disabled="false"><!---->
                                                            <div _ngcontent-dql-c190="" class="accordion-chevron ms-auto"><i _ngcontent-dql-c190=""
                                                                    class="fa fa-angle-down"></i></div><!---->
                                                        </div><!---->
                                                    </div><!---->
                                                </xpl-document-accordion></div>
                                        </xpl-accordian-section>
                                    </div><!---->
                                </section>
                            </div>
                            <div _ngcontent-dql-c215="" class="document-disqus-container col-24-24">
                                <div _ngcontent-dql-c215="" class="row g-0"><!----></div>
                            </div>
                        </div>
                    </div>
                    <div _ngcontent-dql-c215="" class="document-sidebar global-right-rail top-spacing">
                        <div _ngcontent-dql-c215="" class="header-rel-art-toggle-mobile"><i _ngcontent-dql-c215=""
                                class="header-rel-art-toggle-icon"></i></div><!---->
                        <div _ngcontent-dql-c215="" class="document-sidebar-content"><!---->
                            <div _ngcontent-dql-c215="" class="hide-mobile"><xpl-leaderboard-ad _ngcontent-dql-c215=""
                                    _nghost-dql-c91="">
                                    <div _ngcontent-dql-c91="" class="Ads-leaderboard ad-panel">
                                        <div _ngcontent-dql-c91="" class="row g-0 u-flex-wrap-nowrap"><!----></div>
                                        <div _ngcontent-dql-c91="" class="ad-leaderboard-ad-container">
                                            <div _ngcontent-dql-c91="" xplgoogleadmigr="">
                                                <div id="div-gpt-ad-1606861783116-0" style="width:300px;
            height:250px; display:block; margin: 0 auto; undefined"></div>
                                            </div><!----><!----><!---->
                                        </div>
                                    </div><!---->
                                </xpl-leaderboard-ad><!----></div>
                            <div _ngcontent-dql-c215="" class="document-sidebar-rel-art"><xpl-related-article-list
                                    _ngcontent-dql-c215="" _nghost-dql-c213="">
                                    <div _ngcontent-dql-c213="" class="stats-document-header-relatedArticles">
                                        <div _ngcontent-dql-c213="" class="header-rel-art">
                                            <div _ngcontent-dql-c213="" class="header-rel-art-title text-base-md-lh"> More Like This </div>
                                            <div _ngcontent-dql-c213="" class="header-rel-art-list">
                                                <div _ngcontent-dql-c213="" class="header-rel-art-item">
                                                    <div _ngcontent-dql-c213="" class="row g-0 text-base-md-lh"><a _ngcontent-dql-c213=""
                                                            target="_self"><span _ngcontent-dql-c213="">Dynamic Low-Light Image Enhancement for Object
                                                                Detection via End-to-End Training</span></a><!----></div>
                                                    <p _ngcontent-dql-c213="" class="header-rel-art-pub text-sm-md-lh">2020 25th International
                                                        Conference on Pattern Recognition (ICPR)</p><!---->
                                                    <p _ngcontent-dql-c213="" class="header-rel-art-pub text-sm-md-lh">Published: 2021</p>
                                                </div>
                                                <div _ngcontent-dql-c213="" class="header-rel-art-item">
                                                    <div _ngcontent-dql-c213="" class="row g-0 text-base-md-lh"><a _ngcontent-dql-c213=""
                                                            target="_self"><span _ngcontent-dql-c213="">Large-Scale Object Detection of Images from
                                                                Network Cameras in Variable Ambient Lighting Conditions</span></a><!----></div>
                                                    <p _ngcontent-dql-c213="" class="header-rel-art-pub text-sm-md-lh">2019 IEEE Conference on
                                                        Multimedia Information Processing and Retrieval (MIPR)</p><!---->
                                                    <p _ngcontent-dql-c213="" class="header-rel-art-pub text-sm-md-lh">Published: 2019</p>
                                                </div><!---->
                                            </div>
                                            <div _ngcontent-dql-c213="" class="header-rel-art-action text-base-md-lh"><a
                                                    _ngcontent-dql-c213="">Show More</a><!----><!----></div><!---->
                                        </div>
                                    </div>
                                </xpl-related-article-list></div><!---->
                            <div _ngcontent-dql-c215="" class="hide-mobile"><xpl-leaderboard-middle-ad _ngcontent-dql-c215=""
                                    _nghost-dql-c174="">
                                    <div _ngcontent-dql-c174="" class="Ads-leaderboard ad-panel">
                                        <div _ngcontent-dql-c174="" class="row g-0 u-flex-wrap-nowrap"><!----></div>
                                        <div _ngcontent-dql-c174="" class="ad-leaderboard-ad-container">
                                            <div _ngcontent-dql-c174="" xplgoogleadmigr="">
                                                <div id="div-gpt-ad-1606861708157-0" style="width:300px;
            height:600px; display:block; margin: 0 auto; undefined"></div>
                                            </div><!----><!----><!---->
                                        </div>
                                    </div><!---->
                                </xpl-leaderboard-middle-ad><!----></div>
                        </div>
                    </div><xpl-reference-panel _ngcontent-dql-c215="" _nghost-dql-c214="">
                        <section _ngcontent-dql-c214="" id="references-anchor"
                            class="document-all-references hide-mobile panel-closed">
                            <div _ngcontent-dql-c214="" class="header">
                                <h1 _ngcontent-dql-c214="">References</h1><a _ngcontent-dql-c214=""><i _ngcontent-dql-c214=""
                                        class="fas fa-times"></i></a>
                            </div>
                            <div _ngcontent-dql-c214="" id="references-section-container" class="document-ft-section-container">
                                <div _ngcontent-dql-c214=""><b _ngcontent-dql-c214="">References is not available for this document.</b>
                                </div><!----><!----><!----><!---->
                            </div>
                        </section>
                    </xpl-reference-panel><!---->
                </div><!---->
            </xpl-document-details><!----></div>
    </main>
</body>
ndaidong commented 2 months ago

If you pass the above HTML to extractFromHtml(), it will fail because required properties such as title and url cannot be detected.
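To see which of those two properties is missing before calling extractFromHtml(), a quick pre-flight check can help. The sketch below is a hypothetical helper, findRequiredFields, that is not part of article-extractor. The places it looks (a non-empty <title>, an explicit URL argument, or an og:url meta tag) are assumptions and only approximate the library's own detection.

```javascript
// Hypothetical pre-flight check (not part of article-extractor).
// Reports whether an HTML string exposes a title and a URL in the places
// assumed below; the library's real detection logic may differ.
function findRequiredFields(html, url = '') {
  // A non-empty <title> element; its text may span several lines.
  const titleMatch = html.match(/<title(?:\s[^>]*)?>([\s\S]*?)<\/title>/i)
  const title = titleMatch ? titleMatch[1].trim() : ''

  // URL from the explicit argument first, then from an og:url meta tag
  // (this regex assumes the property attribute comes before content).
  const ogUrlMatch = html.match(
    /<meta[^>]+property=["']og:url["'][^>]*content=["']([^"']+)["']/i
  )
  const resolvedUrl = url || (ogUrlMatch ? ogUrlMatch[1] : '')

  const missing = []
  if (!title) missing.push('title')
  if (!resolvedUrl) missing.push('url')
  return { title, url: resolvedUrl, missing }
}
```

For HTML that starts with an empty <head></head>, as the posted snippet does, this would typically list title as missing, and url as well unless one is passed in.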

extractFromHtml() expects HTML with a standard structure, including a title and some meta tags, e.g.:

<html>
  <head>
    <meta charset="utf-8" />
    <meta property="og:url" content="{a valid url here}" />
    <title>any title string</title>
  </head>
  <body>
    <!-- put the above content here -->
  </body>
</html>
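When only a content fragment is available, as with the filtered HTML above, the template can be filled in programmatically. This is a hypothetical helper, wrapFragment, not part of article-extractor. It escapes the title and URL so that characters such as & or " cannot break the markup.

```javascript
// Hypothetical helper (not part of article-extractor): wraps a content-only
// HTML fragment in the minimal document structure shown above, so that both
// a <title> and an og:url meta tag are present.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
}

function wrapFragment(fragment, { title, url }) {
  return [
    '<html>',
    '  <head>',
    '    <meta charset="utf-8" />',
    `    <meta property="og:url" content="${escapeHtml(url)}" />`,
    `    <title>${escapeHtml(title)}</title>`,
    '  </head>',
    '  <body>',
    fragment, // inserted as-is; it is assumed to already be HTML
    '  </body>',
    '</html>',
  ].join('\n')
}
```

The returned string can then be passed to extractFromHtml() in place of the bare fragment.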

You can add the URL to the HTML as an og:url meta tag, or pass it as the second parameter of extractFromHtml():

import { extractFromHtml } from '@extractus/article-extractor'

const article = await extractFromHtml(ABOVE_HTML, 'https://ieeexplore.ieee.org/abstract/document/9082053')
console.log(article)

It may return something like this:

{
  url: 'https://ieeexplore.ieee.org/abstract/document/9082053',
  title: 'any title string',
  description: 'Download PDF Download References Request Permissions  Save to...',
  links: [ 'https://ieeexplore.ieee.org/abstract/document/9082053' ],
  image: '',
  content: '<div>\n' +
    '\t\t\t\t\t\t\t\t\t<div>\n' +
    '\t\t\t\t\t\t\t\t\t\t\t<ul>\n' +
    '\t\t\t\t\t\t\t\t\t\t\t\t<li>\n' +
    '\t\t\t\t\t\t\t\t\t\t\t\t\t<a title="You do not have access to this PDF"><i></i> Download PDF\n' +
    '\t\t\t\t\t\t\t\t\t\t\t\t\t\t</a></li>\n' +
    '\t\t\t\t\t\t\t\t\t\t\t\t<li><a><i></i> Download References\n' +
    '\t\t\t\t\t\t\t\t\t\t\t\t\t</a></li>\n' +
    '\t\t\t\t\t\t\t\t\t\t\t\t<li><a title="Request permission for reuse."><i></i> Request Permissions </a></li>\n' +
    '\t\t\t\t\t\t\t\t\t\t\t\t<li><a><i></i> Save to\n' +
    '\t\t\t\t\t\t\t\t\t\t\t\t\t</a></li>\n' +
    '\t\t\t\t\t\t\t\t\t\t\t\t<li><a><i></i> Alerts </a>\n' +
    '\t\t\t\t\t\t\t\t\t\t\t\t</li>\n' +
    '\t\t\t\t\t\t\t\t\t\t\t</ul>\n' +
    '\t\t\t\t\t\t\t\t\t\t</div>\n' +
    '\t\t\t\t\t\t\t\t<div>\n' +
    '\t\t\t\t\t\t\t\t\t\t<div>\n' +
    '\t\t\t\t\t\t\t\t\t\t\t\t<section>\n' +
    '\t\t\t\t\t\t\t\t\t\t\t\t\t<div>\n' +
    '\t\t\t\t\t\t\t\t\t\t\t\t\t\t<div>\n' +
    '\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<h2> Abstract:</h2><p><span>Detection of moving objects in outdoor environments is an extremely\n' +
    '\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tresearched topic. However, studies on moving object detection in complex\n' +
    '\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tatmospheric/weather condition...</span><span><a>View more</a></span></p>\n' +
    '\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t</div>\n' +
//... more content here
JohnCido commented 1 month ago

Sorry for the late response. This webpage is missing a <title> element in the head. Once I added it, the parser worked. @ndaidong Thanks for the help!
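For pages like this one that lack a <title>, a small fix-up before calling extractFromHtml() can insert one. The sketch below, ensureTitle, is hypothetical and not part of article-extractor. It is regex-based and assumes reasonably well-formed markup, so a real DOM parser would be more robust for arbitrary pages.

```javascript
// Hypothetical fix-up (not part of article-extractor): ensures the page has a
// non-empty <title> before it is passed to extractFromHtml().
// Regex-based, so it assumes reasonably well-formed markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}

function ensureTitle(html, fallbackTitle) {
  const titleTag = `<title>${escapeHtml(fallbackTitle)}</title>`

  // Keep an existing non-empty title; fill in an empty one.
  const existing = html.match(/<title(?:\s[^>]*)?>([\s\S]*?)<\/title>/i)
  if (existing && existing[1].trim()) return html
  // Function replacers stop "$" in the title from being read as a pattern.
  if (existing) return html.replace(existing[0], () => titleTag)

  // No <title>: insert it right after <head>; (?:\s...)? avoids matching <header>.
  const headOpen = /<head(?:\s[^>]*)?>/i
  if (headOpen.test(html)) return html.replace(headOpen, (open) => open + titleTag)

  // No <head>: create one after <html>, or at the very start.
  const htmlOpen = /<html(?:\s[^>]*)?>/i
  if (htmlOpen.test(html)) {
    return html.replace(htmlOpen, (open) => `${open}<head>${titleTag}</head>`)
  }
  return `<head>${titleTag}</head>${html}`
}
```

The patched HTML can then be passed to extractFromHtml() together with the page URL as the second argument.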