gigablast / open-source-search-engine

Nov 20 2017 -- A distributed open source search engine and spider/crawler written in C/C++ for Linux on Intel/AMD. From gigablast dot com, which has binaries for download. See the README.md file at the very bottom of this page for instructions.
Apache License 2.0

Segfault when adding URL #172

Open onlyjob opened 3 years ago

onlyjob commented 3 years ago

In http://localhost:8000/addurl?c=main, adding https://searchengine.party/ and clicking "Go" results in an immediate crash:

Thread 1 "gb" received signal SIGSEGV, Segmentation fault.
Msg25::sendRequests (this=0x55555b0f66d0) at Linkdb.cpp:1515
1515                            m_k = m_oldLinkInfo->getNextInlink ( m_k );

(gdb) bt
#0  Msg25::sendRequests (this=0x55555b0f66d0) at Linkdb.cpp:1515
#1  0x0000555555801b16 in Msg25::doReadLoop (this=this@entry=0x55555b0f66d0) at Linkdb.cpp:1212
#2  0x00005555558046e1 in Msg25::getLinkInfo2 (this=this@entry=0x55555b0f66d0, site=0x55555b0f5593 "www.searchengine.party", url=0x55555b0f557c "www.searchengine.party",
    isSiteLinkInfo=isSiteLinkInfo@entry=true, ip=1340591670, docId=<optimized out>, collnum=0, qbuf=0x0, qbufSize=0, state=0x55555b0f66d0,
    callback=0x5555557fe7e0 <sendReplyWrapper(void*)>, isInjecting=false, printDebugMsgs=false, printInXml=false, siteNumInlinks=0, oldLinkInfo=0x0, niceness=1,
    doLinkSpamCheck=true, oneVotePerIpDom=<optimized out>, canBeCancelled=true, lastUpdateTime=1620219098, onlyNeedGoodInlinks=<optimized out>, getLinkerTitles=false,
    ourHostHash32=0, ourDomHash32=0, linkInfoBuf=0x55555b0f66e0) at Linkdb.cpp:1079
#3  0x00005555558048f7 in handleRequest25 (slot=0x7fff71d10f34, netnice=<optimized out>) at Linkdb.cpp:800
#4  0x00005555556bb320 in UdpServer::makeCallback_ass (this=this@entry=0x55555600a5a0 <g_udpServer>, slot=slot@entry=0x7fff71d10f34) at UdpServer.cpp:2703
#5  0x00005555556bbaaa in UdpServer::makeCallbacks_ass (this=this@entry=0x55555600a5a0 <g_udpServer>, niceness=niceness@entry=1) at UdpServer.cpp:2167
#6  0x00005555556bc05a in UdpServer::makeCallbacks_ass (niceness=1, this=0x55555600a5a0 <g_udpServer>) at UdpServer.cpp:1952
#7  UdpServer::process_ass (this=0x55555600a5a0 <g_udpServer>, now=1620219098501, maxNiceness=100) at UdpServer.cpp:1168
#8  0x0000555555736d52 in Loop::callCallbacks_ass (this=this@entry=0x55555609b160 <g_loop>, forReading=forReading@entry=true, fd=fd@entry=3, now=1620219098501,
    niceness=niceness@entry=0) at Loop.cpp:536
#9  0x0000555555737a8d in Loop::doPoll (this=this@entry=0x55555609b160 <g_loop>) at Loop.cpp:2133
#10 0x000055555573803f in Loop::runLoop (this=0x55555609b160 <g_loop>) at Loop.cpp:1425
#11 0x00005555555e9982 in main2 (argc=1, argv=0x7fffffffdcb8) at main.cpp:4091
#12 0x00005555555e2e35 in main (argc=1, argv=0x7fffffffdcb8) at main.cpp:399

#0  Msg25::sendRequests (this=0x55555b0f66d0) at Linkdb.cpp:1515
        itop = 1527745596
        ip32 = 21845
        docId = <optimized out>
        discovered = 0
        lostDate = 0
        status = <optimized out>
        isLinkSpam = 0 '\000'
        j = <optimized out>
        r = <optimized out>
        lastDocId = 0
        ratio = <optimized out>
        ourMax = 2
        cr = <optimized out>
        __PRETTY_FUNCTION__ = "bool Msg25::sendRequests()"
#1  0x0000555555801b16 in Msg25::doReadLoop (this=this@entry=0x55555b0f66d0) at Linkdb.cpp:1212
        startKey = {n0 = 0, n1 = 0, n2 = 0, n3 = 18175714968563875840}
        endKey = {n0 = 4294967293, n1 = 18446744060824518655, n2 = 18446744073709551615, n3 = 18175714972858843135}
        siteHash32 = <optimized out>
        __PRETTY_FUNCTION__ = "bool Msg25::doReadLoop()"
        numFiles = -1
        includeTree = true
        cr = <optimized out>
        xx = <optimized out>
#2  0x00005555558046e1 in Msg25::getLinkInfo2 (this=this@entry=0x55555b0f66d0, site=0x55555b0f5593 "www.searchengine.party", url=0x55555b0f557c "www.searchengine.party", 
    isSiteLinkInfo=isSiteLinkInfo@entry=true, ip=1340591670, docId=<optimized out>, collnum=0, qbuf=0x0, qbufSize=0, state=0x55555b0f66d0, 
    callback=0x5555557fe7e0 <sendReplyWrapper(void*)>, isInjecting=false, printDebugMsgs=false, printInXml=false, siteNumInlinks=0, oldLinkInfo=0x0, niceness=1, 
    doLinkSpamCheck=true, oneVotePerIpDom=<optimized out>, canBeCancelled=true, lastUpdateTime=1620219098, onlyNeedGoodInlinks=<optimized out>, getLinkerTitles=false, 
    ourHostHash32=0, ourDomHash32=0, linkInfoBuf=0x55555b0f66e0) at Linkdb.cpp:1079
        cr = <optimized out>
        __PRETTY_FUNCTION__ = "bool Msg25::getLinkInfo2(char*, char*, bool, int32_t, int64_t, int16_t, char*, int32_t, void*, void (*)(void*), bool, bool, bool, int32_t, LinkInfo*, int32_t, bool, bool, bool, int32_t, bool, bool, in"...
        u = {m_url = "http://www.searchengine.party/", '\000' <repeats 135 times>, "\377\000\000\000\000\000\377\000\000\000\000rchengine.party\000ne.party"..., m_ulen = 30, 
          m_scheme = 0x7fffffffc540 "http://www.searchengine.party/", m_slen = 4, m_host = 0x7fffffffc547 "www.searchengine.party/", m_hlen = 22, m_ip = 0, 
          m_path = 0x7fffffffc55d "/", m_plen = 1, m_query = 0x0, m_qlen = 0, m_extension = 0x0, m_elen = 0, m_filename = 0x0, m_flen = 21845, 
          m_domain = 0x7fffffffc54b "searchengine.party/", m_dlen = 18, m_tld = 0x7fffffffc558 "party/", m_tldLen = 5, m_mdlen = 12, m_port = 80, m_defPort = 80, 
          m_portLen = 0, m_portStr = 0x0, m_anchor = 0x0, m_anchorLen = 0}
        m = <optimized out>
        mlen = <optimized out>
        xx = <optimized out>
        xx = <optimized out>
#3  0x00005555558048f7 in handleRequest25 (slot=0x7fff71d10f34, netnice=<optimized out>) at Linkdb.cpp:800
        req = 0x55555b0f54d4
        slotNum = <optimized out>
        isSiteLinkInfo = <optimized out>
        m25 = 0x55555b0f66d0
        xx = <optimized out>
        xx = <optimized out>
#4  0x00005555556bb320 in UdpServer::makeCallback_ass (this=this@entry=0x55555600a5a0 <g_udpServer>, slot=slot@entry=0x7fff71d10f34) at UdpServer.cpp:2703
        saved2 = false
        xx = <optimized out>
        msgType = 37 '%'
        start = 1620219098502
        took = <optimized out>
        now = 1620219098502
        delta = <optimized out>
        n = <optimized out>
        bucket = <optimized out>
        mem = <optimized out>
        saved = 0
        saved2 = <optimized out>
        svt = {sival_int = <optimized out>, sival_ptr = <optimized out>}
        xx = <optimized out>
        xx = <optimized out>
        xx = <optimized out>
        xx = <optimized out>
        xx = <optimized out>
        xx = <optimized out>
#5  0x00005555556bbaaa in UdpServer::makeCallbacks_ass (this=this@entry=0x55555600a5a0 <g_udpServer>, niceness=niceness@entry=1) at UdpServer.cpp:2167
        start2 = 0
        logIt = <optimized out>
        h = <optimized out>
        took = <optimized out>
        elapsed = <optimized out>
        slot = 0x7fff71d10f34
        numCalled = 0
        doNicenessConversion = false
        startTime = 1620219098502
        pass = <optimized out>
        nextSlot = 0x0
#6  0x00005555556bc05a in UdpServer::makeCallbacks_ass (niceness=1, this=0x55555600a5a0 <g_udpServer>) at UdpServer.cpp:1952
        nextPass = <optimized out>
        fullRestart = <optimized out>
        numCalled = <optimized out>
        doNicenessConversion = <optimized out>
        startTime = <optimized out>
        pass = <optimized out>
        nextSlot = <optimized out>
        slot = <optimized out>
        h = <optimized out>
        start2 = <optimized out>
        logIt = <optimized out>
        took = <optimized out>
        elapsed = <optimized out>
        rdbId = <optimized out>
#7  UdpServer::process_ass (this=0x55555600a5a0 <g_udpServer>, now=1620219098501, maxNiceness=100) at UdpServer.cpp:1168
        loop = <optimized out>
        startTimer = 1620219098501
        flipped = <optimized out>
        needCallback = <optimized out>
        something = false
        slot = 0x0
        status = <optimized out>
        elapsed = <optimized out>
#8  0x0000555555736d52 in Loop::callCallbacks_ass (this=this@entry=0x55555609b160 <g_loop>, forReading=forReading@entry=true, fd=fd@entry=3, now=1620219098501,
    niceness=niceness@entry=0) at Loop.cpp:536
        saved = <optimized out>
        saved_errno = 0
        s = 0x7fffe8e1e194
        numCalled = <optimized out>
#9  0x0000555555737a8d in Loop::doPoll (this=this@entry=0x55555609b160 <g_loop>) at Loop.cpp:2133
        fd = 3
        i = 0
        again = <optimized out>
        n = <optimized out>
        v = {tv_sec = 0, tv_usec = 9997}
        readfds = {fds_bits = {8, 0 <repeats 15 times>}}
        writefds = {fds_bits = {0 <repeats 16 times>}}
        s = <optimized out>
        calledOne = true
        elapsed = <optimized out>
#10 0x000055555573803f in Loop::runLoop (this=0x55555609b160 <g_loop>) at Loop.cpp:1425
        sigs0 = {__val = {268505088, 0 <repeats 15 times>}}
#11 0x00005555555e9982 in main2 (argc=1, argv=0x7fffffffdcb8) at main.cpp:4091
        stackPointTestAnchor = 86 'V'
        cmdarg = 0
        cmd = 0x5555558d4c3e ""
        cmd2 = 0x5555558d4c3e ""
        arch = 64
        cc = 0x0
        testMandrill = false
        rl = {rlim_cur = 8388608, rlim_max = 18446744073709551615}
        lim = {rlim_cur = 1024, rlim_max = 1024}
        NOFILE = 1024
        rlim = {rlim_cur = 1024, rlim_max = 1024}
        isProxy = false
        useTmpCluster = 0 '\000'
        workingDir = 0x555555f7b9e0 <getcwd2(char*)::s_cwdBuf> "/mnt/NVMe/src/gigablast/gigablast-0.0+git20210505.9bf4fd2__/"
        h9 = 0x555556e60354
        ips = 0x55555675e088 <s_localIps+8>
        tmp = "/mnt/NVMe/src/gigablast/gigablast-0.0+git20210505.9bf4fd2__//cleanexit\000\000\354,\330VUU\000\000\344Y\330VUU\000\000\214\\\330VUU\000\000\064_\330VUU\000\000\334a\330VUU\000\000\204d\330VUU\000\000,g\330VUU\000"
        cleanFileName = {m_capacity = 128, m_length = 70, m_buf = 0x7fffffffd780 "/mnt/NVMe/src/gigablast/gigablast-0.0+git20210505.9bf4fd2__//cleanexit", m_label = 0x0,
          m_usingStack = true, m_encoding = 106, m_renderHtml = 85 'U'}
        pcount = 1
        structureFile = "/mnt/NVMe/src/gigablast/gigablast-0.0+git20210505.9bf4fd2__/catdb/gbdmoz.structure.dat\000\000`C\327VUU\000\000\240C\327VUU\000\000\320C\327VUU\000\000\070|\327VUU\000\000X|\327VUU\000\000\254\210\327VUU\000\000\214\211\327VUU\000\000Ȋ\327VUU\000\000\374\213\327VUU\000\000H\215\327VUU\000\000\350\215\327VUU\000\000\f\216\327VUU\000\000,\216\327VUU\000\000L\216\327VUU\000\000l"...
        nce = 544729
        maxMem = 100000
        json = {m_sb = {m_capacity = 0, m_length = 0, m_buf = 0x0, m_label = 0x555555a23caa "SafeBuf", m_usingStack = false, m_encoding = 106, m_renderHtml = -1 '\377'},
          m_stack = {0x5555597cf104, 0x5555597d37a9, 0x5555597d37f1, 0x5555597d34e2, 0x5555597d32f9, 0x0, 0x0, 0x7ffff7fdbf9c <check_match+300>, 0xb01bca00, 0x0,
            0x7ffff7fd0190, 0x7fffffffd6d0, 0x0, 0x7ffff7fdc3a4 <do_lookup_x+932>, 0x2, 0x7ffff7fd02d8, 0x7ffff7ffe730, 0x7fffffffd568, 0x7fffffffd564,
            0x7ffff7fdbf9c <check_match+300>, 0xe6854b87, 0x0, 0x7ffff7fd02d8, 0x7ffff7fd01b8, 0x7ffff783c478, 0xb01bca00, 0x2c06f28, 0x7fffffffd564, 0x7ffff7ffe758,
            0x7fffffffd630, 0x7fffffffd6d0, 0x7fffffffd620, 0x1, 0x7fff00000000, 0xf76c6308, 0x555556cc6288 <g_repair+218984>, 0x7fffffffd590,
            0x555556da1c24 <g_scraper+197284>, 0x7fffffffd5a0, 0x55555572e05c <Msg0::constructor()+164>, 0x7fffffffd5c0, 0x555556da1b98 <g_scraper+197144>, 0x7fffffffd5d0,
            0x55555572df79 <Msg0::Msg0()+61>, 0x555556d719d4 <g_scraper+84>, 0x55555570deb7 <Query::reset()+275>, 0x7fffffffd5e0, 0x555556dbba0c <g_scraper+303244>,
            0xffffffffffffffff, 0x555556e13274 <g_scraper+661748>, 0x555556d719d4 <g_scraper+84>, 0x555555835192 <TagRec::reset()+74>, 0x555556e0fa34 <g_scraper+647348>,
            0x555556d8b860 <g_scraper+106208>, 0x555556d719d4 <g_scraper+84>, 0x555556da20b4 <g_scraper+198452>, 0x555556d719d4 <g_scraper+84>,
            0x555555661829 <XmlDoc::reset()+4137>, 0x7fffffffd650, 0x555556d719d4 <g_scraper+84>, 0x555556e1c354 <g_scraper+698836>, 0x555556e1c2dc <g_scraper+698716>,
            0x555556e1c2fc <g_scraper+698748>, 0x555556e1c374 <g_scraper+698868>}, m_stackPtr = 0, m_prev = 0x5555597d39e6}
#12 0x00005555555e2e35 in main (argc=1, argv=0x7fffffffdcb8) at main.cpp:399
        ret = 0

(gdb) thread apply all bt

Thread 1 (Thread 0x7ffff76a6740 (LWP 230221) "gb"):
#0  Msg25::sendRequests (this=0x55555b0f66d0) at Linkdb.cpp:1515
#1  0x0000555555801b16 in Msg25::doReadLoop (this=this@entry=0x55555b0f66d0) at Linkdb.cpp:1212
#2  0x00005555558046e1 in Msg25::getLinkInfo2 (this=this@entry=0x55555b0f66d0, site=0x55555b0f5593 "www.searchengine.party", url=0x55555b0f557c "www.searchengine.party", isSiteLinkInfo=isSiteLinkInfo@entry=true, ip=1340591670, docId=<optimized out>, collnum=0, qbuf=0x0, qbufSize=0, state=0x55555b0f66d0, callback=0x5555557fe7e0 <sendReplyWrapper(void*)>, isInjecting=false, printDebugMsgs=false, printInXml=false, siteNumInlinks=0, oldLinkInfo=0x0, niceness=1, doLinkSpamCheck=true, oneVotePerIpDom=<optimized out>, canBeCancelled=true, lastUpdateTime=1620219098, onlyNeedGoodInlinks=<optimized out>, getLinkerTitles=false, ourHostHash32=0, ourDomHash32=0, linkInfoBuf=0x55555b0f66e0) at Linkdb.cpp:1079
#3  0x00005555558048f7 in handleRequest25 (slot=0x7fff71d10f34, netnice=<optimized out>) at Linkdb.cpp:800
#4  0x00005555556bb320 in UdpServer::makeCallback_ass (this=this@entry=0x55555600a5a0 <g_udpServer>, slot=slot@entry=0x7fff71d10f34) at UdpServer.cpp:2703
#5  0x00005555556bbaaa in UdpServer::makeCallbacks_ass (this=this@entry=0x55555600a5a0 <g_udpServer>, niceness=niceness@entry=1) at UdpServer.cpp:2167
#6  0x00005555556bc05a in UdpServer::makeCallbacks_ass (niceness=1, this=0x55555600a5a0 <g_udpServer>) at UdpServer.cpp:1952
#7  UdpServer::process_ass (this=0x55555600a5a0 <g_udpServer>, now=1620219098501, maxNiceness=100) at UdpServer.cpp:1168
#8  0x0000555555736d52 in Loop::callCallbacks_ass (this=this@entry=0x55555609b160 <g_loop>, forReading=forReading@entry=true, fd=fd@entry=3, now=1620219098501, niceness=niceness@entry=0) at Loop.cpp:536
#9  0x0000555555737a8d in Loop::doPoll (this=this@entry=0x55555609b160 <g_loop>) at Loop.cpp:2133
#10 0x000055555573803f in Loop::runLoop (this=0x55555609b160 <g_loop>) at Loop.cpp:1425
#11 0x00005555555e9982 in main2 (argc=1, argv=0x7fffffffdcb8) at main.cpp:4091
#12 0x00005555555e2e35 in main (argc=1, argv=0x7fffffffdcb8) at main.cpp:399

onlyjob commented 3 years ago

This problem appears to be caused by the hard-coded -O3 and -O2 in the Makefile. The segfault does not occur when those options are removed.
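
For context, frame #2 of the backtrace reports oldLinkInfo=0x0, and the faulting line calls getNextInlink through m_oldLinkInfo. Calling a member function through a null pointer is undefined behaviour in C++, so whether it crashes can depend on the optimisation level; that may be what is happening here, though the backtrace alone does not prove it. A minimal sketch of checking the pointer at the call site, using hypothetical stand-in types (not the real Gigablast Inlink/LinkInfo classes, and not necessarily how the project fixed it):

```cpp
#include <cstddef>

// Hypothetical stand-ins for illustration only; these are NOT the real
// Gigablast Inlink/LinkInfo classes.
struct Inlink {
    int id;
};

struct LinkInfo {
    Inlink items[2];
    int count;

    // Returns the first inlink when k is null, the inlink after k otherwise,
    // or nullptr once the list is exhausted.
    Inlink *getNextInlink(Inlink *k) {
        if (count == 0) return nullptr;
        if (k == nullptr) return &items[0];
        std::ptrdiff_t next = (k - items) + 1;
        return next < count ? &items[next] : nullptr;
    }
};

// Counts the inlinks in oldLinkInfo, treating a null pointer as
// "no old inlinks" instead of calling a member function through it.
int countOldInlinks(LinkInfo *oldLinkInfo) {
    if (oldLinkInfo == nullptr) return 0;  // guard before any member call
    int n = 0;
    for (Inlink *k = oldLinkInfo->getNextInlink(nullptr); k != nullptr;
         k = oldLinkInfo->getNextInlink(k)) {
        ++n;
    }
    return n;
}
```

The guard sits in the caller rather than inside the member function, because a check on `this` inside the function is one the compiler is allowed to assume never fails.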

onlyjob commented 3 years ago

Appears to be fixed by https://github.com/gigablast/open-source-search-engine/pull/180/commits/64cf6411b28f204047bac5307b7d31372a8c0bfd

CompunixAustralia commented 3 years ago

I get a core dump when running in a Linux container with Debian 10 and building with -O3, -O2 or -O1. Building without optimisations (-O0) corrects the issue, and sites can then be indexed.

CompunixAustralia commented 3 years ago

When I change the following to -O0, indexing works.

Onlyjob: -O3 is very unstable and causes segfaults all over (e.g. #172).

CC_OPT_ARG ?= -O2

tcreek commented 1 year ago

So changing all instances of CC_OPT_ARG ?= -O2 to CC_OPT_ARG ?= -O0 should stop the core dump/segfault in Debian 10?

It is still happening for me.