echonest / echoprint-server

Server components for Echoprint
http://echoprint.me/server

Queries take tremendous amount of time (4-15 seconds) average 6 seconds #19

Open danielbenzvi opened 12 years ago

danielbenzvi commented 12 years ago

Hello,

We have implemented echoprint and ingested 600,000 full-length songs into the database. The Tokyo database size is 65GB and the Solr database size is 20GB. Unfortunately, queries take a tremendous amount of time. We tried to optimize the Solr database, but the query time didn't improve.

Example query:

INFO: [fp] webapp=/solr path=/select params={echoParams=none&fl=track_id,score&q=909794+913+1003402+913+303386+913+584877+913+232554+913+956476+913+431834+950+679300+950+240931+950+955331+950+995773+950+357692+950+593061+976+70763+976+782224+976+782173+976+622726+976+601533+976+1011732+1027+796909+1027+763780+1027+566013+1027+753588+1027+312100+1027+193929+1051+316240+1051+332598+1051+19627+1051+692188+1051+430606+1051+281612+1164+845652+1164+397749+1164+200843+1164+468276+1164+764632+1164+527099+1181+731221+1181+394872+1181+947331+1181+160401+1181+800128+1181+1018565+1203+470445+1203+242926+1203+616082+1203+339907+1203+566977+1203+5230+1258+280182+1258+156696+1258+48596+1258+673917+1258+210986+1258+957346+1296+859097+1296+32268+1296+153048+1296+447624+1296+425653+1296+328097+1347+297986+1347+888504+1347+658455+1347+998338+1347+755650+1347+127671+1425+539061+1425+152124+1425+1932+1425+490734+1425+259543+1425+777147+1489+143020+1489+532966+1489+332011+1489+610997+1489+294746+1489+531412+1566+705565+1566+891476+1566+994477+1566+398781+1566+896153+1566+868163+1630+773449+1630+836516+1630+142472+1630+1010289+1630+163754+1630+684193+1682+803149+1682+657239+1682+732748+1682+732748+1682+732748+1682+94257+1770+732748+1770+732748+1770+732748+1770+732748+1770+732748+1770+327016+898+323107+898+1036138+898+636940+898+395567+898+275946+898+317417+926+376363+926+890914+926+705682+926+636726+926+202564+926+845159+950+728869+950+828839+950+94394+950+423738+950+701545+950+887760+1038+308181+1038+263679+1038+914857+1038+354865+1038+624759+1038+316618+1116+751629+1116+130128+1116+896612+1116+848012+1116+926584+1116+553706+1181+919921+1181+782385+1181+1037873+1181+800983+1181+920557+1181+528646+1258+712675+1258+37742+1258+204631+1258+708609+1258+803397+1258+601702+1348+821948+1348+279844+1348+46205+1348+933870+1348+770909+1348+959027+1412+857557+1412+948365+1412+325558+1412+411016+1412+269648+1412+442965+1578+949268+1578+1018985+1578+471072+1578+303080+1578+23059+1578+735935+1682+10453
6+1682+889609+1682+288667+1682+765904+1682+1047370+1682+387943+1770+36761+1770+269789+1770+823578+1770+528495+1770+274497+1770+398917+1849+720708+1849+444872+1849+24326+1849+347110+1849+554892+1849+883660+1989+300268+1989+112239+1989+1047381+1989+171935+1989+525106+1989+150300+2014+775632+2014+1039872+2014+66747+2014+66747+2014+66747+2014+1044341+2079+66747+2079+66747+2079+66747+2079+66747+2079+66747+2079+98556+898+746599+898+819520+898+625029+898+739693+898+514987+898+292281+950+923614+950+1016085+950+799882+950+8204+950+48660+950+317089+1027+933171+1027+496108+1027+83259+1027+426712+1027+8204+1027+727312+1103+76705+1103+565244+1103+229512+1103+276642+1103+621608+1103+402455+1131+993209+1131+142593+1131+664342+1131+591259+1131+826795+1131+294290+1181+474509+1181+2220+1181+276642+1181+115336+1181+312690+1181+841782+1258+617667+1258+163360+1258+790518+1258+833952+1258+217401+1258+420326+1411+581324+1411+933171+1411+973988+1411+678469+1411+142734+1411+689071+1437+901208+1437+163616+1437+401241+1437+529431+1437+919267+1437+801471+1488+235059+1488+614337+1488+670384+1488+446479+1488+685010+1488+493673+1566+947773+1566+533214+1566+252420+1566+742507+1566+625488+1566+581324+1604+457499+1604+617667+1604+177010+1604+628783+1604+115336+1604+819520+1630+260405+1630+384528+1630+986218+1630+811808+1630+183164+1630+292281+1758+83259+1758+8204+1758+747297+1758+790518+1758+158959+1758+933171+1835+83259+1835+590701+1835+747297+1835+237128+1835+504687+1835+623755+1912+1015745+1912+187178+1912+907455+1912+907455+1912+907455+1912+933171+1989+907455+1989+907455+1989+907455+1989+907455+1989+907455+1989+644707+898+355989+898+262809+898+901066+898+197301+898+846082+898+97489+950+809516+950+547829+950+206587+950+923296+950+361389+950+84536+1026+192110+1026+241444+1026+356019+1026+547829+1026+979165+1026+465926+1103+46194+1103+367139+1103+273667+1103+537071+1103+874329+1103+46673+1180+175512+1180+46194+1180+1036550+1180+315117+1180+803334+1180+839228+1202+349948+1202+88536+1202+262368+1202+
668559+1202+36704+1202+648336+1258+308864+1258+10297+1258+536353+1258+918294+1258+54691+1258+874204+1335+272900+1335+97489+1335+171709+1335+356019+1335+547829+1335+646927+1378+351318+1378+1016527+1378+688694+1378+679940+1378+930878+1378+84536+1412+868276+1412+433288+1412+948093+1412+180168+1412+16059+1412+299279+1488+2843+1488+904122+1488+91558+1488+213518+1488+991955+1488+334613+1566+827953+1566+730944+1566+5278+1566+36280+1566+11265+1566+84536+1682+597540+1682+115352+1682+940566+1682+929096+1682+678580+1682+392847+1758+335563+1758+423247+1758+209740+1758+373538+1758+727531+1758+325153+1835+254323+1835+704166+1835+1026159+1835+1026159+1835+1026159+1835+931697+1875+1026159+1875+1026159+1875+1026159+1875+1026159+1875+1026159+1875+413450+899+827430+899+377816+899+342246+899+1001758+899+602740+899+546034+950+190166+950+1026816+950+997142+950+525328+950+684191+950+499217+1026+910321+1026+42130+1026+451494+1026+749446+1026+946414+1026+326065+1056+55811+1056+56244+1056+950756+1056+911180+1056+587607+1056+231252+1163+809979+1163+936140+1163+544586+1163+912950+1163+944283+1163+912546+1180+559888+1180+751143+1180+1026461+1180+52526+1180+676102+1180+599902+1258+264487+1258+129208+1258+338714+1258+206006+1258+351953+1258+536701+1283+98059+1283+31802+1283+925981+1283+418261+1283+224128+1283+25289+1349+870120+1349+495037+1349+172361+1349+492381+1349+425795+1349+744587+1413+61147+1413+777664+1413+1025932+1413+780273+1413+477990+1413+648544+1482+226590+1482+524066+1482+926283+1482+1017760+1482+932879+1482+373392+1566+45268+1566+650641+1566+30935+1566+577205+1566+274517+1566+345040+1605+200688+1605+440760+1605+49828+1605+189296+1605+605412+1605+633776+1631+658728+1631+988114+1631+609538+1631+191745+1631+969433+1631+958651+1758+379088+1758+656887+1758+704445+1758+381911+1758+736590+1758+134776+1791+643598+1791+284679+1791+1040459+1791+240343+1791+397235+1791+50740+1849+86300+1849+778393+1849+393888+1849+521884+1849+589014+1849+633526+1989+479631+1989+428713+1989+707586+1989+707586+1
989+707586+1989+950298+2066+707586+2066+707586+2066+707586+2066+707586+2066+707586+2066+567370+898+856052+898+417830+898+607505+898+575141+898+1023879+898+198147+949+906970+949+852357+949+410162+949+519528+949+360064+949+840947+1026+86416+1026+852357+1026+410162+1026+519528+1026+821038+1026+937532+1103+580606+1103+419146+1103+417963+1103+1004729+1103+1046870+1103+937532+1180+994180+1180+788786+1180+417963+1180+1004729+1180+838045+1180+855512+1258+994180+1258+319803+1258+417963+1258+519528+1258+360064+1258+198147+1335+86416+1335+852357+1335+397752+1335+22433+1335+931971+1335+840947+1411+413342+1411+635013+1411+1021675+1411+11367+1411+862073+1411+402613+1488+233303+1488+21159+1488+520725+1488+145256+1488+913842+1488+887192+1566+998048+1566+328388+1566+589346+1566+526930+1566+182214+1566+74333+1629+423390+1629+417830+1629+475237+1629+502441+1629+56572+1629+198147+1682+410162+1682+519528+1682+555287+1682+45885+1682+519824+1682+86416+1758+410162+1758+1046870+1758+555287+1758+956708+1758+212352+1758+788786+1835+1004729+1835+821038+1835+755198+1835+755198+1835+755198+1835+840947+1989+755198+1989+755198+1989+755198+1989+755198+1989+755198+1989+522420+912+782348+912+743398+912+659944+912+532858+912+645434+912+418126+950+244147+950+480019+950+793161+950+485121+950+1019099+950+58179+1026+147904+1026+261231+1026+1004888+1026+212699+1026+55080+1026+55623+1050+46122+1050+157427+1050+736076+1050+531490+1050+272849+1050+217998+1103+352202+1103+817059+1103+466481+1103+648026+1103+599913+1103+285160+1142+769579+1142+98482+1142+274099+1142+592962+1142+915113+1142+817059+1180+310085+1180+993131+1180+316164+1180+653951+1180+48965+1180+899550+1258+627427+1258+447399+1258+834624+1258+484882+1258+727121+1258+189081+1335+94272+1335+481689+1335+345585+1335+1045346+1335+726652+1335+88655+1437+989447+1437+614883+1437+30805+1437+234430+1437+643247+1437+249221+1489+605851+1489+19438+1489+160094+1489+352315+1489+829851+1489+798933+1566+866675+1566+158715+1566+529925+1566+801105+1566+803628+1566+5
37603+1682+87298+1682+360185+1682+107280+1682+911687+1682+29681+1682+602499+1771+76392+1771+779655+1771+85794+1771+961288+1771+643938+1771+810763+1835+566764+1835+668780+1835+412470+1835+412470+1835+412470+1835+501111+1989+412470+1989+412470+1989+412470+1989+412470+1989+412470+1989+225288+898+110450+898+34699+898+377652+898+297841+898+727573+898+882726+950+607266+950+801232+950+84196+950+1004925+950+874779+950+233656+1026+607266+1026+292406+1026+84196+1026+126734+1026+401335+1026+581373+1103+946649+1103+292406+1103+932103+1103+386298+1103+685425+1103+581373+1181+406763+1181+986394+1181+713214+1181+172253+1181+211470+1181+866441+1258+778577+1258+781727+1258+713214+1258+721558+1258+874779+1258+1027650+1335+496259+1335+801232+1335+469418+1335+964605+1335+620965+1335+820042+1358+136096+1358+891277+1358+640040+1358+564206+1358+809061+1358+117152+1488+362914+1488+43642+1488+334647+1488+828162+1488+580511+1488+422394+1566+845017+1566+265278+1566+257711+1566+609593+1566+738605+1566+687628+1622+365286+1622+1017500+1622+767586+1622+178714+1622+377700+1622+310182+1682+233656+1682+634147+1682+607266+1682+781727+1682+960386+1682+921346+1758+1027650+1758+677987+1758+496259+1758+946649+1758+292406+1758+518035+1782+836578+1782+820042+1782+436089+1782+772516+1782+577434+1782+233656+1835+607266+1835+157632+1835+84196+1835+172253+1835+401335+1835+677987+1912+778577+1912+292406+1912+1038773+1912+718850+1912+643934+1912+233656+1989+696335+1989+8780+1989+809256+1989+809256+1989+809256+1989+409865+2066+809256+2066+809256+2066+809256+2066+809256+2066+809256+2066&qt=/hashq&wt=standard&rows=30&version=2.2} hits=5923078 status=0 QTime=6849
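
For anyone comparing numbers across setups, here is a rough Python sketch for pulling the useful parts out of a Solr request log line like the one above. It assumes the line has first been joined back into a single string, and that the `q` parameter alternates hash code and time offset (the repeating second values such as 913, 950, 976 suggest this, but it is an assumption). The function name `parse_solr_log_line` is made up for illustration and is not part of echoprint-server.

```python
import re
from urllib.parse import parse_qs


def parse_solr_log_line(line):
    """Return (stats, pairs) for one Solr request log line.

    stats holds the integer hits/status/QTime values found after the
    params block; pairs is a list of (code, offset) tuples taken from q.
    """
    params_match = re.search(r"params=\{(.*?)\}", line)
    if params_match is None:
        return {}, []

    # Only look for the counters after the closing brace of params.
    tail = line[params_match.end():]
    stats = {k: int(v) for k, v in re.findall(r"\b(hits|status|QTime)=(\d+)", tail)}

    # parse_qs decodes '+' as a space, so q becomes whitespace-separated integers.
    params = parse_qs(params_match.group(1))
    tokens = [int(t) for t in params.get("q", [""])[0].split()]

    # Assumption: tokens alternate hash code, time offset.
    pairs = list(zip(tokens[0::2], tokens[1::2]))
    return stats, pairs
```

With the log above, the returned `stats` would carry the `hits=5923078` and `QTime=6849` values, which makes it easy to tabulate query size against query time.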

Are we doing anything wrong here?

Sincerely, Daniel.

alnesbit commented 12 years ago

Hello Daniel,

Apologies for the delay in getting back to you regarding this. Are you still having this problem?

Did you split the fingerprint codes into overlapping segments upon ingestion or are you ingesting full codes? In other words, how did you run the ingestion, and was split=True or split=False set when calling fp.py:ingest?
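
To make the question concrete, splitting into overlapping segments could look roughly like the sketch below. This is not the code in fp.py; the function name, `segment_len` and `hop` are hypothetical, and offsets are treated as plain integers in whatever unit the fingerprint uses.

```python
def split_into_overlapping_segments(pairs, segment_len, hop):
    """Group (code, offset) pairs into overlapping windows.

    Each window covers offsets in [start, start + segment_len) and a new
    window starts every `hop` offsets, so hop < segment_len gives overlap.
    """
    if segment_len <= 0 or hop <= 0:
        raise ValueError("segment_len and hop must be positive")
    if not pairs:
        return []

    last_offset = max(offset for _, offset in pairs)
    segments = []
    start = 0
    while True:
        window = [(c, t) for c, t in pairs if start <= t < start + segment_len]
        if window:
            segments.append(window)
        # Stop once the current window already reaches past the last offset.
        if start + segment_len > last_offset:
            break
        start += hop
    return segments
```

With full-length songs, each segment becomes its own document in the index, so the choice affects both the index size and how many documents a query can hit.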

What happens when you make rapid, repeated queries of the Solr index? Do all the queries take a long time or

Andrew

danielbenzvi commented 12 years ago

Hello Andrew, we are still seeing this problem, and it has persisted across upgrades.

Repeated queries for exactly the same result set do get faster, but not significantly. The lowest we can get is 1.9 seconds. The highest was 28 seconds, and it was the only query being performed on the system at the time.
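
A small timing harness along these lines can reproduce that kind of measurement. It is only a sketch: `run_query` is a placeholder for whatever callable issues one query against your own server, and nothing here is part of echoprint-server.

```python
import statistics
import time


def time_repeated(run_query, repeats=10):
    """Call run_query `repeats` times and summarise wall-clock latency in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    durations = []
    for _ in range(repeats):
        started = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - started)

    return {
        "first": durations[0],  # cold-cache latency
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

Comparing `first` with `min` shows how much of the time is saved by caching once the same query has been seen before.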

We ingested full length codes and we used split=True (as defined in fp.py).

All the queries take a long time.

Tomtomgo commented 11 years ago

I'm experiencing exactly the same issue... Did you find a solution, @danielbenzvi?

alnesbit commented 11 years ago

We have recently been investigating this issue in detail. When the Solr database becomes very large, query performance can indeed suffer in this way if the entire index is deployed on a single Solr core on one server.

We've found various solutions that have helped tremendously in reducing the time required to perform a query (e.g., improvements in time of about an order of magnitude). One of these solutions involves sharding, which requires a more complicated Solr setup. Another solution involves changes to the way the fingerprints are actually indexed and queried. We have had great success in running these improvements on our servers that are behind the song/identify method on our API.
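
As a rough illustration of the sharding idea only (the actual setup is not described here), a client could query several smaller indexes and merge their results. The sketch below assumes each shard holds a disjoint subset of tracks and returns `(track_id, score)` pairs whose scores are comparable across shards, which may not hold if scoring depends on per-shard statistics. The function name is hypothetical.

```python
import heapq


def merge_shard_results(shard_results, rows=30):
    """Merge per-shard (track_id, score) lists into one global top-`rows` list."""
    best = {}
    for results in shard_results:
        for track_id, score in results:
            # Keep the highest score if a track is reported more than once.
            if score > best.get(track_id, float("-inf")):
                best[track_id] = score
    # Highest-scoring tracks first.
    return heapq.nlargest(rows, best.items(), key=lambda item: item[1])
```

Each shard then only has to score its own slice of the index, and the shards can be queried in parallel.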

We will push out source code when it is ready for GitHub, for example, to make a more sophisticated Solr configuration easier to deploy out-of-the-box (no ETA yet). But this will most likely involve large changes to the back end rather than tweaking the current setup.

ranger123 commented 11 years ago

Andrew, could you provide a little more detail on how you've adjusted the indexing and queries to improve the Solr query times? I'm struggling to get an acceptable response time for a large collection and am interested in any direction you may be able to provide. Thanks.

zemariamm commented 11 years ago

Same problem here, guys: Solr is taking too long to answer. I get response times of around 5 seconds per query. I used the patches suggested by Justin Haygood (https://groups.google.com/forum/#!topic/echoprint/J7MQftCfpCM), which improved the recognition significantly. Any ideas?

alnesbit commented 11 years ago

Increasing the density of hash codes will improve the OTA recognition rate, but this will also make the Solr part of the search significantly slower.

The overall ideas in improving scalability of the index are the following:

We've already tried the first approach. It improves the results but it is a hack, and the other approaches are better.

zemariamm commented 11 years ago

Thanks for the fast answer Andrew! I actually ran a few tests that surprised me (with Justin Haygood's patch):

So replacing Solr with the newest version should speed it up, right? I'll give it a try :)

Thanks for the help! Zé

ranger123 commented 11 years ago

Hi Andrew, Thanks for the response. I wasn't able to reach the C experimental repo either. I'd be interested in taking a look.

I did take a look at migrating to Solr 4.x, but it looks like there are a few functions that have been deprecated that prevent the hashr from compiling. I did try using a version from another user that utilizes Maven to compile other versions, but it would only compile to 3.x.

When you mention an uninformative hash, could you help me understand what type of hash value would be uninformative? Thanks.
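
One possible reading, which is a guess and not confirmed anywhere in this thread, is that an uninformative hash is one that occurs in a large share of the indexed tracks, much like a stop word in text search: it matches many documents but does little to tell them apart. Under that assumption, a sketch of finding and dropping such codes could look like this (all names and the `max_fraction` threshold are hypothetical):

```python
from collections import Counter


def document_frequencies(tracks):
    """Count, for each hash code, how many tracks contain it at least once.

    `tracks` maps a track id to the list of hash codes in that track.
    """
    df = Counter()
    for codes in tracks.values():
        df.update(set(codes))  # set() so repeats within one track count once
    return df


def drop_common_codes(query_codes, df, n_tracks, max_fraction=0.5):
    """Keep only query codes that occur in at most max_fraction of all tracks."""
    limit = max_fraction * n_tracks
    return [code for code in query_codes if df.get(code, 0) <= limit]
```

Whether this matches what Andrew meant would need confirming.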

danicuki commented 10 years ago

I am having the same issue here. Does anyone have a solution?