eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1

Poor performance of varied query #3418

Open adamretter opened 4 years ago

adamretter commented 4 years ago

The following query executes in:

  1. ~5 seconds in BaseX (9.2.4)
  2. ~5.5 seconds in Saxon (EE 9.8.0.12)
  3. ~18 seconds in eXist-db (5.3.0-SNAPSHOT)

let $json := fn:unparsed-text("https://api.github.com/repos/exist-db/exist/stats/contributors")
let $data := fn:json-to-xml($json)
return
<results>{
  for $year in (2005 to 2020)
  let $ut-start := (xs:dateTime($year || "-01-01T00:00:00-00:00") - xs:dateTime("1970-01-01T00:00:00-00:00")) div xs:dayTimeDuration('PT1S')
  let $ut-end := (xs:dateTime(($year + 1)|| "-01-01T00:00:00-00:00") - xs:dateTime("1970-01-01T00:00:00-00:00")) div xs:dayTimeDuration('PT1S')
  order by $year descending
  return
    <year number="{$year}">
      {
        for $user-data in $data/fn:array/fn:map
        let $username := $user-data//fn:map[@key eq "author"]/fn:string[@key eq "login"]/string(.)
        let $commits := fn:sum($user-data/fn:array[@key eq "weeks"]/fn:map[fn:number[@key eq "w"][xs:int(.) ge $ut-start][xs:int(.) lt $ut-end]]/fn:number[@key eq "c"]/xs:int(.))
        where $commits gt 0
        order by $commits descending
        return
        <user name="{$username}">{$commits}</user>

      }
    </year>
}</results>

line-o commented 4 years ago

Would you consider the runtime of this query a bug, @adamretter? This ticket is possibly a duplicate of #3406. It is impossible to tell without timing the separate tasks (downloading an external resource, converting JSON to XML, and the nested FLWORs).
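
One way to time the separate tasks in eXist-db is sketched below. It assumes the `let` clauses are evaluated eagerly and in order, and that `util:system-dateTime()` from eXist-db's util module returns a fresh wall-clock reading on every call. `local:ms` is a hypothetical helper, and the third phase is a simplified stand-in for the nested FLWOR rather than the original query, so the figures would only be indicative.

```xquery
xquery version "3.1";

(: Sketch only: rough per-phase timings, eXist-db specific. :)
import module namespace util = "http://exist-db.org/xquery/util";

(: Hypothetical helper: elapsed milliseconds between two timestamps :)
declare function local:ms($from as xs:dateTime, $to as xs:dateTime) as xs:decimal {
  ($to - $from) div xs:dayTimeDuration("PT0.001S")
};

let $t0 := util:system-dateTime()
(: Phase 1: download the external resource :)
let $json := fn:unparsed-text("https://api.github.com/repos/exist-db/exist/stats/contributors")
let $t1 := util:system-dateTime()
(: Phase 2: convert the JSON to XML :)
let $data := fn:json-to-xml($json)
let $t2 := util:system-dateTime()
(: Phase 3: stand-in for the nested FLWOR; sums all commits per user :)
let $totals :=
  for $user-data in $data/fn:array/fn:map
  return fn:sum($user-data/fn:array[@key eq "weeks"]/fn:map/fn:number[@key eq "c"]/xs:int(.))
let $t3 := util:system-dateTime()
return
  <timings unit="ms">
    <download>{local:ms($t0, $t1)}</download>
    <json-to-xml>{local:ms($t1, $t2)}</json-to-xml>
    <flwor users="{count($totals)}">{local:ms($t2, $t3)}</flwor>
  </timings>
```

Replacing the stand-in with the full nested FLWOR from the original query would show whether that phase accounts for most of the ~18 seconds.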

adamretter commented 4 years ago

> Would you consider the runtime of this query a bug, @adamretter ?

No. I am not reporting a bug. I am reporting a performance issue. I believe the profile and the problem are different from the many problems illustrated in #3406... otherwise I would not have posted it ;-)

The problem is not the downloading of an external resource. Previously I was reading the JSON directly from the local filesystem; I only switched to the URL to make the query portable between implementations and people. I should perhaps have mentioned that!
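
For anyone who wants to rule out the network entirely, a minimal sketch of the local-file variant follows. The path is hypothetical; it assumes the API response has first been saved there.

```xquery
xquery version "3.1";

(: Sketch only: the file path below is a hypothetical location for a
   saved copy of the GitHub API response. :)
let $json := fn:unparsed-text("file:///tmp/contributors.json")
let $data := fn:json-to-xml($json)
(: Sanity check: number of contributor entries that were parsed :)
return count($data/fn:array/fn:map)
```

The rest of the query can then run against `$data` unchanged.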

Sure, the problem needs to be broken down and investigated, but the first step is always reporting that there is a problem. I didn't want the issue to be lost.