drgrib / alfred-bear

Streamlined note searching and creation for Bear using Alfred
MIT License
427 stars 24 forks source link

Proper lowercase for SQLite3 #30

Closed kudrykv closed 4 years ago

kudrykv commented 4 years ago

SQLite3 lowercase and uppercase only ASCII. It ignores other charsets.

Bear knows how to search that:

Screenshot 2020-08-27 at 17 05 39

bs this note in recent, but it can't find it via word search:

Screenshot 2020-08-27 at 17 05 47 Screenshot 2020-08-27 at 17 05 55

The PR addresses the issue leveraging the Golang lowercase, making it work:

Screenshot 2020-08-27 at 17 07 27
drgrib commented 4 years ago

Great work. Thank you.

drgrib commented 4 years ago

@kudrykv I pulled this locally and built it and now I am getting normal results only for the query e. All other queries, for example, g match nothing anymore. Any idea what is going on here?

drgrib commented 4 years ago

I've reverted the binary file in the release back to the previous version. It would be helpful to me if you paste the unicode text of your upper and lower case non-latin character search and note title so I can test your test cases. If you are able to debug this with my latin text cases using e and g, that would be helpful as well.

kudrykv commented 4 years ago

I am getting normal results only for the query e. All other queries, for example, g match nothing anymore

@drgrib That is weird 😳 . I'll look in other binaries, whether I broke anything. Did you run search or other binaries? Can you please give me more details? I have built and replaced my search binary, it worked

Screenshot 2020-08-28 at 08 20 38 Screenshot 2020-08-28 at 08 22 32 Screenshot 2020-08-28 at 08 22 44

unicode text of your upper and lower case non-latin character search and note title so I can test your test cases

e.g. СВІДОЦТВО in the Bear note, and then свідоцтво in Alfred. When I was doing that, I noticed that the library lowercases the search term (so it becomes свідоцтво), but in the note is is mixed case (СВІДОЦТВО). Then the query didn't work -- forced lowercased with Go didn't match not-lowercased non-ascii in sqlite.

drgrib commented 4 years ago

@kudrykv Yeah, I am really at a loss why this isn't working for me because the code looks correct. But I'm the one who produces the binary file used for the workflow, so if my build is broken, it breaks it for everyone.

I ran this program as a sanity test and all the calls worked as expected, reproducing the bad result for lower and good results for utflower:

package main

import (
    "database/sql"
    "fmt"
    "strings"

    "github.com/mattn/go-sqlite3"
)

func testStringQuery(db *sql.DB, q string) {
    row := db.QueryRow(q)
    var s string
    err := row.Scan(&s)
    if err != nil {
        panic(err)
    }
    fmt.Println(s)
    fmt.Println()
}

func utfLower(s string) string {
    return strings.ToLower(s)
}

func main() {
    var q string
    path := "test.db"

    sql.Register("sqlite3_custom", &sqlite3.SQLiteDriver{
        ConnectHook: func(conn *sqlite3.SQLiteConn) error {
            return conn.RegisterFunc("utflower", utfLower, true)
        },
    })

    db, err := sql.Open("sqlite3_custom", path)
    if err != nil {
        panic(err)
    }

    q = `SELECT lower('Hello world')`
    testStringQuery(db, q)
    q = `SELECT lower('Б')`
    testStringQuery(db, q)

    q = `SELECT utflower('Hello world')`
    testStringQuery(db, q)
    q = `SELECT utflower('Б')`
    testStringQuery(db, q)

    q = `SELECT utflower('g')`
    testStringQuery(db, q)
    q = `SELECT utflower('G')`
    testStringQuery(db, q)

    fmt.Println("--")
    fmt.Println()
    fmt.Println(utfLower("Б"))
    fmt.Println(utfLower("g"))
}

Let me debug what is going wrong in my local Bear repo and I'll get back to you.

drgrib commented 4 years ago

So here is what I've found. I can get the query to work if I change it to this:

    NOTES_BY_QUERY = `
SELECT DISTINCT
    note.ZUNIQUEIDENTIFIER, note.ZTITLE, group_concat(tag.ZTITLE)
FROM
    ZSFNOTE note
    LEFT OUTER JOIN Z_7TAGS nTag ON note.Z_PK = nTag.Z_7NOTES
    LEFT OUTER JOIN ZSFNOTETAG tag ON nTag.Z_14TAGS = tag.Z_PK
WHERE
    note.ZARCHIVED=0
    AND note.ZTRASHED=0
    AND (
        utflower(note.ZTITLE) LIKE utflower('%%%s%%') OR
        lower(note.ZTEXT) LIKE lower('%%%s%%')
    )
GROUP BY note.ZUNIQUEIDENTIFIER
ORDER BY case when utflower(note.ZTITLE) LIKE utflower('%%%s%%') then 0 else 1 end, note.ZMODIFICATIONDATE DESC
LIMIT 400`

Notice that I use the built-in lower for the ZTEXT condition. Here is what I suspect: there is a limit on how large a string can be that is sent to the RegisterFunc function. I have some massive ZTEXT in some of my notes.

What confuses me about this is that there is no error when I use utflower for ZTEXT. It just returns no results. Maybe sqlite3 has poor handling for function failures and returns zero results instead of an error? I'll take a glance at their RegisterFunc code to see if I find evidence for this. But I'm open to insight you have as well.

kudrykv commented 4 years ago

@drgrib wow, that's a great finding! I'll also take a look what are the options there.

kudrykv commented 4 years ago

I've tried a synthetic case with 100 million characters, SQLite did fine:

package bearkeek

import (
    "database/sql"
    "math/rand"
    "strings"
    "testing"
    "time"

    "github.com/mattn/go-sqlite3"
)

func init() {
    rand.Seed(time.Now().UnixNano())

    sql.Register("sqlite_custom", &sqlite3.SQLiteDriver{
        ConnectHook: func(conn *sqlite3.SQLiteConn) error {
            return conn.RegisterFunc("utflower", strings.ToLower, true)
        },
    })
}

func TestTest(t *testing.T) {
    open, err := sql.Open("sqlite_custom", "./test.sqlite3")
    if err != nil {
        t.Fatal(err)
    }

    if _, err = open.Exec("create table if not exists tbl(texty text);"); err != nil {
        t.Fatal(err)
    }

    if _, err = open.Exec("delete from tbl"); err != nil {
        t.Fatal(err)
    }

    searchTerm := "BANANA БАНАН"
    theBigNote := RandStringRunes(50000000) + searchTerm + RandStringRunes(50000000)
    if _, err = open.Exec("insert into tbl (texty) values (?)", theBigNote); err != nil {
        t.Fatal(err)
    }

    query, err := open.Query("select texty from tbl where utflower(texty) like utflower(?)", "%"+searchTerm+"%")
    if err != nil {
        t.Fatal(err)
    }

    if query.Err() != nil {
        t.Fatal(query.Err())
    }

    if !query.Next() {
        t.Fatal("no rows")
    }

    var got string
    if err = query.Scan(&got); err != nil {
        t.Fatal(err)
    }

    t.Log(len([]rune(got)))
}

var letterRunes = []rune(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" +
        "абвгґдеєжзиіїйклмнопрстуфхцчшщьюяАБВГҐДЕЄЖЗИІЇЙКЛМНОПРСТУФХЦЧШЩЬЮЯ",
)

func RandStringRunes(n int) string {
    b := make([]rune, n)
    for i := range b {
        b[i] = letterRunes[rand.Intn(len(letterRunes))]
    }
    return string(b)
}

It returned me the text (logging just length b/c that's too much to print).

I need to take a closer look on Bear table schema then.

drgrib commented 4 years ago

Yeah, I tried large sizes as well and size did not seem to be the problem. So far I have discovered that the query works if I:

OR

I'm really confused what is going on here. This is the minimal change I can get to function correctly, which is not desirable but better than nothing: https://github.com/drgrib/alfred-bear/pull/32.