Closed lordwelch closed 6 months ago
I did have a look at FTS and decided against it and went with adding %
's instead for the search. Both problems come down to the '
being removed. Removing utils.sanitize_title(series_name, literal)
should fix both issues.
The other option is to replace '
and "
(maybe some more) with %
the same as any spaces.
https://github.com/mizaki/gcd_talker/commit/97daa5f98ec5a3fe0242a25702476844285b9227 should have fixed both issues.
I had another look at FTS and it wouldn't help with the '
issue and I don't see it helping unless wanting to say search over the descriptions too. Which I don't think is all that helpful?
The exact search that I opened this for now works. But "Cory Doctorows Futuristic Tales of the Here and Now" doesn't work which is just as problematic.
Also auto-identify doesn't appear to get any results when a year is specified but the screenshot in #5 shows that it has the correct year. Here is the log. ComicTagger.log
Also using FTS would make this better as it handles all of the different forms that I can come up with.
Create the table and populate
CREATE VIRTUAL TABLE ft USING fts5(name, content='gcd_series', content_rowid='id', tokenize = 'porter unicode61 remove_diacritics 1') ;
INSERT INTO ft(ft) VALUES('rebuild');
Search: Cory Doctorow's Futuristic Tales of the Here and Now
(fts needs special characters '
quoted sqlite escapes '
by doubling ''
)
SELECT gcd_series.id AS 'id', gcd_series.name AS 'series_name',
gcd_series.sort_name AS 'sort_name', gcd_series.notes AS 'notes',
gcd_series.year_began AS 'year_began', gcd_series.year_ended AS 'year_ended',
gcd_series.issue_count AS 'issue_count', gcd_publisher.name AS 'publisher_name'
FROM ft('Cory "Doctorow''s" Futuristic Tales of the Here and Now')
LEFT JOIN gcd_series on ft.rowid=gcd_series.id
LEFT JOIN gcd_publisher ON gcd_series.publisher_id=gcd_publisher.id ;
Search: Cory "Doctorow s" Futuristic Tales of the Here and Now
("Doctorow s" is a single term)
SELECT gcd_series.id AS 'id', gcd_series.name AS 'series_name',
gcd_series.sort_name AS 'sort_name', gcd_series.notes AS 'notes',
gcd_series.year_began AS 'year_began', gcd_series.year_ended AS 'year_ended',
gcd_series.issue_count AS 'issue_count', gcd_publisher.name AS 'publisher_name'
FROM ft('Cory "Doctorow s" Futuristic Tales of the Here and Now')
LEFT JOIN gcd_series on ft.rowid=gcd_series.id
LEFT JOIN gcd_publisher ON gcd_series.publisher_id=gcd_publisher.id ;
Search: Cory Doctorow s Futuristic Tales of the Here and Now
("Doctorow s" is two terms)
SELECT gcd_series.id AS 'id', gcd_series.name AS 'series_name',
gcd_series.sort_name AS 'sort_name', gcd_series.notes AS 'notes',
gcd_series.year_began AS 'year_began', gcd_series.year_ended AS 'year_ended',
gcd_series.issue_count AS 'issue_count', gcd_publisher.name AS 'publisher_name'
FROM ft('Cory Doctorow s Futuristic Tales of the Here and Now')
LEFT JOIN gcd_series on ft.rowid=gcd_series.id
LEFT JOIN gcd_publisher ON gcd_series.publisher_id=gcd_publisher.id ;
Search: Cory Doctorows Futuristic Tales of the Here and Now
SELECT gcd_series.id AS 'id', gcd_series.name AS 'series_name',
gcd_series.sort_name AS 'sort_name', gcd_series.notes AS 'notes',
gcd_series.year_began AS 'year_began', gcd_series.year_ended AS 'year_ended',
gcd_series.issue_count AS 'issue_count', gcd_publisher.name AS 'publisher_name'
FROM ft('Cory Doctorows Futuristic Tales of the Here and Now')
LEFT JOIN gcd_series on ft.rowid=gcd_series.id
LEFT JOIN gcd_publisher ON gcd_series.publisher_id=gcd_publisher.id ;
Interestingly removing the '
makes it fail on GCD itself: https://www.comics.org/series/name/Cory%20Doctorows%20Futuristic%20Tales%20of%20the%20Here%20and%20Now/sort/alpha/
I've added FTS in https://github.com/mizaki/gcd_talker/commit/b6252756b384147e0cd9aa59b82b7f9a0f33e4ea
Not sure on checking the fts
and fts_docsize
tables to see if a rebuild
is needed. I can't really see where it would be required.
It shouldn't be needed unless someone is doing some custom modifications to the db, you can also setup hooks to keep it up to date in case someone does
Thanks for fixing this!
When searching for "Cory Doctorow's Futuristic Tales of the Here and Now" nothing is returned, searching for "Cory Doctorow" does return the correct result.
SQLite has a full text search option which is probably useful, I checked python 3.8-3.12 and pypy3 and they all include the FTS5 extension. You could also check if it is supported first https://stackoverflow.com/a/36663390
The literal search also doesn't work because it removes the
'
from the search.