Closed BasVerlooy closed 6 months ago
Nice idea! I remember this has bit me a number of times in the past, which you can see if you git grep -i Cast
. However, now I can't seem to be able to replicate the problem. See the example below from examples/cdindex
:
sqlite> .schema works
CREATE TABLE works(
published_year,
id,
published_day,
doi,
published_month
);
sqlite> select * from works where published_year >2010 limit 5;
2011|378||10.7202/1063035ar|
2011|493||10.7202/1063027ar|
2011|585||10.4000/books.apu.8998|
2011|666||10.7202/1063094ar|
2011|694|1|10.18146/tmg.584|6
Adding a data type to an SQLite column, has only a very small effect on its behavior. Could the type only affect the Perl interface? Can you please give me a simple before/after example so I can understand what is going on?
You are right! Pubmed requires this fix. Crossref doesn't, because its data are in JSON format, which distinguishes between numbers and strings. In the Pubmed XML everything is a string. Similar fixes are required for all numbers obtained from XML data. Well done and thank you for your patience!
$ bin/a3k query crossref tests/data/crossref-sample/ --query 'SELECT doi FROM works WHERE published_year BETWEEN 2018 AND 2020' 10.46692/9781529202618.006
10.1145/3196398.3196476
$ bin/a3k query pubmed tests/data/pubmed-sample/ --query 'SELECT doi FROM pubmed_articles WHERE pubmed_articles.revised_year BETWEEN 2018 AND 2020'
Adding the data type to ColumnMeta allows for better filtering of a dataset, currently all fields don't have a datatype listed. For Pubmed this means every column is a string, filtering by year for example is now only possible like this
year = '2022'
. Adding data types to columns of which we know they are integers allows for filtering with a range, likeyear > 2000