fire-eggs / Danbooru2021

Python scripts and tools for working with the Danbooru2021 data set. Note: this is a sqlite database and a viewer, not directly related to machine learning.
https://www.gwern.net/Danbooru2021
MIT License

PostgreSQL support #45

Closed DonaldTsang closed 3 years ago

DonaldTsang commented 3 years ago

Is it possible to save the data into PostgreSQL instead, so that tag embedding can be done and compared between Danbooru and Derpibooru?

Cross-referencing: https://github.com/fire-eggs/Danbooru2019/tree/master/database and https://derpibooru.org/pages/data_dumps

Goal: https://www.aclweb.org/anthology/L18-1156.pdf

fire-eggs commented 3 years ago

Certainly! makedb.py creates the database with basic SQL commands. That program can be modified using a Postgres package (e.g. psycopg2) and the appropriate Postgres-specific SQL commands.
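As a rough illustration of that kind of port, here is a minimal sketch that copies one table between two DB-API connections. The `tags` table, its columns, and the helper names `insert_sql` and `copy_table` are hypothetical and not taken from makedb.py; the only real difference it tries to capture is that sqlite3 uses `?` as its parameter placeholder while psycopg2 uses `%s`. The demo uses two in-memory SQLite databases so it runs without a Postgres server.

```python
import sqlite3


def insert_sql(table, columns, placeholder):
    """Build a parameterised INSERT; sqlite3 uses '?', psycopg2 uses '%s'."""
    cols = ", ".join(columns)
    marks = ", ".join([placeholder] * len(columns))
    return f"INSERT INTO {table} ({cols}) VALUES ({marks})"


def copy_table(src, dst, table, columns, placeholder="%s", batch=1000):
    """Copy rows from one DB-API connection to another in batches.

    Table and column names are trusted here (they are interpolated into
    the SQL text), so only pass names you control.
    """
    read = src.cursor()
    write = dst.cursor()
    read.execute(f"SELECT {', '.join(columns)} FROM {table}")
    sql = insert_sql(table, columns, placeholder)
    copied = 0
    while True:
        rows = read.fetchmany(batch)
        if not rows:
            break
        write.executemany(sql, rows)
        copied += len(rows)
    dst.commit()
    return copied


def demo():
    # Hypothetical schema for illustration; the real makedb.py layout may differ.
    ddl = "CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, name TEXT)"
    src = sqlite3.connect(":memory:")
    src.execute(ddl)
    src.executemany("INSERT INTO tags VALUES (?, ?)", [(1, "solo"), (2, "smile")])
    src.commit()

    # Stand-in for the Postgres side so the sketch runs anywhere.
    dst = sqlite3.connect(":memory:")
    dst.execute(ddl)
    n = copy_table(src, dst, "tags", ["tag_id", "name"], placeholder="?")
    rows = dst.execute("SELECT tag_id, name FROM tags ORDER BY tag_id").fetchall()
    return n, rows
```

Against a real server, `dst` would instead be something like `psycopg2.connect("dbname=danbooru")` (the database name is an assumption), the tables would be created first with Postgres-flavoured DDL, and the default `placeholder="%s"` would apply. For very large tables a bulk load via `COPY` is likely to be faster than batched inserts.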

Looking at the schema from Derpibooru, it looks like it would be fairly straightforward to use it:

Gwern's metadata doesn't readily permit any of the other tables in the Derpibooru schema to be populated. I'm unsure if you need any of those tables for your goals.

DonaldTsang commented 3 years ago

First, what is the process for converting SQLite to PostgreSQL?

Also I have discovered something about my needs for Tag Implications (and Tag Alias Cleaning):

Tag aliases differ from implications: with an alias only the target tag remains on the image, whereas with an implication both tags remain. In other words, aliases are for tags referring to the same thing, while implications are for situations where one tag describes a subset of the images belonging to another tag.
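To make the distinction concrete, here is a small sketch of how a tag set might be cleaned before computing embeddings. The function name `normalize_tags` and the example tags are made up for illustration, and it assumes aliases map directly to a canonical tag (no alias chains).

```python
def normalize_tags(tags, aliases, implications):
    """Resolve aliases, then add implied tags transitively.

    aliases:      dict mapping an alias to its canonical tag
    implications: dict mapping a tag to an iterable of tags it implies
    """
    # Alias: the original tag is replaced, only the canonical tag remains.
    resolved = {aliases.get(t, t) for t in tags}

    # Implication: the original tag stays and the implied tag is added.
    pending = list(resolved)
    while pending:
        tag = pending.pop()
        for implied in implications.get(tag, ()):
            implied = aliases.get(implied, implied)
            if implied not in resolved:
                resolved.add(implied)
                pending.append(implied)
    return resolved


# Hypothetical example data, not taken from either site's dumps.
EXAMPLE_ALIASES = {"kitty": "cat"}
EXAMPLE_IMPLICATIONS = {"cat": ["animal"]}
```

With the example data, `normalize_tags({"kitty", "solo"}, EXAMPLE_ALIASES, EXAMPLE_IMPLICATIONS)` drops the alias `kitty` in favour of `cat` and adds `animal`, while `solo` passes through unchanged.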

I am already using DerpiDB for co-occurrence tasks.

fire-eggs commented 3 years ago

History suggests the Danbooru server itself was using PostgreSQL. If that is still the case, asking the Danbooru maintainers for a copy of the database would be the most expeditious method for PostgreSQL support.

I don't have PostgreSQL and don't have the background to support it.

fire-eggs commented 1 year ago

A possible solution: https://github.com/bitdotioinc/pgsqlite