fire-eggs / Danbooru2021

Python scripts and tools for working with the Danbooru2021 data set. Note: this is a sqlite database and a viewer, not directly related to machine learning.
https://www.gwern.net/Danbooru2021
MIT License
42 stars 2 forks

Add pools support #3

Closed fire-eggs closed 4 years ago

fire-eggs commented 4 years ago

The database builder does NOT include the image pools data.

Pools data in the metadata set haven't been gathered into tuples. E.g. an image which is part of two pools will have pool data like so:

pools: [123,899,"set","collection"]

This is in fact two pools: pool #123, which is a set, and pool #899, which is a collection.

So need to come up with parsing to handle this.

fire-eggs commented 4 years ago

should be possible to process those using some sort of zip() magic

fire-eggs commented 4 years ago

On consideration, as gwern does not give us the pool names, I'm currently thinking that there is no point in importing the pools data.

However, for future reference, this is how it could be done:

# full is the list as illustrated by the example above, i.e. [123,899,"set","collection"]
list1 = full[:len(full)//2] # first half, i.e. the pool ids
list2 = full[len(full)//2:] # second half, i.e. the 'types'
tups = list(zip(list1, list2))  # now matched tuples, i.e. [(123,"set"),(899,"collection")]
for tup in tups:
    pass  # placeholder: insert or ignore tup into the pools table
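
For future reference, here is a rough, self-contained sketch of how the split and the "insert or ignore" step might fit together with Python's built-in sqlite3 module. The `pools` table, its columns (`image_id`, `pool_id`, `pool_type`), and the helper names `split_pools` and `insert_pools` are all hypothetical, made up for illustration. They are not the schema or code used by the database builder. It also assumes the flattened list always holds all ids first and then all types, as in the example above.

```python
import sqlite3


def split_pools(full):
    """Split a flattened pools list into (pool_id, pool_type) tuples.

    Assumes the first half holds the ids and the second half the types,
    e.g. [123, 899, "set", "collection"].
    """
    if len(full) % 2 != 0:
        raise ValueError(f"expected an even number of entries, got {len(full)}")
    half = len(full) // 2
    return list(zip(full[:half], full[half:]))


def insert_pools(conn, image_id, full):
    """Insert one row per pool for an image, skipping rows that already exist."""
    rows = [(image_id, pool_id, pool_type) for pool_id, pool_type in split_pools(full)]
    conn.executemany(
        "INSERT OR IGNORE INTO pools (image_id, pool_id, pool_type) VALUES (?, ?, ?)",
        rows,
    )


# Hypothetical schema: the primary key is what lets INSERT OR IGNORE skip duplicates.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pools ("
    " image_id INTEGER NOT NULL,"
    " pool_id INTEGER NOT NULL,"
    " pool_type TEXT NOT NULL,"
    " PRIMARY KEY (image_id, pool_id))"
)

insert_pools(conn, 1, [123, 899, "set", "collection"])
insert_pools(conn, 1, [123, 899, "set", "collection"])  # second call adds nothing

rows = conn.execute(
    "SELECT image_id, pool_id, pool_type FROM pools ORDER BY pool_id"
).fetchall()
print(rows)  # [(1, 123, 'set'), (1, 899, 'collection')]
```

The length check is only a guard against a malformed list. It cannot detect ids and types that are mismatched within an even-length list.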