deepghs / cheesechaser

Swiftly get tons of images from indexed tars on Huggingface
https://cheesechaser.deepghs.org/
Apache License 2.0

How to get the image ID from the metadata? It does not seem to match the file name numbers #16

Open AhBumm opened 1 week ago

AhBumm commented 1 week ago

I want to get the tag_string of the image 3578712.webp by matching media_asset.id == 3578712:

PS Q:\dataset\as109> python dantags.py
Column: created_at, Value: 2019-08-30T02:08:44.315-04:00, Type: object
Column: uploader_id, Value: 163843, Type: int64
Column: score, Value: 4, Type: int64
Column: source, Value: https://hhug0.artstation.com/projects/B2XE6, Type: object
Column: md5, Value: fda33108e40c20ea9b969529596c4b79, Type: object
Column: last_comment_bumped_at, Value: None, Type: object
Column: rating, Value: q, Type: object
Column: image_width, Value: 1920, Type: int64
Column: image_height, Value: 1487, Type: int64
Column: tag_string, Value: 1girl boots breasts cleavage commentary_request green_eyes hair_ornament hhug0 highres jacket lips long_hair nail_polish original photoshop_(medium) spanish_commentary twintails vehicle_interior weapon, Type: object
Column: fav_count, Value: 6, Type: int64
Column: file_ext, Value: jpg, Type: object
Column: last_noted_at, Value: None, Type: object
Column: parent_id, Value: nan, Type: float64
Column: has_children, Value: False, Type: bool
Column: approver_id, Value: nan, Type: float64
Column: tag_count_general, Value: 13, Type: int64
Column: tag_count_artist, Value: 1, Type: int64
Column: tag_count_character, Value: 0, Type: int64
Column: tag_count_copyright, Value: 1, Type: int64
Column: file_size, Value: 653027, Type: int64
Column: up_score, Value: 4, Type: int64
Column: down_score, Value: 0, Type: int64
Column: is_pending, Value: False, Type: bool
Column: is_flagged, Value: False, Type: bool
Column: is_deleted, Value: True, Type: bool
Column: tag_count, Value: 19, Type: int64
Column: updated_at, Value: 2020-06-06T20:59:45.012-04:00, Type: object
Column: is_banned, Value: False, Type: bool
Column: pixiv_id, Value: nan, Type: float64
Column: last_commented_at, Value: None, Type: object
Column: has_active_children, Value: False, Type: bool
Column: bit_flags, Value: 2, Type: int64
Column: tag_count_meta, Value: 4, Type: int64
Column: has_large, Value: True, Type: bool
Column: has_visible_children, Value: False, Type: bool
Column: media_asset.id, Value: 3578712, Type: int64
Column: media_asset.created_at, Value: 2019-08-30T02:08:44.315-04:00, Type: object
Column: media_asset.updated_at, Value: 2023-02-27T16:45:24.877-05:00, Type: object
Column: media_asset.md5, Value: fda33108e40c20ea9b969529596c4b79, Type: object
Column: media_asset.file_ext, Value: jpg, Type: object
Column: media_asset.file_size, Value: 653027, Type: int64
Column: media_asset.image_width, Value: 1920, Type: int64
Column: media_asset.image_height, Value: 1487, Type: int64
Column: media_asset.duration, Value: nan, Type: float64
Column: media_asset.status, Value: active, Type: object
Column: media_asset.file_key, Value: 8qsRX5tfw, Type: object
Column: media_asset.is_public, Value: True, Type: bool
Column: media_asset.pixel_hash, Value: c51465fe691e814c463f15bcb2461bd6, Type: object
Column: media_asset.variants.0.type, Value: 180x180, Type: object
Column: media_asset.variants.0.url, Value: https://cdn.donmai.us/180x180/fd/a3/fda33108e40c20ea9b969529596c4b79.jpg, Type: object
Column: media_asset.variants.0.width, Value: 180.0, Type: float64
Column: media_asset.variants.0.height, Value: 139.0, Type: float64
Column: media_asset.variants.0.file_ext, Value: jpg, Type: object
Column: media_asset.variants.1.type, Value: 360x360, Type: object
Column: media_asset.variants.1.url, Value: https://cdn.donmai.us/360x360/fd/a3/fda33108e40c20ea9b969529596c4b79.jpg, Type: object
Column: media_asset.variants.1.width, Value: 360.0, Type: float64
Column: media_asset.variants.1.height, Value: 279.0, Type: float64
Column: media_asset.variants.1.file_ext, Value: jpg, Type: object
Column: media_asset.variants.2.type, Value: 720x720, Type: object
Column: media_asset.variants.2.url, Value: https://cdn.donmai.us/720x720/fd/a3/fda33108e40c20ea9b969529596c4b79.webp, Type: object
Column: media_asset.variants.2.width, Value: 720.0, Type: float64
Column: media_asset.variants.2.height, Value: 558.0, Type: float64
Column: media_asset.variants.2.file_ext, Value: webp, Type: object
Column: media_asset.variants.3.type, Value: sample, Type: object
Column: media_asset.variants.3.url, Value: https://cdn.donmai.us/sample/fd/a3/sample-fda33108e40c20ea9b969529596c4b79.jpg, Type: object
Column: media_asset.variants.3.width, Value: 850.0, Type: float64
Column: media_asset.variants.3.height, Value: 658.0, Type: float64
Column: media_asset.variants.3.file_ext, Value: jpg, Type: object
Column: tag_string_general, Value: 1girl boots breasts cleavage green_eyes hair_ornament jacket lips long_hair nail_polish twintails vehicle_interior weapon, Type: object
Column: tag_string_character, Value: , Type: object
Column: tag_string_copyright, Value: original, Type: object
Column: tag_string_artist, Value: hhug0, Type: object
Column: tag_string_meta, Value: commentary_request highres photoshop_(medium) spanish_commentary, Type: object
Column: file_url, Value: https://cdn.donmai.us/original/fd/a3/fda33108e40c20ea9b969529596c4b79.jpg, Type: object
Column: large_file_url, Value: https://cdn.donmai.us/sample/fd/a3/sample-fda33108e40c20ea9b969529596c4b79.jpg, Type: object
Column: preview_file_url, Value: https://cdn.donmai.us/180x180/fd/a3/fda33108e40c20ea9b969529596c4b79.jpg, Type: object
Column: media_asset.variants.4.type, Value: original, Type: object
Column: media_asset.variants.4.url, Value: https://cdn.donmai.us/original/fd/a3/fda33108e40c20ea9b969529596c4b79.jpg, Type: object
Column: media_asset.variants.4.width, Value: 1920.0, Type: float64
Column: media_asset.variants.4.height, Value: 1487.0, Type: float64
Column: media_asset.variants.4.file_ext, Value: jpg, Type: object
Column: media_asset.variants.5.type, Value: None, Type: object
Column: media_asset.variants.5.url, Value: None, Type: object
Column: media_asset.variants.5.width, Value: nan, Type: float64
Column: media_asset.variants.5.height, Value: nan, Type: float64
Column: media_asset.variants.5.file_ext, Value: None, Type: object

But this does not seem to be the same image.

narugo1992 commented 4 days ago

@AhBumm Hmm, why are you using media_asset.id to pick the image? Danbooru posts have their own ID, and our dataset uses that ID, not media_asset.id.
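
To make the distinction concrete, here is a minimal sketch of looking up tags by post ID instead of by media_asset.id. It assumes the post ID is stored under a key named `id` (that column is not shown in the dump above, so the name is a guess), and the records are toy values invented for illustration, not real dataset rows. `tags_for_file` is a hypothetical helper, not part of cheesechaser.

```python
from pathlib import Path

# Toy metadata records. All values are made up for illustration; the key
# name "id" for the Danbooru post ID is an assumption about the metadata.
records = [
    {"id": 3578712, "media_asset.id": 3544001, "tag_string": "tags_of_post_3578712"},
    {"id": 3612345, "media_asset.id": 3578712, "tag_string": "tags_of_another_post"},
]


def tags_for_file(records, filename):
    """Return tag_string for the record whose post ID matches the file stem."""
    post_id = int(Path(filename).stem)  # "3578712.webp" -> 3578712
    for record in records:
        if record["id"] == post_id:
            return record["tag_string"]
    return None  # no record with that post ID


# Correct lookup: the file name number is compared against the post ID.
by_post_id = tags_for_file(records, "3578712.webp")

# Mismatched lookup: the same number compared against media_asset.id
# lands on a different post.
by_asset_id = next(
    r["tag_string"] for r in records if r["media_asset.id"] == 3578712
)

print(by_post_id)
print(by_asset_id)
```

With a pandas DataFrame the same idea would be a filter such as `df[df["id"] == post_id]`, again assuming the post ID column is called `id`.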