fire opened 7 months ago
Put more effort in, will you? It's clearly Objaverse. Just look at "Dataset used to train" on Hugging Face.
Some of the CC-BY-licensed artwork in Objaverse is incorrectly licensed, so I wanted to check.
I have to go for now, but I'll be working on a script (with ChatGPT) that exports the CC-BY entries to a CSV.
```python
# Work in progress
# Import necessary libraries
import pandas as pd  # the annotations come back as a pandas DataFrame
import objaverse.xl as oxl


def save_cc_by_licenses_as_csv(download_dir="~/.objaverse", output_file="cc_by_licenses.csv"):
    """
    Download annotations from Objaverse-XL and save entries with CC-BY licenses to a CSV file,
    using fileIdentifier as the unique identifier for each 3D object.

    Parameters:
        download_dir (str): Directory to cache the downloaded annotations. Defaults to "~/.objaverse".
        output_file (str): The name of the output CSV file. Defaults to "cc_by_licenses.csv".
    """
    # Download annotations (one row per object)
    annotations = oxl.get_annotations(download_dir=download_dir)

    # Filter for CC-BY licenses.
    # NOTE: this is an exact string match. The license labels used in the annotations
    # may be spelled differently, so check annotations["license"].value_counts() first
    # and adjust the value here if the filtered result comes back empty.
    cc_by_annotations = annotations[annotations["license"] == "CC-BY"]

    # Keep fileIdentifier as the unique reference for each object, plus the other annotation columns
    cc_by_annotations = cc_by_annotations[["fileIdentifier", "source", "license", "fileType", "sha256", "metadata"]]

    # Save to CSV
    cc_by_annotations.to_csv(output_file, index=False)
    print(f"Saved {len(cc_by_annotations)} CC-BY licensed objects to {output_file}, keyed by fileIdentifier.")


# Call the function
if __name__ == "__main__":
    save_cc_by_licenses_as_csv()
```
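Because the script above filters on the exact string `"CC-BY"`, it can silently return nothing if the annotations spell the license another way. Below is a minimal sketch of a more tolerant check. It *assumes* the labels look like `"CC-BY"`, `"CC-BY 4.0"`, `"Creative Commons - Attribution"` or the Sketchfab-style slug `"by"`; those spellings are a guess, not confirmed Objaverse-XL values. `is_plain_cc_by` is a hypothetical helper. It keeps only plain attribution licenses and rejects NC/SA/ND variants. It only handles spelling differences: it cannot catch objects whose license label is itself wrong.

```python
import re

# Hypothetical pattern: the accepted spellings are assumptions, so compare them
# against annotations["license"].unique() before relying on this.
_CC_BY_PATTERN = re.compile(
    r"^(cc[\s_-]?by|creative commons[\s-]*attribution|by)"  # the CC-BY name itself
    r"([\s-]*\d(\.\d)?)?$"                                  # optional version, e.g. " 4.0"
)


def is_plain_cc_by(license_str):
    """Return True only for plain CC-BY labels (no NC, SA or ND terms)."""
    if not isinstance(license_str, str):
        return False  # missing or non-string labels are never treated as CC-BY
    normalized = license_str.strip().lower()
    return bool(_CC_BY_PATTERN.match(normalized))


if __name__ == "__main__":
    for label in ["CC-BY", "CC-BY 4.0", "by", "CC-BY-NC", "by-sa", "CC0"]:
        print(label, is_plain_cc_by(label))
```

In the script, this would replace the exact match with something like `annotations[annotations["license"].map(is_plain_cc_by)]`. A whitelist pattern is used instead of a substring check such as `"by" in label` so that `CC-BY-NC` and `CC-BY-SA` entries are not pulled in by accident.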
Can you release the curated CC-BY dataset?