VAST-AI-Research / TripoSR

MIT License

Evaluate dataset licensing #43

Open fire opened 7 months ago

fire commented 7 months ago

Can you release the curated cc-by dataset?

mr-lab commented 7 months ago

Put in more effort, will you? It's clearly Objaverse. Just look at "Dataset used to train" on Hugging Face.

fire commented 7 months ago

Some of the CC-BY licensed artwork in Objaverse is incorrectly licensed, so I wanted to check.

fire commented 7 months ago

I have to go for now, but I'll be working on a script to get a CC-BY CSV with ChatGPT.

# Work in progress
# Import the Objaverse-XL API (get_annotations returns a pandas DataFrame)
import objaverse.xl as oxl

def save_cc_by_licenses_as_csv(download_dir="~/.objaverse", output_file="cc_by_licenses.csv"):
    """
    Download annotations from Objaverse-XL and save entries with CC-BY licenses to a CSV file,
    using fileIdentifier as the unique identifier for each 3D object.

    Parameters:
    download_dir (str): Directory to cache the downloaded annotations. Defaults to "~/.objaverse".
    output_file (str): The name of the output CSV file. Defaults to "cc_by_licenses.csv".
    """

    # Download annotations
    annotations = oxl.get_annotations(download_dir=download_dir)

    # Filter for CC-BY licenses.
    # NOTE: the exact label stored in the 'license' column is an assumption here;
    # inspect annotations['license'].unique() and adjust the string if it differs.
    cc_by_annotations = annotations[annotations['license'] == 'CC-BY']

    # Keep only the columns of interest; 'fileIdentifier' is the unique
    # reference for each 3D object.
    cc_by_annotations = cc_by_annotations[['fileIdentifier', 'source', 'license', 'fileType', 'sha256', 'metadata']]

    # Save to CSV
    cc_by_annotations.to_csv(output_file, index=False)
    print(f"Saved CC-BY licensed objects to {output_file} using fileIdentifier as the unique identifier.")

# Call the function
if __name__ == "__main__":
    save_cc_by_licenses_as_csv()
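
Since the exact-match filter above only works if the `license` column spells the label exactly as `CC-BY`, here is a rough sketch of a more tolerant check. The helper name `is_cc_by` is hypothetical, and the label spellings it handles (for example "Creative Commons - Attribution" or "CC BY 4.0") are assumptions about what the annotations might contain, not something confirmed in this thread. It only inspects the label text, so it cannot tell whether an object was labeled incorrectly in the first place.

```python
import re


def is_cc_by(label):
    """Return True if a license label looks like plain CC-BY (attribution only).

    Hypothetical helper: the spellings handled here are assumptions, so check
    the labels that actually occur in the data before relying on it.
    """
    # Lowercase and collapse every run of non-alphanumeric characters to one space.
    text = re.sub(r"[^a-z0-9]+", " ", str(label).lower()).strip()
    # Expand the abbreviations so "CC-BY" and "Creative Commons - Attribution"
    # end up as the same sequence of words.
    text = re.sub(r"\bcc\b", "creative commons", text)
    text = re.sub(r"\bby\b", "attribution", text)
    # Ignore version numbers such as "4.0" (split into "4" and "0" above).
    tokens = [t for t in text.split() if not t.isdigit()]
    # Any extra word (share alike, sa, nc, nd, ...) means it is not plain CC-BY.
    return tokens == ["creative", "commons", "attribution"]


if __name__ == "__main__":
    for sample in ["CC-BY", "Creative Commons - Attribution", "CC-BY-SA", "MIT License"]:
        print(sample, "->", is_cc_by(sample))
```

With a pandas DataFrame, this could replace the equality test with something like `annotations[annotations['license'].map(is_cc_by)]`.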