Graph-Learning-Benchmarks / gli

🗂 Graph Learning Indexer: a contributor-friendly and metadata-rich platform for graph learning benchmarks. Dataloading, Benchmarking, Tagging, and more!
https://graph-learning-benchmarks.github.io/gli/
MIT License

[FEATURE REQUEST] Find a better solution for data files uploading #302

Closed jiaqima closed 2 years ago

jiaqima commented 2 years ago

Is your feature request related to a problem? Please describe.

Find a cloud storage service that 1) lets anyone with a link upload files (similar to a Dropbox file request); 2) automatically returns a download link for each uploaded file; and 3) has a large download quota (see this discussion).

Additional context

See also #301, #246, #248, #293.

xingjian-zhang commented 2 years ago

I made some progress on this issue.

TL;DR

Details

The script needs to run continuously, so we might want to run it on a server. Essentially, it does the following:

  1. Connect to the Dropbox API
  2. Fetch the list of files in the GLI directory
  3. For each file, check whether it requires an update; if so:
    1. Rename the file
    2. Share the file and get a download link
    3. Save the link to links.json

Then we can either host a webpage that displays the links (easy to implement) or email the links to the uploaders (needs investigation).
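A minimal sketch of the webpage option, assuming the page is a static HTML file generated from the links.json that the script writes. The function names `render_links_page` and `write_links_page`, the output file name, and the page title are hypothetical and not part of the script.

```python
import html
import json


def render_links_page(links):
    """Return a minimal HTML page listing each file name with its link."""
    rows = []
    for name in sorted(links):
        url = links[name]
        if url is None:
            # No link was recorded for this file; leave it off the page.
            continue
        label = html.escape(name.lstrip("/"))
        href = html.escape(url, quote=True)
        rows.append(f'<li><a href="{href}">{label}</a></li>')
    body = "\n".join(rows)
    return (
        "<html><body><h1>GLI data files</h1>\n"
        f"<ul>\n{body}\n</ul>\n</body></html>"
    )


def write_links_page(src="links.json", dst="links.html"):
    """Read the links file written by the script and write the HTML page."""
    with open(src) as fp:
        links = json.load(fp)
    with open(dst, "w") as fp:
        fp.write(render_links_page(links))
```

Calling `write_links_page()` after each refresh of links.json would keep the page current; any static file host could then serve links.html.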

Auto script api.py

import json
import logging as lg
import os
from time import sleep

import dropbox

class GLIDropboxManager:

    def __init__(self, token, separator="_SEP", sleep_time=10):
        self.token = token
        self.separator = separator
        self.sleep_time = sleep_time
        self.dbx = dropbox.Dropbox(token)
        self.links = {}

    def is_path_exist(self, path):
        try:
            self.dbx.files_get_metadata(path)
        except Exception as e:
            lg.error(f"{path} does not exist.")
            return False
        return True

    def is_path_need_update(self, path):
        return self.separator in path

    def update_file_name(self, path):

        if not self.is_path_exist(path):
            return None

        base, ext = os.path.splitext(path)
        if self.is_path_need_update(path):
            new_path = base[:base.find(self.separator)] + ext
            try:
                self.dbx.files_move(path, new_path, autorename=True)
            except Exception as e:
                # Python 3 exceptions have no `.message` attribute; log the exception itself.
                lg.error(f"Rename failed: {e}")
                return path
            lg.info(f"Rename {path} to {new_path}")
            return new_path
        else:
            lg.warning(f"{path} does not need to be updated.")
            return path

    def get_download_link(self, path):

        if not self.is_path_exist(path):
            return None

        requested_visibility = dropbox.sharing.RequestedVisibility.public
        settings = dropbox.sharing.SharedLinkSettings(
            requested_visibility=requested_visibility)
        try:
            file_link_metadata = self.dbx.sharing_create_shared_link_with_settings(
                path, settings)
        except Exception as e:
            lg.warning(f"{path} already has shared link.")
            file_link_metadata = self.dbx.sharing_create_shared_link(path)
            # TODO - defer - revoke link if it is not public
        return file_link_metadata.url

    def list_files(self):
        paths = [f"/{f.name}" for f in self.dbx.files_list_folder("").entries]
        lg.info(f"Get paths: {paths}")
        return paths

    def run(self):

        for path in self.list_files():
            self.update(path)

        while True:
            for path in self.list_files():
                if self.is_path_need_update(path):
                    self.update(path)
                else:
                    lg.info(f"No need to update for {path}.")
            with open("links.json", "w") as fp:
                json.dump(self.links, fp, indent=4)
            sleep(self.sleep_time)

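    def update(self, path):
        name = self.update_file_name(path)
        if name is None:
            # The file no longer exists; do not record a null entry.
            return
        link = self.get_download_link(name)
        if link is None:
            return
        lg.info(f"{name} link: {link}")
        self.links[name] = link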

def main():
    with open("config.json", "r") as fp:
        config = json.load(fp)
    token = config["token"]
    separator = config["separator"]
    sleep_time = config["sleep"]
    lg.basicConfig(filename="dbx.log",
                   level=lg.INFO,
                   format="%(asctime)s %(levelname)s: %(message)s")

    manager = GLIDropboxManager(token,
                                separator=separator,
                                sleep_time=sleep_time)
    manager.run()

if __name__ == "__main__":
    main()
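The renaming rule in `update_file_name` can be checked without a Dropbox token. The sketch below isolates that string logic as a pure function; the name `strip_separator_suffix` and the example file name are hypothetical, and the default separator is the one used in the script above.

```python
import os


def strip_separator_suffix(path, separator="_SEP"):
    """Drop everything from the separator onward, keeping the extension."""
    base, ext = os.path.splitext(path)
    idx = base.find(separator)
    if idx == -1:
        # No separator in the base name: nothing to rename.
        return path
    return base[:idx] + ext


# Hypothetical uploaded file name with an uploader suffix after the separator.
print(strip_separator_suffix("/cora_SEP_uploader.npz"))
print(strip_separator_suffix("/cora.npz"))
```

Unlike the slice in the script, this version checks for a missing separator explicitly, so a separator that appears only in the extension leaves the path unchanged.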

Sharing links links.json

{
    "/FB13.npz": "https://www.dropbox.com/s/mnns0sjuxk1if1y/FB13.npz?dl=0",
    "/FB15K.npz": "https://www.dropbox.com/s/iyzljce16u3vas9/FB15K.npz?dl=0",
    "/FB15K237.npz": "https://www.dropbox.com/s/4mpvsp53ru7qtud/FB15K237.npz?dl=0",
    "/NELL_995.npz": "https://www.dropbox.com/s/ub5u2nbmxyokt4b/NELL_995.npz?dl=0",
    "/WN11.npz": "https://www.dropbox.com/s/uceh9rbndcbfzld/WN11.npz?dl=0",
    "/WN18.npz": "https://www.dropbox.com/s/01fw0gipnfofer8/WN18.npz?dl=0",
    "/WN18RR.npz": "https://www.dropbox.com/s/hjcg3qnhwjembmg/WN18RR.npz?dl=0",
    "/YAGO3_10.npz": "https://www.dropbox.com/s/eca07e8ppehhw1r/YAGO3_10.npz?dl=0",
    "/actor.npz": "https://www.dropbox.com/s/j3mgqxkq832rh2y/actor.npz?dl=0",
    "/actor_task.npz": "https://www.dropbox.com/s/49j2hy7262yie6p/actor_task.npz?dl=0",
    "/arxiv_year.npz": "https://www.dropbox.com/s/dwg1zk8a9wdmyt7/arxiv_year.npz?dl=0",
    "/chameleon.npz": "https://www.dropbox.com/s/gs3vus6v1kz82wz/chameleon.npz?dl=0",
    "/chameleon_task.npz": "https://www.dropbox.com/s/fbdw9u7njhp9wz3/chameleon_task.npz?dl=0",
    "/cifar.npz": "https://www.dropbox.com/s/p8mrpd04d2uchz0/cifar.npz?dl=0",
    "/cifar_task.npz": "https://www.dropbox.com/s/r2h0ar1oakdcjbf/cifar_task.npz?dl=0",
    "/citeseer.npz": "https://www.dropbox.com/s/v7azu3xvvc54thw/citeseer.npz?dl=0",
    "/citeseer_task.npz": "https://www.dropbox.com/s/x8zb8o52go17ea7/citeseer_task.npz?dl=0",
    "/cora.npz": "https://www.dropbox.com/s/kcniwnkf47t4uxq/cora.npz?dl=0",
    "/cora_task.npz": "https://www.dropbox.com/s/s0ekd8l7hjxz3hq/cora_task.npz?dl=0",
    "/cornell.npz": "https://www.dropbox.com/s/nxuyei4m7mp77h7/cornell.npz?dl=0",
    "/cornell_task.npz": "https://www.dropbox.com/s/udxnla6s3j7rvex/cornell_task.npz?dl=0",
    "/genius.npz": "https://www.dropbox.com/s/mj84q5gnoeto91o/genius.npz?dl=0",
    "/mnist.npz": "https://www.dropbox.com/s/d9cu81pua3th4df/mnist.npz?dl=0",
    "/mnist_task.npz": "https://www.dropbox.com/s/kw2uy7sldqytun6/mnist_task.npz?dl=0",
    "/ogbg_molbace.npz": "https://www.dropbox.com/s/z7x02e89pkalcde/ogbg_molbace.npz?dl=0",
    "/ogbg_molbace_task.npz": "https://www.dropbox.com/s/qf6r202or6x0y1u/ogbg_molbace_task.npz?dl=0",
    "/ogbg_molclintox.npz": "https://www.dropbox.com/s/zbha5myimurf93j/ogbg_molclintox.npz?dl=0",
    "/ogbg_molclintox_task.npz": "https://www.dropbox.com/s/tsk3i89mep1ljwk/ogbg_molclintox_task.npz?dl=0",
    "/ogbg_molfreesolv.npz": "https://www.dropbox.com/s/82c0nayvagwdvkx/ogbg_molfreesolv.npz?dl=0",
    "/ogbg_molfreesolv_task.npz": "https://www.dropbox.com/s/kdx46s2jdm0cpw1/ogbg_molfreesolv_task.npz?dl=0",
    "/ogbg_molhiv.npz": "https://www.dropbox.com/s/0ytdz9ou7f6nwna/ogbg_molhiv.npz?dl=0",
    "/ogbg_molhiv_task.npz": "https://www.dropbox.com/s/wokku9mxt9ujgj4/ogbg_molhiv_task.npz?dl=0",
    "/ogbg_molmuv.npz": "https://www.dropbox.com/s/138245euc2n7gpj/ogbg_molmuv.npz?dl=0",
    "/ogbg_molmuv_task.npz": "https://www.dropbox.com/s/ilmhcnch5q2zag0/ogbg_molmuv_task.npz?dl=0",
    "/ogbg_molpcba.npz": "https://www.dropbox.com/s/aoz24q9wqlspgtp/ogbg_molpcba.npz?dl=0",
    "/ogbg_molpcba_task.npz": "https://www.dropbox.com/s/5nk5yds40ncpf1e/ogbg_molpcba_task.npz?dl=0",
    "/ogbg_molsider.npz": "https://www.dropbox.com/s/wn5grlahgos7zor/ogbg_molsider.npz?dl=0",
    "/ogbg_molsider_task.npz": "https://www.dropbox.com/s/68cxmfvhblyagrx/ogbg_molsider_task.npz?dl=0",
    "/ogbl_collab.npz": "https://www.dropbox.com/s/8uvkih8eclwu8v4/ogbl_collab.npz?dl=0",
    "/ogbl_collab_task_prestore_neg.npz": "https://www.dropbox.com/s/p4207i2aqzxifsg/ogbl_collab_task_prestore_neg.npz?dl=0",
    "/ogbl_collab_task_runtime_sampling.npz": "https://www.dropbox.com/s/8rkdseccdt8al4p/ogbl_collab_task_runtime_sampling.npz?dl=0",
    "/ogbn_arxiv.npz": "https://www.dropbox.com/s/auea7u7kxh4i4ge/ogbn_arxiv.npz?dl=0",
    "/ogbn_arxiv_task.npz": "https://www.dropbox.com/s/tp278dm3inkvmht/ogbn_arxiv_task.npz?dl=0",
    "/ogbn_mag.npz": "https://www.dropbox.com/s/7oefv0bluyy5sdp/ogbn_mag.npz?dl=0",
    "/ogbn_mag_task.npz": "https://www.dropbox.com/s/whin1oif9z5c799/ogbn_mag_task.npz?dl=0",
    "/ogbn_products.npz": "https://www.dropbox.com/s/nxhp6vuy1av8dqw/ogbn_products.npz?dl=0",
    "/ogbn_products_task.npz": "https://www.dropbox.com/s/krnx44uvl7wktvd/ogbn_products_task.npz?dl=0",
    "/ogbn_proteins.npz": "https://www.dropbox.com/s/nwua3arrzcz54ni/ogbn_proteins.npz?dl=0",
    "/ogbn_proteins_task.npz": "https://www.dropbox.com/s/bynf0novifmvtxi/ogbn_proteins_task.npz?dl=0",
    "/penn94.npz": "https://www.dropbox.com/s/328etuf50oy1976/penn94.npz?dl=0",
    "/pokec.npz": "https://www.dropbox.com/s/kovi85q9faqgnj8/pokec.npz?dl=0",
    "/pubmed.npz": "https://www.dropbox.com/s/wyp44ckp7a2z6ra/pubmed.npz?dl=0",
    "/pubmed_task.npz": "https://www.dropbox.com/s/xz2mqbsyksqw7z8/pubmed_task.npz?dl=0",
    "/snap_patents.npz": "https://www.dropbox.com/s/ers89qyj635j6gc/snap_patents.npz?dl=0",
    "/squirrel.npz": "https://www.dropbox.com/s/pxa32umv4zazo5m/squirrel.npz?dl=0",
    "/squirrel_task.npz": "https://www.dropbox.com/s/3mn6k7vrkk0omyk/squirrel_task.npz?dl=0",
    "/texas.npz": "https://www.dropbox.com/s/septqv2rayfa649/texas.npz?dl=0",
    "/texas_task.npz": "https://www.dropbox.com/s/mylxzsh26xw3k6u/texas_task.npz?dl=0",
    "/twitch_gamers.npz": "https://www.dropbox.com/s/epu3g7hhoev9mmr/twitch_gamers.npz?dl=0",
    "/wiki.npz": "https://www.dropbox.com/s/9iuak0zai45kmmb/wiki.npz?dl=0",
    "/wisconsin.npz": "https://www.dropbox.com/s/y2tq6adffxqfg0u/wisconsin.npz?dl=0",
    "/wisconsin_task.npz": "https://www.dropbox.com/s/72lq93gmhgjlll0/wisconsin_task.npz?dl=0"
}
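The links above end in `?dl=0`, which, as far as I know, opens the Dropbox preview page rather than downloading the file; setting `dl=1` should request a direct download instead. A small sketch of that conversion, with the helper name `to_direct_download` being hypothetical:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_direct_download(url):
    """Return the same shared link with its `dl` query parameter set to 1."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["dl"] = "1"
    return urlunsplit(parts._replace(query=urlencode(query)))


print(to_direct_download(
    "https://www.dropbox.com/s/kcniwnkf47t4uxq/cora.npz?dl=0"))
```

Applying this to each value in links.json before publishing would give links that a dataloader can fetch directly, assuming Dropbox keeps the `dl` parameter behavior.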

Questions

xingjian-zhang commented 2 years ago

By the way, here is the link to my file request: https://www.dropbox.com/request/nOBsw4YrkBaP9tBhtMid

jiaqima commented 2 years ago

Deployed the solution discussed above externally.

Upload data files with Dropbox file request: tinyurl.com/glifileupload

Display data file links with a webpage: tinyurl.com/glifilelink

xingjian-zhang commented 2 years ago

Just a reminder: