aangelopoulos / ppi_py

A package for statistically rigorous scientific discovery using machine learning. Implements prediction-powered inference.
MIT License

load_dataset: error "No schema supplied", maybe missing --id ? #17

Open MatthieuStigler opened 2 months ago

MatthieuStigler commented 2 months ago

Hi

Thank you for making all the code so readily accessible!!

I tried the basic galaxies example and ran into this error message:

Invalid URL '1pDLQesPhbH5fSZW1m4aWC-wnJWnp1rGV': No schema supplied

I wonder whether the call to gdown should pass the --id argument. With my version of gdown at least (4.3.1 on Ubuntu), gdown 1pDLQesPhbH5fSZW1m4aWC-wnJWnp1rGV fails, while gdown --id 1pDLQesPhbH5fSZW1m4aWC-wnJWnp1rGV works.

Reproducible code:

import os, sys
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), os.pardir)))
import numpy as np
import pandas as pd

from ppi_py.datasets import load_dataset
dataset_folder = "./data/"
data = load_dataset(dataset_folder, "galaxies")

Thanks!

aangelopoulos commented 1 month ago

Hey Matthieu! Unfortunately, this depends on your version of gdown. With a more recent version, you won't see this behavior.

If you'd like, you could edit README.md to note that this is a known issue with older gdown versions, and that it can be fixed by running pip install -U gdown or otherwise updating gdown.

Feel free to send a PR :)
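
For anyone who cannot upgrade gdown right away, here is a minimal sketch of a workaround, not the way ppi_py itself calls gdown. The "No schema supplied" message means the bare file ID was treated as a URL with no scheme, so the sketch expands the ID into a full Google Drive URL before handing it to gdown. The helper names to_drive_url and download and the output file name are made up for illustration. The uc?id= URL form is the one shown in gdown's own documentation, but whether it behaves identically on every gdown version is an assumption.

```python
from urllib.parse import urlparse

# File ID copied from the error message in this thread.
GALAXIES_FILE_ID = "1pDLQesPhbH5fSZW1m4aWC-wnJWnp1rGV"


def to_drive_url(id_or_url: str) -> str:
    """Return a full URL; a bare Drive file ID gets an explicit scheme and host."""
    if urlparse(id_or_url).scheme in ("http", "https"):
        return id_or_url  # already a complete URL, leave it untouched
    return f"https://drive.google.com/uc?id={id_or_url}"


def download(id_or_url: str, output: str):
    """Hypothetical wrapper: download via gdown using a full URL."""
    # Imported lazily so to_drive_url works even without gdown installed.
    import gdown

    return gdown.download(url=to_drive_url(id_or_url), output=output, quiet=False)
```

Calling download(GALAXIES_FILE_ID, "galaxies.npz") would then pass a complete https URL to gdown. The output name here is only a placeholder, and load_dataset may expect a different file name.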