databio / bedms

tool for standardization of genomics/epigenomics metadata
BSD 2-Clause "Simplified" License

`AttrStandardizer` can't take a `peppy.Project` object #16

Closed by nleroy917 1 month ago

nleroy917 commented 1 month ago

Right now it looks like the AttrStandardizer class just fetches the PEP from pephub:

def fetch_from_pephub(pep: str) -> pd.DataFrame:
    """
    Fetches metadata from PEPhub registry.

    :param str pep: Path to the PEPhub registry containing the metadata csv file
    :return pd.DataFrame: sample table of the PEP as a DataFrame.
    """
    phc = PEPHubClient()
    project = phc.load_project(pep)
    sample_table = project.sample_table
    csv_file_df = pd.DataFrame(sample_table)
    return csv_file_df

csv_file = fetch_from_pephub(pep)

But what if the PEP is not on PEPhub? Then I can't standardize. Or what if we are already on PEPhub (like in the /standardize endpoint)? That's a little inefficient, and I think it would be better if this just took a peppy.Project object instead.
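A minimal sketch of the idea, not the bedms implementation: accept either a registry path or an already-loaded project, and only call a loader when a string is passed. The names `resolve_sample_table`, `StubProject`, and `stub_loader` are hypothetical and exist only to keep the example runnable without peppy or pephubclient installed. In practice the loader could presumably be something like `PEPHubClient().load_project`, as in the snippet above.

```python
from typing import Any, Callable, Union


def resolve_sample_table(pep: Union[str, Any], loader: Callable[[str], Any]) -> Any:
    """Return the sample table for a registry path or an already-loaded project."""
    # A string is treated as a registry path and loaded on demand;
    # anything else is assumed to be a project-like object already in memory.
    project = loader(pep) if isinstance(pep, str) else pep
    if not hasattr(project, "sample_table"):
        raise TypeError(
            "expected a registry path or an object with a 'sample_table' attribute"
        )
    return project.sample_table


class StubProject:
    """Stand-in for peppy.Project (hypothetical, for illustration only)."""

    def __init__(self, sample_table):
        self.sample_table = sample_table


def stub_loader(path: str) -> StubProject:
    # Stand-in for a real loader; it just records the path it was given.
    return StubProject([{"sample_name": path}])


# An in-memory project is used directly, with no fetch.
table_from_object = resolve_sample_table(StubProject([{"sample_name": "s1"}]), stub_loader)
# A string goes through the loader.
table_from_path = resolve_sample_table("namespace/name:tag", stub_loader)
```

With this shape, a caller that already holds a project (such as a server endpoint) skips the round trip, while a caller with only a registry path still works.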

saanikat commented 1 month ago

fetch_from_pephub is also for situations where you would want to standardize the metadata locally instead of using PEPhub.

I can add the following function:

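def fetch_pep(pep: str) -> pd.DataFrame:
    """
    Fetches the metadata locally via peppy.Project.

    :param str pep: path to the local PEP, e.g. "path/to/project/sample_sheet.csv"
    :return pd.DataFrame: sample table of the project
    """
    project = peppy.Project.from_pep_config(pep)
    return project.sample_table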

And then we could have the user specify the kind of path:

attr_standardizer(path="LOCAL", pep="/path/to/pep", schema="ENCODE")  # or path="PEPhub"

saanikat commented 1 month ago

Solved in new PR.

@khoroshevskyi added the function get_any_pep