evolgeniusteam / GMrepoProgrammableAccess

programmable access to GM repo
GNU General Public License v3.0

Incomplete curated projects #14

Closed clay-jona closed 1 year ago

clay-jona commented 1 year ago

Hi there,

This is a follow-up to #8, where the getCuratedProjectsList method was added to the API.

Here's the code I used to fetch the list of curated project IDs.

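import json
import requests

def get_curated_project_ids():
    # An empty query asks for the full list of curated projects
    query = {}
    url = 'https://gmrepo.humangut.info/api/getCuratedProjectsList'
    response = requests.post(url, data=json.dumps(query))
    response.raise_for_status()  # fail loudly on HTTP errors

    # Collect the unique project IDs from the JSON response
    return {x["project_id"] for x in response.json()}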

After running this code, I manually checked whether project IDs I expected to be curated are included in the output. For example, PRJEB1775 is a project with metagenomic samples from subjects with diarrhea. However, it is missing from the result:

pid_set = get_curated_project_ids()
"PRJEB1775" in pid_set
# False

Is it possible that getCuratedProjectsList returns an incomplete list of project IDs?

ZhuJiaying1998 commented 1 year ago

My apologies for the long delay. In our database, the PRJEB1775 project has only a single phenotype, "diarrhea", and lacks a control group. Consequently, we did not conduct marker analysis for this project. The term "curated projects" in our database refers to projects for which we have conducted marker analyses.

clay-jona commented 1 year ago

Thanks for getting back to me, @ZhuJiaying1998. Is there a way to get all the projects whose runs have QC status == True? For example, in the "Associated runs" section of the PRJEB1775 page, there is a column called QC status. I'd like to be able to filter out samples that did not pass QC.

ZhuJiaying1998 commented 1 year ago

Currently, there is no corresponding API available to achieve this. However, on the gmrepo help page, we provide download links for all the data in the database (https://evolgeniusteam.github.io/gmrepodocumentation/usage/downloaddatafromgmrepo/). You can obtain the information you need by downloading the "Projects" and "Processed runs" tables.
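For anyone landing here later, the filtering suggested above could look roughly like the sketch below once the "Processed runs" table is downloaded. It assumes the table is tab-separated. The column names (`run_id`, `project_id`, `QCStatus`) and the value `"1"` for a passed QC are hypothetical placeholders, not confirmed against the real file, so check the header of the downloaded table and adjust them.

```python
import csv
import io


def filter_qc_passed_runs(tsv_file, qc_column="QCStatus", passed_value="1"):
    """Return the rows (as dicts) whose QC column equals passed_value.

    qc_column and passed_value are assumptions for illustration; the real
    "Processed runs" table may use different names and encodings.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    if reader.fieldnames is None or qc_column not in reader.fieldnames:
        raise KeyError(f"column {qc_column!r} not found in {reader.fieldnames}")
    return [row for row in reader if row[qc_column] == passed_value]


# Tiny made-up table standing in for the real download (all values invented).
sample = io.StringIO(
    "run_id\tproject_id\tQCStatus\n"
    "RUN_A\tPRJ_X\t1\n"
    "RUN_B\tPRJ_X\t0\n"
    "RUN_C\tPRJ_Y\t1\n"
)

passed = filter_qc_passed_runs(sample)
passed_run_ids = [row["run_id"] for row in passed]
# Projects that have at least one run passing QC
project_ids = {row["project_id"] for row in passed}
```

With a real download, an open file handle takes the place of the `io.StringIO` object, e.g. `with open(path, newline="") as fh: filter_qc_passed_runs(fh)`, where `path` is wherever the table was saved.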

clay-jona commented 1 year ago

This should work perfectly fine. Thanks so much!