IGS / portal_client

Python-based client for downloading data made available through portals powered by the GDC-based portal system.
MIT License

The HMP portal produces confusing manifest.tsv and metadata.tsv files #31

Open Leytoncito opened 1 month ago

Leytoncito commented 1 month ago

Hi,

Sorry for opening this thread here, I know it's not directly related to the tool, but I hope you can help me out.

I’m trying to download assembled metagenomes from the HMP database. The issue is that when applying filters in the portal, I get manifest and metadata files that are confusing and don’t match the applied filters. For example, the manifest file contains over 20,000 download links, apparently for all available files, and the same happens with the metadata file.

I’ve tried filtering both files to generate a consistent manifest, but I found inconsistencies between them. In the metadata file, the sample_id is unique, but in the manifest file, that same sample_id repeats for multiple samples (with different body_site), which makes it impossible to apply metadata filters to the manifest.

Do you know if there is a FULL.tsv file that includes detailed metadata information to generate a coherent manifest?

Any help would be greatly appreciated.

Thanks and best regards,

Benjamin Leyton

Leytoncito commented 1 month ago

[Attached screenshot] For example, the filter returns files from other sites, which are not the ones I'm selecting.
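
For anyone trying the same cross-check, here is a minimal sketch of filtering a manifest by metadata, as described in the first comment. It is not part of portal_client. It assumes both files are tab-separated, that they share a `sample_id` column, and that the metadata has a `body_site` column; those two names are taken from the description above, while `file_id`, `urls`, the function names, and the toy rows are made up for illustration, so check them against the real headers. Besides the filtered rows, it reports sample IDs that repeat in the manifest and IDs that have no metadata row, which are the two kinds of mismatch mentioned.

```python
import csv
import io
from collections import Counter


def read_tsv(text):
    """Parse TSV text into a list of dict rows keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def filter_manifest(manifest_rows, metadata_rows, wanted_site,
                    id_col="sample_id", site_col="body_site"):
    """Keep manifest rows whose sample ID maps to wanted_site in the metadata.

    Column names are assumptions; adjust id_col/site_col to the real headers.
    """
    # One body site per sample ID (the metadata is described as unique per ID).
    site_by_id = {row[id_col]: row[site_col] for row in metadata_rows}

    # Sample IDs that occur on more than one manifest row.
    counts = Counter(row[id_col] for row in manifest_rows)
    repeated = {sid for sid, n in counts.items() if n > 1}

    # Sample IDs in the manifest that the metadata does not describe at all.
    missing = set(counts) - set(site_by_id)

    kept = [row for row in manifest_rows
            if site_by_id.get(row[id_col]) == wanted_site]
    return kept, repeated, missing


# Toy data, invented for this example only.
MANIFEST_TSV = (
    "file_id\tsample_id\turls\n"
    "f1\ts1\thttps://example.org/f1.fa\n"
    "f2\ts1\thttps://example.org/f2.fa\n"
    "f3\ts2\thttps://example.org/f3.fa\n"
    "f4\ts3\thttps://example.org/f4.fa\n"
)
METADATA_TSV = (
    "sample_id\tbody_site\n"
    "s1\tstool\n"
    "s2\toral\n"
)

kept, repeated, missing = filter_manifest(
    read_tsv(MANIFEST_TSV), read_tsv(METADATA_TSV), wanted_site="stool"
)
print([row["file_id"] for row in kept])
print(sorted(repeated), sorted(missing))
```

With real files, the two inline strings would be replaced by the contents of manifest.tsv and metadata.tsv, and the kept rows could be written back out with `csv.DictWriter` to produce a smaller manifest. A non-empty `missing` set would suggest the two downloads do not describe the same selection.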