datalad / datalad-usage-dashboard

Dashboard of detected usages of DataLad
MIT License

Find datasets on GIN #15

Closed: mih closed this issue 9 months ago

mih commented 3 years ago

Maybe like this: https://gin.g-node.org/explore/data?q=datalad+AND+dataset+AND+id&stype=3

Technically, all GIN repositories are datasets, but the ones matched by this query have a DataLad dataset ID.
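For reference, the search URL above can be built programmatically. This is a small sketch: the helper name `explore_query_url` is hypothetical, and `stype=3` is copied verbatim from the URL without any claim about what it means. The explore page returns HTML, so this only reproduces the query and does not parse the results.

```python
from urllib.parse import urlencode

# Base URL of GIN's explore page, taken from the link above
EXPLORE_URL = "https://gin.g-node.org/explore/data"


def explore_query_url(query: str, stype: int = 3) -> str:
    """Build a GIN explore-page search URL (hypothetical helper).

    stype=3 is copied from the example link; its meaning is not documented here.
    """
    # urlencode uses quote_plus, so spaces become "+"
    return f"{EXPLORE_URL}?{urlencode({'q': query, 'stype': stype})}"


print(explore_query_url("datalad AND dataset AND id"))
```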

yarikoptic commented 3 years ago

@jwodder would you be so kind to add GIN as well?

jwodder commented 3 years ago

@mih Does GIN have an API for getting datasets, or do I need to parse HTML? Also, is there any point to filtering with the given query if everything on GIN is a dataset?

yarikoptic commented 3 years ago

See https://github.com/G-Node/gin-cli/issues/125#issuecomment-348435939 (didn't try since then)

jwodder commented 3 years ago

@yarikoptic I'm getting a "503 Service Unavailable" when trying to query the search endpoint.

yarikoptic commented 3 years ago

Unfortunately, according to https://github.com/G-Node/gin-cli/issues/125#issuecomment-885090857, the newly discovered endpoint https://github.com/gogs/docs-api/tree/master/Repositories#search-repositories cannot yet search through the content of the archives, so we would not be able to find DataLad datasets ATM :-/ I filed https://github.com/gogs/gogs/issues/6594 to see if there is hope.

yarikoptic commented 10 months ago

@jwodder, let's just add all public repositories found on GIN, grouping them by "organization" the same way as on GitHub.

jwodder commented 9 months ago

@yarikoptic What exactly do you want the output for the GIN repositories to look like? The only relevant data for GIN repositories seems to be owner, name, URL, star count, and whether they're active/gone.

jwodder commented 9 months ago

@yarikoptic Problem: GIN's repository search endpoint is consistently returning a 500 error when requesting page 7 (https://gin.g-node.org/api/v1/repos/search?page=7).

yarikoptic commented 9 months ago

dang... please file an issue at https://github.com/G-Node/gogs/issues .

The "funny" thing is that page 8 seems OK: https://gin.g-node.org/api/v1/repos/search?page=8 . So we can keep ignoring 5xx errors until we get an empty result, as in the case of https://gin.g-node.org/api/v1/repos/search?page=888888
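The paging strategy suggested here (request page 1, 2, 3 and so on, skip pages that fail with a 5xx status, stop at the first empty page) could look roughly like the following Python sketch. It is not the dashboard's actual code: the names `fetch_page`, `iter_repos` and `ServerError` are hypothetical, the `max_consecutive_errors` cap is an added safeguard against looping forever if the server is down entirely, and the assumption that the JSON body wraps results in a `"data"` list follows the Gogs API docs linked earlier but is not verified against GIN here.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Iterator

# Endpoint from the thread; page numbering starts at 1
SEARCH_URL = "https://gin.g-node.org/api/v1/repos/search?page={page}"


class ServerError(Exception):
    """Raised by a fetcher when the server answers with a 5xx status."""


def fetch_page(page: int) -> list:
    """Fetch one page of repository search results (hypothetical helper)."""
    url = SEARCH_URL.format(page=page)
    try:
        with urllib.request.urlopen(url) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as e:
        if 500 <= e.code < 600:
            raise ServerError(f"page {page}: HTTP {e.code}") from e
        raise
    # Assumption: results come wrapped as {"ok": true, "data": [...]}
    return payload.get("data", [])


def iter_repos(
    fetch: Callable[[int], list] = fetch_page,
    max_consecutive_errors: int = 5,
) -> Iterator[dict]:
    """Yield repositories page by page, skipping pages that return 5xx."""
    page = 1
    errors = 0
    while True:
        try:
            items = fetch(page)
        except ServerError:
            errors += 1
            # Safety cap (an assumption, not from the thread)
            if errors >= max_consecutive_errors:
                raise
            page += 1  # skip the broken page, e.g. page 7
            continue
        errors = 0
        if not items:
            return  # an empty page marks the end of the results
        yield from items
        page += 1
```

Passing a fake `fetch` callable makes the skip-and-stop logic testable without touching the network.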

> @yarikoptic What exactly do you want the output for the GIN repositories to look like? The only relevant data for GIN repositories seems to be owner, name, URL, star count, and whether they're active/gone.

Yep, sounds right. Use the same grouping by owner as on GitHub, so we can see the most "prolific" ones.
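A minimal sketch of the agreed record (owner, name, URL, star count, active/gone) and of grouping by owner with the most prolific owners first. The `GINRepo` class and the `from_api` and `group_by_owner` helpers are hypothetical, and the JSON field names (`owner.username`, `name`, `html_url`, `stars_count`) are assumed from the Gogs repository API rather than checked against GIN's responses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GINRepo:
    owner: str
    name: str
    url: str
    stars: int
    gone: bool = False  # set later if the repository disappears


def from_api(item: dict) -> GINRepo:
    """Convert one search result into a GINRepo (assumed Gogs field names)."""
    return GINRepo(
        owner=item["owner"]["username"],
        name=item["name"],
        url=item["html_url"],
        stars=item.get("stars_count", 0),
    )


def group_by_owner(repos):
    """Group repos by owner; owners with the most repos come first."""
    groups = defaultdict(list)
    for repo in repos:
        groups[repo.owner].append(repo)
    # Sort by descending repo count, breaking ties alphabetically by owner
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(owner, sorted(rs, key=lambda r: r.name)) for owner, rs in ordered]
```

Sorting by `(-count, owner)` keeps the output stable between runs, which helps when the generated report is committed and diffed.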

jwodder commented 9 months ago

@yarikoptic Issue filed: https://github.com/G-Node/gogs/issues/148

jwodder commented 9 months ago

@yarikoptic Do you want my PR to include the data for the GIN repositories found by my test run, or should that wait until the code is actually "used"?

yarikoptic commented 9 months ago

It could include them; I don't mind at all. It would also show right away what we are after ;)