PrivateBin / Directory

Rust-based directory application to collect a list of federated instances
https://privatebin.info/directory/

Static list of instances, e.g. instances.json #23

Closed by cbluth 3 years ago

cbluth commented 3 years ago

I am interested in querying the instance directory URL to retrieve the list of instances and their attributes (url, rating, https, fileupload, location, etc.), perhaps with something like curl https://privatebin.info/directory/instances.json

Right now, the only way is to parse the HTML with something custom or with a tool like htmltab (https://github.com/flother/htmltab), for example: curl -s https://privatebin.info/directory/ | htmltab

elrido commented 3 years ago

I'd potentially be willing to share even the raw data (the SQLite db file), as long as that doesn't impact the privacy of the monitored services. What would you want to use it for? Would it be pulled occasionally (at most once per day) or integrated into a web service as a client-side data source? Is there something that could be fixed in the UI of the current site, or a page to drill down into the data in more detail (say, a latency or uptime graph per instance), that would cover the use case you need the data for? I'd consider extending the service with features that I can see being useful, but for other use cases I would be more motivated to provide the raw data. As long as we only consider the same data that is already rendered, it will take fewer resources for the service to deliver JSON than to render XHTML.

cbluth commented 3 years ago

I have a terminal privatebin client here: https://github.com/cbluth/pbin (see hosts.go)

My intention is to extend this project and introduce logic to cache this "instances.json" locally, instead of embedding the list directly in the client. I would probably refresh the local JSON file whenever a client request times out.
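The caching idea described above could be sketched roughly as follows. This is only an illustration, not code from pbin: the function names, the `fetch` and `action` callables, and the assumption that a timed-out request surfaces as `TimeoutError` are all hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Callable


def load_cache(path: Path) -> Any:
    """Return the cached instance list, or None if no usable cache exists."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def save_cache(path: Path, instances: Any) -> None:
    """Store the instance list as JSON next to the client."""
    path.write_text(json.dumps(instances), encoding="utf-8")


def with_cached_instances(path: Path,
                          fetch: Callable[[], Any],
                          action: Callable[[Any], Any]) -> Any:
    """Run action(instances) from the cache; refresh once if it times out.

    `fetch` downloads a fresh list and `action` is whatever the client does
    with it (for example, uploading a paste). Both are hypothetical hooks.
    """
    instances = load_cache(path)
    if instances is None:
        instances = fetch()
        save_cache(path, instances)
    try:
        return action(instances)
    except TimeoutError:
        # The cached list may be stale: refresh it and retry exactly once.
        instances = fetch()
        save_cache(path, instances)
        return action(instances)
```

Retrying only once keeps a client from hammering the directory when the network itself is down.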

elrido commented 3 years ago

Oh, I see the potential: I could also use this for instance discovery in the paste manager (https://github.com/PrivateBin/PrivateBin/issues/2). OK, JSON it is.

PS: I added your client to the list in the wiki.

elrido commented 3 years ago

Further thoughts:

cbluth commented 3 years ago

the current behaviour of my pbin client is as follows:

In practice, it uploads to a random instance each time. I did this for two reasons: to spread the load and to pick a reasonably close instance.

Regarding the list of features/options for each instance, I am already building an HTML parser that isolates the <nav> element and extracts the enabled features from it. I wish there was an easier way to do it, but I haven't figured that one out yet.

Regarding the JSON-LD API, maybe that could be considered an "extended goal"; just a simple static JSON file with the list of instances would help tons.

...Also, the paste manager is a good idea. I will toy with the idea of implementing one for pbin, and perhaps give it an un/lock feature.
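A rough sketch of the <nav> parsing approach mentioned above, using only the standard library. The sample markup, including the `pasteExpiration` and `opendiscussion` ids, is a simplified assumption for illustration; real instances may use different templates, so treat the names as hypothetical.

```python
from html.parser import HTMLParser


class NavFeatureParser(HTMLParser):
    """Collect <select> option values and checkbox ids found inside <nav>."""

    def __init__(self):
        super().__init__()
        self._nav_depth = 0   # > 0 while inside a <nav> element
        self._select = None   # id/name of the <select> currently open
        self.selects = {}     # select id -> list of option values
        self.checkboxes = []  # ids (or names) of checkbox inputs

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "nav":
            self._nav_depth += 1
            return
        if not self._nav_depth:
            return  # ignore everything outside <nav>
        if tag == "select":
            self._select = attributes.get("id") or attributes.get("name") or "unnamed"
            self.selects.setdefault(self._select, [])
        elif tag == "option" and self._select and attributes.get("value"):
            self.selects[self._select].append(attributes["value"])
        elif tag == "input" and attributes.get("type") == "checkbox":
            name = attributes.get("id") or attributes.get("name")
            if name:
                self.checkboxes.append(name)

    def handle_endtag(self, tag):
        if tag == "nav" and self._nav_depth:
            self._nav_depth -= 1
        elif tag == "select":
            self._select = None


def extract_features(html: str):
    """Return (selects, checkboxes) found inside the page's <nav>."""
    parser = NavFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.selects, parser.checkboxes


# Hypothetical, simplified markup used only to exercise the parser.
SAMPLE = """
<nav>
  <select id="pasteExpiration">
    <option value="1day">1 day</option>
    <option value="1year">1 year</option>
    <option value="never">Never</option>
  </select>
  <input type="checkbox" id="opendiscussion" />
</nav>
<select id="outside"><option value="ignored">x</option></select>
"""

if __name__ == "__main__":
    print(extract_features(SAMPLE))
```

Grouping option values by their enclosing select keeps expiration choices apart from any other drop-downs that happen to live in the same navigation bar.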

elrido commented 3 years ago

The initial implementation is done, re-using the same cached list that is used to display the instances in HTML. The result is randomized for load balancing, and you can get the top 25 as follows:

curl --header "Accept: application/json" "https://privatebin.info/directory/api?top=25"
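For clients that prefer not to shell out to curl, the same request could be made roughly like this. The endpoint, the `top` parameter, and the Accept header come from the command above; the function names and the timeout value are illustrative assumptions, and the structure of the returned JSON is not described in this thread, so the sketch does not assume one.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://privatebin.info/directory/api"  # endpoint from the curl example


def build_request(top: int = 25) -> urllib.request.Request:
    """Build a GET request that asks the directory API for JSON."""
    query = urllib.parse.urlencode({"top": top})
    return urllib.request.Request(
        f"{API_URL}?{query}",
        headers={"Accept": "application/json"},
    )


def fetch_instances(top: int = 25, timeout: float = 10.0):
    """Fetch and decode the JSON document without assuming its structure."""
    with urllib.request.urlopen(build_request(top), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(json.dumps(fetch_instances(), indent=2))
```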

I've added some docs on it to the about page. I will leave this issue open and update it when more options get implemented.

cbluth commented 3 years ago

good stuff, i will give it a go, thanks!

elrido commented 3 years ago

I've implemented a number of additional filters. For your client, you may be interested in limiting the version (to "1." or "1.3."), setting a minimum uptime, or requesting instances for a certain country. All of them are documented on the about page, linked above.
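The query parameter names for these filters are documented on the about page and are not given in this thread, so they are not guessed here. As an illustration of the three criteria only, a client could apply equivalent filtering to an already downloaded list. The field names `version`, `uptime`, and `country` are hypothetical and may not match the real JSON.

```python
from typing import Any, Dict, List, Optional


def filter_instances(instances: List[Dict[str, Any]],
                     version_prefix: str = "",
                     min_uptime: float = 0,
                     country: Optional[str] = None) -> List[Dict[str, Any]]:
    """Keep instances matching a version prefix, minimum uptime, and country.

    All three field names are assumptions made for this sketch.
    """
    return [
        inst for inst in instances
        if str(inst.get("version", "")).startswith(version_prefix)
        and inst.get("uptime", 0) >= min_uptime
        and (country is None or inst.get("country") == country)
    ]
```

Matching on a prefix that ends with a dot, such as "1.3.", keeps a version like 1.3.5 while excluding 1.30.0, which is presumably why the examples above include the trailing dot.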

cbluth commented 3 years ago

@elrido if you're interested, I made a script that checks the options of each server in the directory by parsing the HTML. My client relies on knowing whether an individual server has specific features enabled. For example, my client takes the -never, -year and -opendiscussion arguments, and these features are not listed in the directory.

here is the script: https://anonpaste.org/?51ca17d59d23384e#8FvqogDrhZTVbd7s7UCvBSPCwtqZH41mjbHpHdB9mVnM

here is the json file it produces: https://anonpaste.org/?77f9cdaf83ea6b7b#21ULFsNzCTHyB5ubPbssStWFU1ZnEbAcx7mNLYZDQeJv

After querying the directory API for the top 25, my client will ping each server in the list for its enabled options and cache them locally. If my client encounters a timeout or a missing server, it will query the API again, re-ping the servers, and update the local cache. That way I don't need to embed the list of servers inside my client.
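The probe-and-cache step described above might look roughly like the sketch below. It is not the linked script: the `probe` callable stands in for whatever actually inspects a server, and the option names simply mirror the -never, -year and -opendiscussion arguments mentioned earlier.

```python
import random
from typing import Callable, Dict, Iterable, Optional, Set


def probe_all(urls: Iterable[str],
              probe: Callable[[str], Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each reachable server URL to the set of options it reports."""
    options: Dict[str, Set[str]] = {}
    for url in urls:
        try:
            options[url] = set(probe(url))
        except (TimeoutError, ConnectionError):
            continue  # unreachable servers are left out of the cache
    return options


def pick_server(options: Dict[str, Set[str]],
                required: Iterable[str],
                rng=random) -> Optional[str]:
    """Pick a random server that offers every required option, or None."""
    needed = set(required)
    candidates = sorted(url for url, opts in options.items() if needed <= opts)
    return rng.choice(candidates) if candidates else None
```

A `None` result from `pick_server` would be the natural point for a client to query the directory again and rebuild the option cache.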