gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓


sandbox for full gbif stack #5388

Open davidsean opened 3 months ago

davidsean commented 3 months ago

I'm not sure where to post this question.

I would like to test how archives are displayed on the front-facing GBIF app.

For this, the data needs to travel through the GBIF stack:

IPT
crawler
front-end

Is there a sandbox testing environment where I can publish a mock-archive and see how it is rendered/ingested? I don't need to play with the data beyond the front-end (no need for snapshots on AWS for example).
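For readers who want to try this, here is a minimal sketch of what a mock archive could look like: a zip containing a tab-separated `occurrence.txt` and a `meta.xml` descriptor that maps columns to Darwin Core terms. The column choice, the sample record, and the function names are all hypothetical, and no `eml.xml` metadata file is included. The IPT normally generates the archive from mapped sources, so this only illustrates the file layout.

```python
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical column selection for the mock occurrence core.
COLUMNS = ["occurrenceID", "basisOfRecord", "scientificName",
           "eventDate", "decimalLatitude", "decimalLongitude"]

# Made-up sample record, for illustration only.
SAMPLE_ROWS = [{
    "occurrenceID": "mock-0001",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Puma concolor",
    "eventDate": "2020-01-15",
    "decimalLatitude": "45.5",
    "decimalLongitude": "-73.6",
}]


def build_meta_xml(columns):
    """Return a meta.xml string: column 0 is the core id, the rest are DwC terms."""
    fields = "\n".join(
        f'    <field index="{i}" term="{DWC}{name}"/>'
        for i, name in enumerate(columns, start=1)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">\n'
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"\n'
        '        fieldsEnclosedBy="" ignoreHeaderLines="1"\n'
        f'        rowType="{DWC}Occurrence">\n'
        '    <files><location>occurrence.txt</location></files>\n'
        '    <id index="0"/>\n'
        f'{fields}\n'
        '  </core>\n'
        '</archive>\n'
    )


def build_archive(rows, columns=COLUMNS):
    """Return the bytes of a zip holding meta.xml and occurrence.txt."""
    lines = ["\t".join(["id"] + columns)]
    for row in rows:
        # Reuse occurrenceID as the core id in the first column.
        lines.append("\t".join([row["occurrenceID"]] + [row[c] for c in columns]))
    data = "\n".join(lines) + "\n"

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", build_meta_xml(columns))
        zf.writestr("occurrence.txt", data)
    return buf.getvalue()


if __name__ == "__main__":
    with open("mock-archive.zip", "wb") as fh:
        fh.write(build_archive(SAMPLE_ROWS))
```

Running the script writes `mock-archive.zip` to the current directory. The data validator mentioned below is a reasonable first check on the result.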

I know about the data validator, I want to go beyond that (mostly front-end, perhaps eventually testing the clustering).

MattBlissett commented 3 months ago

Very quickly before I leave work for today, the testing environment is at https://www.gbif-uat.org/

You can set up your own IPT in test mode to publish to it, or request an account on https://ipt.gbif.org/

The test environment only has a small amount of data, and therefore the clustering isn't very useful. (It might not be enabled...)
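Once a dataset has been published to the test environment, one way to check whether it has been registered and indexed is to query the API directly. The sketch below assumes the test environment's API lives at `https://api.gbif-uat.org/v1` and mirrors the production `/dataset/{key}` and `/occurrence/search` endpoints. That base URL is not stated in this thread, and the dataset key shown is a placeholder.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the UAT API base URL, by analogy with the production API.
UAT_API = "https://api.gbif-uat.org/v1"


def dataset_url(dataset_key, base=UAT_API):
    """URL of the registry entry for a dataset."""
    return f"{base}/dataset/{urllib.parse.quote(dataset_key)}"


def occurrence_count_url(dataset_key, base=UAT_API):
    """URL of an occurrence search limited to one dataset, returning no records."""
    query = urllib.parse.urlencode({"datasetKey": dataset_key, "limit": 0})
    return f"{base}/occurrence/search?{query}"


def fetch_json(url, timeout=30):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def indexed_count(dataset_key):
    """Number of occurrence records indexed for the dataset."""
    return fetch_json(occurrence_count_url(dataset_key))["count"]


if __name__ == "__main__":
    key = "00000000-0000-0000-0000-000000000000"  # placeholder dataset key
    print(dataset_url(key))
    print(occurrence_count_url(key))
```

With a real dataset key, `indexed_count(key)` can be polled after publishing to see when the crawl has finished and records start to appear.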

davidsean commented 3 months ago

Great! I will point my test IPT to it and start playing around.

EDIT: my test IPT is not accessible from outside, so I will ask for a test account.

davidsean commented 3 months ago

On a related note, is this stack difficult to spin up in a self-hosted environment (similar to running the IPT in a Docker container)? Note: I just want a single node/organization/network/installation setup (with multiple datasets).

The database that feeds the IPT is on a private network. The source multimedia (movies/images/audio) I want to test (with the multimedia extension) is also hosted on a private network.

MattBlissett commented 3 months ago

For the test account, please email helpdesk@gbif.org.

There isn't an easy way to set up GBIF's infrastructure. It's a big-data system, requiring a Hadoop cluster (ZooKeeper, HDFS, YARN, Oozie, Spark), ElasticSearch, PostgreSQL and Java webservices.

A similar, but smaller, system is the Living Atlas, though it's usually used by a single node hosting data from multiple organizations in their country: https://living-atlases.gbif.org/