bioatlas / ala-docker

Dockerized version of Atlas of Living Australia components
GNU Affero General Public License v3.0

Trouble with biocache-backend #33

Closed IuriGarcia closed 5 years ago

IuriGarcia commented 5 years ago

First of all, thank you all for this work. It's really a big thing. I'm having some trouble running the biocache components on my dev station. biocachebackend exits with code 0 and no logs are displayed. I have a docker-compose.yml that builds and runs the Cassandra, Solr, and biocache images, and I have already downloaded the .war and .zip files, but I don't know what is happening in biocachebackend.

shahmanash commented 5 years ago

The biocachebackend image is for running the biocache-cli tool. It is not a service, so when you start the services with docker-compose up, its container is created but exits without error (code 0) because it has no long-running process.

To run the biocache CLI, for example to ingest data, start a one-off instance of the service with: docker-compose run --rm biocachebackend ash

Note that in the above command, biocachebackend is the service name as defined in the docker-compose.yml file, the --rm option removes the container once you exit from it, and ash is the command-line shell (comparable to bash), since the Docker image is based on Alpine Linux.

The above command would provide you access to the shell where you can execute the biocache command.

The image doesn't contain the necessary biocache-config.properties file, so a configured property file needs to be mounted into the container, as done here https://github.com/bioatlas/ala-docker/blob/beta/docker-compose.yml#L117

Also, the biocache CLI requires the Lucene name index when ingesting occurrence data, so a volume containing the index needs to be mounted at /data/lucene/namematching, as done here https://github.com/bioatlas/ala-docker/blob/beta/docker-compose.yml#L114

The biocache CLI's settings for the other components, such as the Cassandra database, SOLR, and other services, are configured in that property file.
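Putting these pieces together, here is a minimal Python sketch that assembles the one-off docker-compose invocation described above. Unlike the linked bioatlas docker-compose.yml, which declares the two mounts under the service definition, it passes them on the command line with -v; if your compose file already declares them, drop the -v entries. The host paths (./config/biocache-config.properties, ./lucene/namematching) are hypothetical, and the in-container path of the properties file (/data/biocache/config/biocache-config.properties) is an assumption not stated in this thread; only /data/lucene/namematching comes from the comment above.

```python
"""Assemble (and optionally run) a one-off biocachebackend container.

Sketch only: host paths and the in-container path of the properties
file are assumptions; check them against your own docker-compose.yml.
"""
import os
import shlex
import subprocess

SERVICE = "biocachebackend"  # service name as defined in docker-compose.yml

# Hypothetical host-side paths; adjust to your checkout.
HOST_CONFIG = "./config/biocache-config.properties"
HOST_NAMEINDEX = "./lucene/namematching"

# ASSUMED container path for the properties file (not given in the thread).
CONTAINER_CONFIG = "/data/biocache/config/biocache-config.properties"
# Path where the CLI expects the Lucene name index (from the thread).
CONTAINER_NAMEINDEX = "/data/lucene/namematching"


def build_command(*cli_args: str) -> list[str]:
    """Return argv for `docker-compose run --rm -v ... biocachebackend <cli_args>`."""
    cmd = ["docker-compose", "run", "--rm"]
    # -v mounts a host path into the one-off container; absolute paths
    # avoid ambiguity about how relative host paths are resolved.
    cmd += ["-v", f"{os.path.abspath(HOST_CONFIG)}:{CONTAINER_CONFIG}"]
    cmd += ["-v", f"{os.path.abspath(HOST_NAMEINDEX)}:{CONTAINER_NAMEINDEX}"]
    cmd.append(SERVICE)
    # With no arguments, open the Alpine image's ash shell.
    cmd += list(cli_args) or ["ash"]
    return cmd


def run(*cli_args: str) -> int:
    """Run the container and return its exit code; --rm removes it afterwards."""
    argv = build_command(*cli_args)
    print("+", shlex.join(argv))
    return subprocess.call(argv)
```

For example, run("biocache") would start the biocache CLI directly, while run() drops you into the ash shell inside the container.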

mskyttner commented 5 years ago

There are some notes showing usage at https://bioatlas.github.io, see these web slides that focus on data management: https://bioatlas.github.io/data-mgmt

The tool that is used for ingesting data can be launched with this command:

docker-compose run --rm biocachebackend biocache

This asciicast shows steps related to ingestion of one dataset: https://bioatlas.github.io/ingest/

IuriGarcia commented 5 years ago

Thank you both so much! I'll check the data management instructions soon. I've already tested the docker-compose run --rm biocachebackend biocache command and it seems to work fine. Thanks for the instructions!

IuriGarcia commented 5 years ago

I'm trying to find a way to test whether all the parts of biocache are communicating correctly.
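One rough first step is to confirm that each backing service is reachable at all. The Python sketch below only checks network reachability, not whether biocache is configured correctly. The host names "cassandra" and "solr" are assumptions (typical docker-compose service names), as is the Solr URL; 9042 and 8983 are the default Cassandra CQL and Solr HTTP ports. Run it from somewhere that can resolve those names, such as a container on the same compose network (the biocachebackend image may not ship Python), or change the targets to your published ports.

```python
"""Rough reachability check for services the biocache CLI talks to.

Sketch only: host names, ports and the Solr URL are assumptions.
"""
import socket
import urllib.request


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unresolvable host, timeout, ...
        return False


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """True if a GET on url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False


# Hypothetical targets; adjust to your docker-compose.yml.
CHECKS = {
    "cassandra (CQL)": lambda: tcp_reachable("cassandra", 9042),
    "solr (HTTP)": lambda: http_ok("http://solr:8983/solr/admin/cores?action=STATUS"),
}


def report() -> dict[str, bool]:
    """Run every check, print a one-line status per service, return results."""
    results = {name: check() for name, check in CHECKS.items()}
    for name, ok in results.items():
        print(f"{name:16} {'OK' if ok else 'UNREACHABLE'}")
    return results
```

If both services report OK, a small test ingestion through the biocache CLI is the next step for checking that the pieces actually work together.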

IuriGarcia commented 5 years ago

Ingestion seems to depend on an already set-up dataset. I'd like to know the purpose of the biocachebackend CLI and everything I can do with it.

mskyttner commented 5 years ago

There is documentation about data management here:

https://github.com/AtlasOfLivingAustralia/documentation/wiki/Resume

About the CLI, you can see the source code for it here and that way you could learn everything there is to know about it:

https://github.com/AtlasOfLivingAustralia/biocache-store/tree/master/src/main/scala/au/org/ala/biocache/tool

IuriGarcia commented 5 years ago

Erm, is there no higher-level description? I'm not getting a good understanding from reading the source code alone.

mskyttner commented 5 years ago

@shahmanash do you have pointers to more docs on data management, or some slides from the CLI presentation you gave a while ago?

shahmanash commented 5 years ago

A few materials are available on the topic.

IuriGarcia commented 5 years ago

Thank you so much! @mskyttner and @shahmanash !!

IuriGarcia commented 5 years ago

Since we are talking about biocachebackend, one more question: why, in the solr6-cassandra3 branch, are you not putting sds-layer into the image?

shahmanash commented 5 years ago

The sds-layers.tgz file is quite specific to Australia; you can check it here https://biocache.ala.org.au/archives/layers/sds-layers.tgz . So it was removed from the subsequent images.

However, sensitive data handling can be achieved through the Sensitive Data Service module, as deployed at https://sds.ala.org.au/ or https://sds.nbnatlas.org/