ebi-gene-expression-group / atlas-web-single-cell

Single Cell Expression Atlas web application
Apache License 2.0

Add an AnnData experiment to our local SCXA #340

Open ke4 opened 1 year ago

ke4 commented 1 year ago

After a discussion with Pedro, we agreed that the most suitable AnnData experiment for our local environment is the one with accession E-ANND-3. It can be found here: /nfs/production/irene/ma/anndata-ingest/datasets/tabula_sapiens/E-ANND-3/*

Steps:

  1. Go to the folder on your local machine where you would like to download the file bundle.
  2. As these experiments are not yet available on our FTP, you unfortunately have to download the experiment file bundle to the local machine yourself (this should take less than 1 hour): scp -r codon-login:/nfs/production/irene/ma/sc_experiments/E-ANND-3 .
  3. The idf file is missing from that file bundle for now (@pmb59 said that they are going to fix this), so download it from here: https://gitlab.ebi.ac.uk/ebi-gene-expression/scxa-metadata/-/blob/feature/add_E-ANND-3/ANND/E-ANND-3/E-ANND-3.idf.txt and put it into the root folder of the experiment (E-ANND-3).
  4. Rename umap.tsv to E-ANND-3.umap.tsv.
  5. Create a temporary container that mounts the PostgreSQL data volume: docker container create --name pgvol -v scxa_atlas-data-exp:/atlas-data/exp ubuntu:jammy
  6. Copy the file bundle into the PostgreSQL volume: docker cp E-ANND-3 pgvol:/atlas-data/exp/magetab/
  7. Add E-ANND-3 to your test-data.env file in the docker/prepare-dev-environment folder.
  8. Run the Postgres step: ./docker/prepare-dev-environment/postgres/run.sh -r -l pg-anndata.log
  9. Execute the PostgreSQL step to add the experiment's data to the DB: SCHEMA_VERSION=latest docker compose --env-file ./docker/dev.env -f ./docker/docker-compose-postgres.yml up
  10. To add the experiment's metadata, execute the Solr step: ./docker/prepare-dev-environment/solr/run.sh -r -l solr.log

ke4 commented 1 year ago

I have started working on this ticket and have updated its description with the dataset details. I will keep updating the ticket as I go along with loading this experiment into our local environment.

ke4 commented 1 year ago

On my first run of the PostgreSQL step I got several missing-file errors:

2023-04-24 13:06:54.176  INFO 154 --- [           main] .a.e.a.c.e.CreateUpdateExperimentCommand : Starting loading/updating experiments:
2023-04-24 13:06:54.177  INFO 154 --- [           main] .a.e.a.c.e.CreateUpdateExperimentCommand : Loading E-ANND-3
2023-04-24 13:06:55.543 ERROR 154 --- [           main] .a.e.a.c.e.CreateUpdateExperimentCommand : Could not load E-ANND-3 due to java.nio.file.NoSuchFileException: /atlas-data/scxa/magetab/E-ANND-3/E-ANND-3.idf.txt
2023-04-24 13:06:55.543  WARN 154 --- [           main] u.a.e.a.cli.AbstractPerAccessionCommand  : 1 experiments failed

and also these lines of errors:

E-ANND-3: Matrix file /atlas-data/scxa/magetab/E-ANND-3/E-ANND-3.aggregated_filtered_normalised_counts.mtx.gz missing, exiting.
ls: cannot access '/atlas-data/scxa/magetab/E-ANND-3/E-ANND-3.tsne*.tsv': No such file or directory
ls: cannot access '/atlas-data/scxa/magetab/E-ANND-3/E-ANND-3.umap*.tsv': No such file or directory
[04/24/2023 13:06:56]      Clusters: Create data file for E-ANND-3...
Error in fread(opt$clusters_path, header = TRUE, check.names = FALSE,  : 
  File '/atlas-data/scxa/magetab/E-ANND-3/E-ANND-3.clusters.tsv' does not exist or is non-readable. getwd()=='/root/db-scxa/bin'
Execution halted
scxa-postgres:5432 - accepting connections
ls: cannot access '/atlas-data/scxa/magetab/E-ANND-3/E-ANND-3.marker_genes_*.tsv': No such file or directory
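When the log is long, it can help to pull out only the paths the loader complained about. The following is a minimal sketch, not part of the project: the function name `find_missing_files` is made up for illustration, and the patterns are derived only from the four message shapes quoted above, so other loader versions may word their errors differently.

```python
import re

# Patterns derived from the error messages quoted in this comment only.
# They are assumptions about the log wording, not an official format.
MISSING_FILE_PATTERNS = [
    re.compile(r"NoSuchFileException: (?P<path>\S+)"),
    re.compile(r"Matrix file (?P<path>\S+) missing"),
    re.compile(r"cannot access '(?P<path>[^']+)'"),
    re.compile(r"File '(?P<path>[^']+)' does not exist"),
]


def find_missing_files(log_text):
    """Return the missing paths mentioned in log_text, in order, without duplicates."""
    missing = []
    for line in log_text.splitlines():
        for pattern in MISSING_FILE_PATTERNS:
            match = pattern.search(line)
            if match and match.group("path") not in missing:
                missing.append(match.group("path"))
    return missing


# Two lines copied from the log above, plus one unrelated line.
sample_log = (
    "ls: cannot access '/atlas-data/scxa/magetab/E-ANND-3/E-ANND-3.umap*.tsv': "
    "No such file or directory\n"
    "scxa-postgres:5432 - accepting connections\n"
    "  File '/atlas-data/scxa/magetab/E-ANND-3/E-ANND-3.clusters.tsv' does not exist "
    "or is non-readable. getwd()=='/root/db-scxa/bin'\n"
)
for path in find_missing_files(sample_log):
    print(path)
```

To use it on a real run, read the file passed to the `-l` option of run.sh (for example pg-anndata.log) and pass its contents to the function.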

ke4 commented 1 year ago

@pmb59 told me that for now we can download the missing idf file from this gitlab repo:

https://gitlab.ebi.ac.uk/ebi-gene-expression/scxa-metadata/-/blob/feature/add_E-ANND-3/ANND/E-ANND-3/E-ANND-3.idf.txt

I pointed out to him that the file is in a feature branch and that it should be merged into the repo's main branch (master, I think) ASAP.

ke4 commented 1 year ago

We still have these missing files according to the log:

  1. ...normalised_counts.mtx.gz --> Not resolved yet!!!
  2. ...tsne*.tsv --> this specific experiment file bundle does not have tsne file(s)
  3. ...umap*.tsv --> the bundle has a umap.tsv; @pmb59 and I assume that we have to rename it to E-ANND-3.umap.tsv
  4. ...clusters.tsv --> the bundle has a clusters_for_bundle.txt file; I assume that we have to rename it to E-ANND-3.clusters.tsv
  5. ...marker_genes_*.tsv --> Not resolved yet!!!
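The two renames in items 3 and 4 can be scripted so that they are applied the same way on every machine. This is only a sketch under the assumptions stated in the list above: that the loader expects accession-prefixed names and that these two bundle files are the right sources. The helper name `rename_bundle_files` and the `RENAMES` table are hypothetical.

```python
from pathlib import Path

# Bundle file name -> name the loader looked for in the log.
# Both entries are assumptions from this thread, not a confirmed convention.
RENAMES = {
    "umap.tsv": "{accession}.umap.tsv",
    "clusters_for_bundle.txt": "{accession}.clusters.tsv",
}


def rename_bundle_files(bundle_dir, accession):
    """Rename known files in bundle_dir; return (old_name, new_name) pairs that were renamed."""
    bundle = Path(bundle_dir)
    renamed = []
    for old_name, template in RENAMES.items():
        source = bundle / old_name
        target = bundle / template.format(accession=accession)
        # Skip files that are absent and never overwrite an existing target.
        if source.is_file() and not target.exists():
            source.rename(target)
            renamed.append((old_name, target.name))
    return renamed
```

Calling `rename_bundle_files("E-ANND-3", "E-ANND-3")` from the folder that contains the downloaded bundle would apply both renames and leave every other file untouched.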

This is the current state. I still need more info from the curators/bioinformaticians...

ke4 commented 1 year ago

The files are ready at this location on codon:

/nfs/production/irene/ma/sc_experiments_failed/E-ANND-2

but E-ANND-2 is too big to use in a local dev environment.

Now we are waiting for @pmb59 and/or @irisdianauy to fix the files for E-ANND-2. That is currently the smallest dataset among the AnnData experiments.

ke4 commented 1 year ago

The cell counts in the various anndata experiments:

ke4 commented 1 year ago

I am going to move this task to the next sprint as I really hope that we are going to get the relevant files from the data prod / curation team in the next 2 weeks.

ke4 commented 1 year ago

I tried to load the E-ANND-3 experiment locally using the files from /nfs/production/irene/ma/sc_experiments/E-ANND-3. It looks like E-ANND-3.clusters.tsv does not contain the correct data.

I got this error:

[06/20/2023 11:21:10]   Copying cell groups data to the db...
ERROR:  null value in column "value" violates not-null constraint
DETAIL:  Failing row contains (1966, E-ANND-3, 149, null).
CONTEXT:  COPY scxa_cell_group, line 130: "E-ANND-3|149|"
Cell groups  write failed

@alfonsomunozpomer helped me investigate this error. It looks like the 3rd column of the E-ANND-3.clusters.tsv file should be numerical, as in the other experiments, but it contains this value: type i pneumocyte.
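A quick check of the clusters file before running the DB load could catch this earlier. The sketch below assumes a tab-separated file with one header row and tests only the 3rd column, as described above; the function name `non_numeric_third_column` and the sample rows are made up for illustration and do not reflect the real file contents.

```python
import csv
import io


def non_numeric_third_column(tsv_text, has_header=True):
    """Return (line_number, value) for rows whose 3rd column is not a number.

    Rows with fewer than three columns are reported with value None.
    """
    problems = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for line_number, row in enumerate(reader, start=1):
        if has_header and line_number == 1:
            continue  # skip the header row
        if not row:
            continue  # ignore blank lines
        if len(row) < 3:
            problems.append((line_number, None))
            continue
        try:
            float(row[2])
        except ValueError:
            problems.append((line_number, row[2]))
    return problems


# Hypothetical sample: column names and values are invented for this example.
sample = "col1\tcol2\tcol3\nTRUE\t2\t1\nFALSE\t3\ttype i pneumocyte\n"
print(non_numeric_third_column(sample))
```

For a real file, read E-ANND-3.clusters.tsv as text and pass the contents in; an empty result means every 3rd-column value parsed as a number.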

ke4 commented 1 year ago

I asked @YalanBi and @irisdianauy to look into this issue.

ke4 commented 1 year ago

The data production team needs to reanalyse the E-ANND-3 and E-ANND-4 experiments. Currently there is a problem with the cluster text file.

ke4 commented 10 months ago

@irisdianauy notified me that the files for the test loading of E-ANND-3 are available now in /nfs/production/irene/ma/sc_experiments/E-ANND-3.

ke4 commented 10 months ago

I successfully loaded E-ANND-3 into my dev env, but the result does not come up in the web app because there is no data for plotTypesAndOptions in the JSON content embedded in the HTML. I did some investigation and found that there is no data in the scxa_dimension_reduction table. I also looked at the DB loading log, which contains: ls: cannot access '/atlas-data/exp/magetab/E-ANND-3/E-ANND-3.umap*.tsv': No such file or directory. After a discussion with Iris, it looks like there should be a UMAP data file, but for some reason it is not in the experiment's folder. Iris is investigating this bug.

ke4 commented 9 months ago

@irisdianauy fixed the UMAP data file problem mentioned above, and that file is now provided with the other data files. Loading the E-ANND-3 experiment into my local environment was successful. Please follow the steps in the ticket's description.

upendrakumbham commented 7 months ago

Hi @ke4, after loading the AnnData experiment ('E-ANND-3') into my local DB, I want to highlight a few corrections to the above steps.

Please let me know if you want me to update these steps.