ebi-gene-expression-group / atlas-web-single-cell

Single Cell Expression Atlas web application
Apache License 2.0

Update SQL for non-priority case in cell plot #384

Closed: lingyun1010 closed this 5 months ago

lingyun1010 commented 8 months ago

This is a bugfix for issue https://github.com/ebi-gene-expression-group/atlas-web-single-cell/issues/374.

The previous implementation assumed that at least one highest-priority value, i.e. a priority larger than 0, would always be curated in the table scxa_dimension_reduction. However, because of missing data or curation mistakes, a highest-priority value is not always present, so I fixed the Postgres query to select the proper entries in both cases.

There are some data population issues on the dev PostgreSQL database, which have been reported to the data production team on Slack. Please follow the updates at https://ebi-fg.slack.com/archives/C7U2CRS58/p1706111909985499.

lingyun1010 commented 7 months ago

@upendrakumbham reported a bug regarding t-SNE plots. The default umap plot parameterisation returned for experiment E-ANND-3 is n_neighbors:20; however, the only entry in the postgres table is n_neighbors:15.

Because I don't have the E-ANND-3 experiment data locally yet, I debugged the issue with experiment E-EHCA-2 instead: I temporarily deleted its table entries for the umap plot method with n_neighbors:10,100,20,25,3,30,5,50 and left only n_neighbors:15.

This reproduces the same bug locally. To check the default parameterisation value, I queried the JSON endpoint http://localhost:8080/gxa/sc/json/cell-plots/E-EHCA-2//default/plot-method and got

{"tsne":{"perplexity":25},"umap":{"n_neighbors":20}}

My conclusion so far is that the bug is in the backend, not the frontend, and that we may encounter it whenever the database has only one entry for one of the cell plot methods.
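The mismatch described above can be checked mechanically. The following is a minimal Python sketch, not code from this repository: the helper name `find_unbacked_defaults` is hypothetical, and the `available` mapping is typed in by hand from the table state described above (only n_neighbors:15 left for umap) rather than read from the database. Methods without a hand-typed entry, such as tsne here, are skipped.

```python
import json


def find_unbacked_defaults(default_payload, available):
    """Return the default parameterisations that have no matching table entry.

    default_payload: JSON text as returned by the default/plot-method endpoint.
    available: dict mapping a plot method to the list of parameterisations
               known to exist in the table (hand-typed here, an assumption).
    Methods missing from `available` are skipped, not reported.
    """
    defaults = json.loads(default_payload)
    return {
        method: params
        for method, params in defaults.items()
        if method in available and params not in available[method]
    }


# Response quoted in this comment, and the single umap entry left in the table.
RESPONSE = '{"tsne":{"perplexity":25},"umap":{"n_neighbors":20}}'
TABLE_ENTRIES = {"umap": [{"n_neighbors": 15}]}
```

Calling `find_unbacked_defaults(RESPONSE, TABLE_ENTRIES)` flags umap, because the returned default n_neighbors:20 is not among the entries left in the table.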

lingyun1010 commented 7 months ago

I tested the relevant SQL implemented in CellPlotDao.fetchDefaultPlotMethodWithParameterisation, which is as follows:

SELECT DISTINCT dr.method, jsonb_array_elements(dr.parameterisation) parameterisation
FROM scxa_dimension_reduction dr
JOIN (SELECT method, max(priority) AS prt
      FROM scxa_dimension_reduction
      WHERE experiment_accession = :experiment_accession
      GROUP BY method) fi
  ON dr.method = fi.method
 AND dr.priority = fi.prt
ORDER BY parameterisation

I used experiment_accession = E-EHCA-2 as my test case.

The query returns a wrong result: it contains all the other n_neighbors options. Hence, the bug is in the SQL above.
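One possible explanation, not confirmed in this thread, is that only the subquery is restricted to the experiment, so the outer join can pick up rows from other experiments that share the same method and priority. The sketch below illustrates that hypothesis in Python with an in-memory SQLite database. Several things are assumptions for illustration: the accession E-OTHER-1 and all rows are made up, parameterisation is stored as plain text because SQLite has no jsonb_array_elements, and the scoped variant is a candidate change rather than the fix applied in this PR.

```python
import sqlite3

# Same shape as the query quoted above, minus jsonb_array_elements (not in SQLite).
QUERY_AS_QUOTED = """
SELECT DISTINCT dr.method, dr.parameterisation
FROM scxa_dimension_reduction dr
JOIN (SELECT method, max(priority) AS prt
      FROM scxa_dimension_reduction
      WHERE experiment_accession = :experiment_accession
      GROUP BY method) fi
  ON dr.method = fi.method
 AND dr.priority = fi.prt
ORDER BY dr.parameterisation
"""

# Hypothetical variant: the outer table is also restricted to the experiment.
QUERY_SCOPED = """
SELECT DISTINCT dr.method, dr.parameterisation
FROM scxa_dimension_reduction dr
JOIN (SELECT method, max(priority) AS prt
      FROM scxa_dimension_reduction
      WHERE experiment_accession = :experiment_accession
      GROUP BY method) fi
  ON dr.method = fi.method
 AND dr.priority = fi.prt
WHERE dr.experiment_accession = :experiment_accession
ORDER BY dr.parameterisation
"""

# Made-up rows: one umap entry for E-EHCA-2, two for a fictional experiment.
DEMO_ROWS = [
    ("E-EHCA-2", "umap", '{"n_neighbors": 15}', 0),
    ("E-OTHER-1", "umap", '{"n_neighbors": 10}', 0),
    ("E-OTHER-1", "umap", '{"n_neighbors": 20}', 0),
]


def build_demo_db():
    """Create an in-memory table with the columns used by the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scxa_dimension_reduction ("
        "experiment_accession TEXT, method TEXT, "
        "parameterisation TEXT, priority INTEGER)"
    )
    conn.executemany(
        "INSERT INTO scxa_dimension_reduction VALUES (?, ?, ?, ?)", DEMO_ROWS
    )
    return conn


def fetch_defaults(conn, query, accession):
    """Run one of the queries above for the given experiment accession."""
    return conn.execute(query, {"experiment_accession": accession}).fetchall()
```

With these demo rows, `fetch_defaults(build_demo_db(), QUERY_AS_QUOTED, "E-EHCA-2")` also returns the two E-OTHER-1 parameterisations, while `QUERY_SCOPED` returns only the n_neighbors 15 row. Whether the real table behaves the same way depends on its data and would need checking against PostgreSQL.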