chanzuckerberg / single-cell-data-portal

The data portal supporting the submission, exploration, and management of projects and datasets to cellxgene.
MIT License

Move all implicit validation steps in the CXG converter into the CXG schema CLI #5753

Closed by joyceyan 1 year ago

brianraymor commented 1 year ago

Can you say a bit more about why validation for CXG requirements would occur twice?

The CLI is available for curators to use prior to submission BUT the ingestion pipeline runs validation again (trust, but verify). So in theory, datasets that would fail during CXG conversion should already fail validation before the conversion step starts.

atarashansky commented 1 year ago

I think we're talking about the same thing @brianraymor. The CXG conversion occurs as part of the ingestion pipeline. So when we say "reused between the CXG schema CLI and CXG converter", we are referring to the validation done by the ingestion pipeline. Apologies for the confusion.

brianraymor commented 1 year ago

Apologies. Still confused.

  1. Dataset is ingested.
  2. The cellxgene-schema CLI is run, which includes the CXG-related validation.
  3. If step 2 succeeds, CXG conversion starts and doesn't need to validate those cases again.

So I do not understand "reused".
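For readers following along, the three steps above can be sketched roughly as below. This is only an illustration of the validate-then-convert ordering: `validate_dataset`, `convert_to_cxg`, and `ingest` are hypothetical names and the checks are made up, not the actual cellxgene_schema or data portal code.

```python
from typing import Callable, List, Tuple

Dataset = dict


def validate_dataset(dataset: Dataset) -> Tuple[bool, List[str]]:
    """Hypothetical stand-in for the shared validator (not the real API)."""
    errors: List[str] = []
    if "X" not in dataset:
        errors.append("missing expression matrix 'X'")
    if not dataset.get("obs"):
        errors.append("missing or empty 'obs'")
    return (not errors, errors)


def convert_to_cxg(dataset: Dataset) -> str:
    """Hypothetical converter: trusts that validation already passed,
    so it performs no checks of its own."""
    return f"cxg:{len(dataset['obs'])} cells"


def ingest(
    dataset: Dataset,
    validate: Callable[[Dataset], Tuple[bool, List[str]]] = validate_dataset,
    convert: Callable[[Dataset], str] = convert_to_cxg,
) -> str:
    # Step 2: run validation inside the pipeline (trust, but verify).
    is_valid, errors = validate(dataset)
    if not is_valid:
        raise ValueError("validation failed: " + "; ".join(errors))
    # Step 3: conversion only starts once validation has succeeded.
    return convert(dataset)
```

Under this ordering, any check living only in the converter would be an "implicit validation"; moving it into the validator means curators hit the same failure when running the CLI before submission.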

atarashansky commented 1 year ago

:facepalm:

You're absolutely correct. I now realize that the processing pipeline directly imports cellxgene_schema. This ticket should be re-titled to: Move all implicit validation steps in the CXG converter into the CXG schema CLI

joyceyan commented 1 year ago

Closing this out: after some investigation, @seve didn't find any implicit validations in the CXG conversion code that aren't already in the CLI validator.