cBioPortal / cbioportal-core

Externalized cBioPortal Core
2 stars 15 forks source link

Validator update: only allow SOMATIC or GERMLINE values for SV_Status column #64

Open oplantalech opened 4 days ago

oplantalech commented 4 days ago

Following the documentation, SV_Status column of structural variant files can only contain SOMATIC or GERMLINE values. The validator assumes if SV_Status is empty, that the value will be SOMATIC. In practice, the loader loads the entry with an empty value and causes some parts to break (e.g. Structural Variant table in the Study View is empty).

I'm proposing to raise an error in the validator when the value is empty, and still allow to load studies with empty SV_Status by only using the loader. However, if that is really something we want to avoid, I can also change the PR and adjust the loader to follow the logic described by the validator (@rmadupuri and team let me know).

rmadupuri commented 3 days ago

Hi @oplantalech, from a quick check, it seems the loader is filtering out the rows with null SV_Status during import? Need to double check on that but we think it may be better to report an error rather than defaulting to somatic. This PR is good to go once the unit test is fixed.

oplantalech commented 3 days ago

@rmadupuri In this line you can see how it adds null when the SV_Status has a NA value. That's the problem. The validator suggests that this null is replaced by "SOMATIC".

I will fix the tests.

oplantalech commented 1 day ago

@averyniceday Just had a call with the data team and we think the loader might have to be fixed for structural variants regarding SV_Status. This is a required column, with only two possible values (SOMATIC and GERMLINE), but it looks like the loader allows to load empty values as null (see this line). That is a different behavior that what we see in the mutation data, where required fields should be filled or otherwise the loader breaks. Do you know maybe why this was introduced, or if we should avoid touching the loader?