Open oplantalech opened 4 days ago
Hi @oplantalech, from a quick check, it seems the loader is filtering out the rows with null SV_Status during import? Need to double check on that but we think it may be better to report an error rather than defaulting to somatic. This PR is good to go once the unit test is fixed.
@rmadupuri In this line you can see how it adds null
when the SV_Status has a NA
value. That's the problem. The validator suggests that this null
is replaced by "SOMATIC"
.
I will fix the tests.
@averyniceday Just had a call with the data team and we think the loader might have to be fixed for structural variants regarding SV_Status
. This is a required column, with only two possible values (SOMATIC
and GERMLINE
), but it looks like the loader allows to load empty values as null
(see this line). That is a different behavior that what we see in the mutation data, where required fields should be filled or otherwise the loader breaks. Do you know maybe why this was introduced, or if we should avoid touching the loader?
Following the documentation, SV_Status column of structural variant files can only contain SOMATIC or GERMLINE values. The validator assumes if SV_Status is empty, that the value will be SOMATIC. In practice, the loader loads the entry with an empty value and causes some parts to break (e.g. Structural Variant table in the Study View is empty).
I'm proposing to raise an error in the validator when the value is empty, and still allow to load studies with empty SV_Status by only using the loader. However, if that is really something we want to avoid, I can also change the PR and adjust the loader to follow the logic described by the validator (@rmadupuri and team let me know).