SACGF / variantgrid


Cytogenetics - storing variants with uncertainty #891

Open davmlaw opened 1 year ago

davmlaw commented 1 year ago

Cyto have a lot of data. It's currently stored in NxClinical in ISCN format

They use microarray probes that tile the genome, so the data has inherent uncertainty, ie a breakpoint is only known to lie between 2 probes. Sometimes they will export the data as ISCNs w/o uncertainty, using say the midpoint between the 2 probes.

If we could bring in cyto data, it would be useful for SA Path and Shariant. To do that, we would need to support uncertainty.

I think uncertainty is a fundamental part of Variant, ie the same start/end but with different uncertainty is a different variant.

outer_start = IntegerField(null=True)
outer_end = IntegerField(null=True)

We need to be able to store both "no uncertainty" and "unknown" (ie ?), maybe using magic numbers to avoid another boolean field, eg:

outer_start = null # no uncertainty (default in almost all cases)
outer_start = -1 # unknown
outer_start = 4231312 # the coordinate
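
A minimal sketch of how code reading the field could interpret those three cases. This is plain Python rather than a Django model, and the names `UNKNOWN` and `describe_outer` are made up for illustration, they are not part of VariantGrid:

```python
from typing import Optional

# Assumption: -1 is the magic number for "unknown" (ie ?), as proposed above
UNKNOWN = -1


def describe_outer(outer: Optional[int]) -> str:
    """Classify an outer_start / outer_end value using the proposed encoding."""
    if outer is None:
        return "no uncertainty"  # null: the default in almost all cases
    if outer == UNKNOWN:
        return "unknown"  # the outer bound exists but is not known
    return "coordinate"  # a real genomic position


for value in (None, -1, 4231312):
    print(value, "->", describe_outer(value))
```

The trade-off is that every query touching the column has to remember -1 is not a real coordinate, which is the cost of avoiding the extra boolean field.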
davmlaw commented 1 year ago

http://atlasgeneticsoncology.org/

I took some notes here: https://github.com/SACGF/shariant-admin/issues/142

Sent email summarising results 28 July

We'd need to support ISCN

There is a Python package for it (pip package and Git repo):

pip install ISCNSNAKE

Also HGVS uncertainty - need to modify biocommons to support it:

https://github.com/biocommons/hgvs/issues/331 - support uncertain offsets
https://github.com/biocommons/hgvs/issues/225 - support uncertain ranges
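
As a rough illustration of what an uncertain range looks like, here is a sketch that only does string formatting of an HGVS-style interval such as `(?_100)_(200_?)` from inner/outer coordinates. It does not use the biocommons library (which, per the issues above, doesn't support this yet). The function name `uncertain_interval` is hypothetical, and it assumes the `outer_start` / `outer_end` encoding from the first comment (null = no uncertainty, -1 = unknown):

```python
from typing import Optional

UNKNOWN = -1  # assumption: same magic number as proposed for the model fields


def _pos(value: int) -> str:
    """Render a coordinate, using '?' for the unknown sentinel."""
    return "?" if value == UNKNOWN else str(value)


def uncertain_interval(start: int, end: int,
                       outer_start: Optional[int] = None,
                       outer_end: Optional[int] = None) -> str:
    """Format start/end with optional outer bounds as an HGVS-style interval."""
    # A side without an outer bound is written as a plain position
    left = str(start) if outer_start is None else f"({_pos(outer_start)}_{start})"
    right = str(end) if outer_end is None else f"({end}_{_pos(outer_end)})"
    return f"{left}_{right}"


print(uncertain_interval(100, 200))
print(uncertain_interval(100, 200, outer_start=90, outer_end=210))
print(uncertain_interval(100, 200, outer_start=UNKNOWN, outer_end=UNKNOWN))
```

Parsing these strings back (and the uncertain offsets case) is the part that would need the biocommons changes.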

davmlaw commented 1 year ago

Hopefully adding some nullable fields doesn't add that much space, but we should def run a lot of benchmarks to see what the overhead is when making such a large change to the Variant table