Open davmlaw opened 3 weeks ago
@TheMadBug has the new germline uploader become much more strict??
from classification.models import ClassificationModification, ShareLevel
from django.db.models import Q
from sync.models import SyncDestination
from sync.shariant.shariant_upload import ClassificationUploader, SyncDestination, QueryJsonFilter
for sd in SyncDestination.objects.filter(config__direction='upload', enabled=True):
    already_sync_q = Q(classification__classificationmodification__classificationmodificationsyncrecord__run__destination=sd)
    uploader = ClassificationUploader(sd)
    qs = ClassificationModification.objects.filter(is_last_published=True, share_level__in=ShareLevel.DISCORDANT_LEVEL_KEYS, classification__lab__group_name__in=uploader.lab_mappings.keys())
    q = QueryJsonFilter.classification_value_filter().convert_to_q(uploader.filters)
    prev_not_current_sync_qs = qs.filter(already_sync_q).exclude(q).distinct()
    current_sync_qs = qs.filter(q).distinct()
    print(f"{sd} - current sync: {current_sync_qs.count()}, historical not current: {prev_not_current_sync_qs.count()}")
    lab_records = ','.join([cm.classification.lab_record_id for cm in prev_not_current_sync_qs])
    print(f"Lab records: {lab_records}")
shariant_upload - current sync: 157, historical not current: 20
Lab records: vc15228,vc13072,vc13075,vc13113,vc15801,vc32032,vc36255,vc36511,vc29980,vc23033,vc15750,vc15833,vc23136,vc15230,vc50244,vc50339,vc51761,vc51760,vc51556,vc30255
shariant_upload_somatic - current sync: 7, historical not current: 2
Lab records: vc15911,vc15731
The shariant_upload_somatic ones are the wrong ones that got withdrawn
Yes, the somatic/germline filter was updated to exclude records that haven't provided an allele origin at all (there's talk of making it mandatory within SA Path, but we still need to organise an official convo with the lab heads about that).
re shariant_upload - current_sync_count=156, including historical sync would add: 2391
are 2391 records for SA Path without an allele origin?
@TheMadBug - I updated the counts, I think I messed up the queries - they are much lower now
Given that current upload sync total is 157 + 7, it looks like we're missing a lot of historical records that we have uploaded
This looks to be part of the original query (before any sync dest filters are applied) - I'm not sure if they are due to labs changing or whatever, but it's possible they are being updated and wouldn't be sent/updated
If a Classification is changed such that the latest ClassificationModification is no longer caught by the Syncrunner, it won't send an update. An example is changing to Somatic, then to not-somatic for the somatic Shariant upload
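To make that failure mode concrete, here is a minimal plain-Python sketch. It is not the VariantGrid code: `Modification`, `somatic_filter`, `records_to_sync` and the record ID `vc_example` are hypothetical stand-ins, and it assumes the sync only evaluates the latest modification of each record against the destination's filter.

```python
from dataclasses import dataclass


@dataclass
class Modification:
    lab_record_id: str
    version: int
    allele_origin: str  # hypothetical values: "somatic" or "germline"


def somatic_filter(mod):
    # Stand-in for the somatic destination's filter.
    return mod.allele_origin == "somatic"


def latest_per_record(mods):
    # Keep only the highest-version modification for each record.
    latest = {}
    for mod in mods:
        current = latest.get(mod.lab_record_id)
        if current is None or mod.version > current.version:
            latest[mod.lab_record_id] = mod
    return latest


def records_to_sync(mods, dest_filter):
    # Only the latest modification is tested against the filter.
    latest = latest_per_record(mods)
    return sorted(rid for rid, mod in latest.items() if dest_filter(mod))


# Run 1: the record is somatic, so it is uploaded.
history = [Modification("vc_example", 1, "somatic")]
first_run = records_to_sync(history, somatic_filter)

# The lab then changes it to not-somatic. The latest modification no longer
# matches, so run 2 sends nothing and the destination keeps the stale copy.
history.append(Modification("vc_example", 2, "germline"))
second_run = records_to_sync(history, somatic_filter)

print(first_run, second_run)
```

The second run comes back empty, which is the problem: nothing tells the destination that the record it already holds has changed.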
I think if a sync destination has ever sent a classification, it should be responsible for it for all time. So we need to add that to the filters
in sync.shariant.variant_grid_upload.VariantGridUploadSyncer.records_to_sync
Instead of:
qs = qs.filter(q)
You go:
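The replacement line isn't shown here, so the following is only a plain-Python sketch of the idea from the comment above: select a record if it matches the current filter or if this destination has sent it before. All names (`select_records`, `current_filter`, `previously_synced`, the `rec_*` IDs) are hypothetical. In ORM terms this might amount to OR-ing `q` with `already_sync_q` from the script at the top, but that is a guess rather than the actual change.

```python
def current_filter(record):
    # Stand-in for the destination's current filter
    # (here: germline records that state an allele origin).
    return record["allele_origin"] == "germline"


def select_records(records, matches_filter, previously_synced_ids):
    # A record is selected if it passes the filter now, OR if this
    # destination has ever sent it, so later changes still get pushed.
    return [
        r["lab_record_id"]
        for r in records
        if matches_filter(r) or r["lab_record_id"] in previously_synced_ids
    ]


records = [
    {"lab_record_id": "rec_a", "allele_origin": "germline"},
    {"lab_record_id": "rec_b", "allele_origin": None},  # sent before, no longer matches
    {"lab_record_id": "rec_c", "allele_origin": None},  # never sent, does not match
]
previously_synced = {"rec_b"}

filter_only = select_records(records, current_filter, set())
with_history = select_records(records, current_filter, previously_synced)

print(filter_only)
print(with_history)
```

With the history included, `rec_b` stays in scope even though it no longer passes the filter, while `rec_c` is still left out because it was never sent.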
Dave to go back to SA Path and check if this has happened to any existing records and report back here