Clinical-Genomics / BALSAMIC

Bioinformatic Analysis pipeLine for SomAtic Mutations In Cancer
https://balsamic.readthedocs.io/
MIT License

Using Genotype to detect mixups #1088

Open mathiasbio opened 1 year ago

mathiasbio commented 1 year ago

Is your feature request related to a problem? Please describe.

When somalier has detected a sample mix-up, Genotype has previously been used to find the case that the sample was mixed up with. For example, in one mixed-up tumor/normal (T/N) analysis, the tumor data was sent to Genotype and matched an older germline case analysed in MIP. Deviation issue: https://github.com/Clinical-Genomics/Deviations/issues/449

Linked to this project: https://github.com/Clinical-Genomics/project-planning/issues/384

Describe the solution you'd like

  1. The project mentioned above proposes functionality that automatically uploads a case's data to Genotype, looks for matches with the MAF data and, I suppose, raises a warning that a mix-up may have occurred.

However, mix-ups can happen in MIP as well, so I don't think this should be developed within the BALSAMIC context. It seems better suited to a more general tool.

  2. Within the context of BALSAMIC, this could be done automatically when somalier detects a mix-up. However, that happens rarely enough that I think it can be done manually. In that case, I would simply recommend updating Atlas with instructions on how to proceed with BALSAMIC delivery when somalier detects a mix-up.

Current BALSAMIC version (balsamic --version): 11.0.2

pbiology commented 1 year ago

Are there other "safety measures" we should consider implementing? If so, perhaps we could try to combine these into a single new feature.

ivadym commented 1 year ago

I believe part of this has already been implemented: we are currently uploading WGS cases to Genotype automatically:

https://github.com/Clinical-Genomics/BALSAMIC/issues/882 https://github.com/Clinical-Genomics/cg/pull/1555

What do you think is missing here, @mathiasbio?

mathiasbio commented 1 year ago

When I created this issue I didn't know that BALSAMIC uploads germline calls to Genotype, so I may have misunderstood the problem. This issue was based on Moa's comment in https://github.com/Clinical-Genomics/Deviations/issues/449:

2023-01-25 There is a project proposal to use Genotype to match with already existing samples in order to check that samples are unique before upload, https://github.com/Clinical-Genomics/project-planning/issues/384. Such a check would have discovered that sample ACCXXXXX match with ACXXXXX. (Note that the time period when running match in Genotype had to be longer than 2 months in order to include the conflicting sample.)

I'm not sure exactly how this Genotype check is implemented for BALSAMIC right now. If I understand correctly, the germline calls on the normal sample from DNAscope are uploaded to Genotype, and when the MAF results come in, the two sets of results are compared on the basis of their common LIMS-ID.

I think what Moa suggested was to run Genotype and look for matches after the BALSAMIC analysis is completed and before upload to Scout and Caesar. The point would be to detect mix-ups that span a longer period of time. For example:

  Sample A --> MIP + MAF
  Three weeks later: Sample B --> BALSAMIC + MAF

Before Sample B is uploaded to Scout, it gets a chance to match the MAF results from Sample A, which would block the upload and trigger a deviation.
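To make the difference between the two checks concrete, here is a minimal Python sketch. It is hypothetical throughout: GenotypedSample, concordance, find_matching_samples, upload_allowed and the thresholds (95% concordance, at least 20 shared SNPs) are illustrative names and values, not the Genotype or cg API. Instead of comparing a sample only with the record that shares its LIMS-ID, it compares the new sample with every other stored sample and blocks upload if any of them match. The optional time window defaults to no limit, since in the deviation above the search had to cover more than 2 months to include the conflicting sample.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class GenotypedSample:
    """Hypothetical record: one sample's SNP genotype calls (germline calls or MAF panel)."""
    sample_id: str
    analysed_on: date
    genotypes: dict[str, str]  # SNP id -> unphased genotype, e.g. "AG"


def _normalise(gt: str) -> str:
    # Treat "AG" and "GA" as the same unphased genotype.
    return "".join(sorted(gt))


def concordance(a: dict[str, str], b: dict[str, str]) -> tuple[float, int]:
    """Return (fraction of shared SNPs with identical genotypes, number of shared SNPs)."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0, 0
    same = sum(_normalise(a[snp]) == _normalise(b[snp]) for snp in shared)
    return same / len(shared), len(shared)


def find_matching_samples(
    new: GenotypedSample,
    stored: list[GenotypedSample],
    window: Optional[timedelta] = None,
    min_concordance: float = 0.95,  # hypothetical threshold
    min_shared_snps: int = 20,  # hypothetical minimum for a meaningful comparison
) -> list[str]:
    """Return IDs of *other* stored samples whose genotypes match `new`.

    window=None searches all stored samples; a short window could miss an
    older conflicting sample.
    """
    matches = []
    for old in stored:
        if old.sample_id == new.sample_id:
            continue  # the same-LIMS-ID comparison is a separate check
        if window is not None and new.analysed_on - old.analysed_on > window:
            continue  # outside the search window
        score, n_shared = concordance(new.genotypes, old.genotypes)
        if n_shared >= min_shared_snps and score >= min_concordance:
            matches.append(old.sample_id)
    return matches


def upload_allowed(
    new: GenotypedSample,
    stored: list[GenotypedSample],
    window: Optional[timedelta] = None,
) -> bool:
    """Block upload (e.g. to Scout/Caesar) if the new sample matches any other sample."""
    return not find_matching_samples(new, stored, window)


if __name__ == "__main__":
    calls = {f"rs{i}": "AG" if i % 2 else "GG" for i in range(30)}
    sample_a = GenotypedSample("sample_A", date(2023, 1, 1), calls)  # MIP + MAF
    # Three weeks later, a BALSAMIC sample carrying the same genotypes.
    sample_b = GenotypedSample("sample_B", date(2023, 1, 22), dict(calls))
    print(find_matching_samples(sample_b, [sample_a]))  # ['sample_A']
    print(upload_allowed(sample_b, [sample_a]))  # False
    print(find_matching_samples(sample_b, [sample_a], window=timedelta(days=14)))  # []
```

The last call shows why the window matters: a 14-day search window skips Sample A (analysed 21 days earlier), so the mix-up would go unnoticed.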