genome-nexus / genome-nexus-annotation-pipeline

Library and tool for annotating MAF files using Genome Nexus Webserver API
MIT License

Store immutable version of (chrom,pos,ref,alt1,alt2) in separate columns with prefix #157

Open inodb opened 3 years ago

inodb commented 3 years ago

We have had a lot of trouble recovering the original chrom/pos/ref/alt1/alt2 after the genome nexus annotation pipeline normalizes it. Consider the use case of running genome-nexus-annotation-pipeline twice in a row on the same MAF file: if the first run normalizes chrom/pos/ref/alt1/alt2 incorrectly, the second run cannot recover the original values. Let's store a copy in separate, immutable columns to get around this issue.

We could have the following logic:

If the columns prefixed with genome_nexus_ignore_original are missing (genome_nexus_ignore_original_Chromosome, genome_nexus_ignore_original_Start_Position, genome_nexus_ignore_original_End_Position, genome_nexus_ignore_original_Reference_Allele, genome_nexus_ignore_original_Tumor_Seq_Allele1, genome_nexus_ignore_original_Tumor_Seq_Allele2), then create these columns and copy over the values from the input MAF's Chromosome, Start_Position, End_Position, Reference_Allele, Tumor_Seq_Allele1, and Tumor_Seq_Allele2.

If they are not missing, then annotate using the columns prefixed with genome_nexus_ignore_original instead of the input Chromosome, Start_Position, End_Position, Reference_Allele, Tumor_Seq_Allele1, and Tumor_Seq_Allele2. We do this because an earlier run might have normalized the non-prefixed ones incorrectly.
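The two cases above could be sketched roughly as follows. This is only an illustration in Python, not the pipeline's actual (Java) code: the function name `ensure_original_columns`, the dict-per-row representation, and the exact `prefix + "_" + column` naming are assumptions, as is the choice to handle each column individually when only some prefixed columns exist.

```python
# Hypothetical sketch of the proposed logic; names are illustrative only.
COORD_COLUMNS = [
    "Chromosome",
    "Start_Position",
    "End_Position",
    "Reference_Allele",
    "Tumor_Seq_Allele1",
    "Tumor_Seq_Allele2",
]
PREFIX = "genome_nexus_ignore_original_"


def ensure_original_columns(row):
    """Return (updated_row, annotation_input) for one MAF record.

    updated_row is a copy of the record with the prefixed columns present.
    annotation_input holds the values that should be sent for annotation.
    """
    updated = dict(row)
    for column in COORD_COLUMNS:
        # Create the prefixed column only if it is missing; an existing
        # value is never overwritten, which keeps it immutable across runs.
        updated.setdefault(PREFIX + column, updated.get(column, ""))
    # Always annotate from the prefixed copies, because the non-prefixed
    # columns may have been normalized (possibly incorrectly) by an earlier run.
    annotation_input = {c: updated[PREFIX + c] for c in COORD_COLUMNS}
    return updated, annotation_input
```

On a first run the prefixed columns are created from the input values. On a second run, even if `Chromosome` was rewritten in between, `annotation_input` still carries the values captured the first time.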

Bonus points: if we really want to manually change some genome_nexus_ignore_original values, we can add another column prefix such as genome_nexus_ignore_original_manual_override, which would be preferred over genome_nexus_ignore_original. That way the immutable columns stay truly immutable, and we can also see which records were manually overridden.
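A minimal sketch of that precedence rule, again in Python purely for illustration. The helper name `pick_value`, the `prefix + "_" + column` naming, and treating an empty override cell as "no override" are all assumptions and not part of the proposal itself.

```python
# Hypothetical sketch of the override precedence; names are illustrative only.
ORIGINAL_PREFIX = "genome_nexus_ignore_original_"
OVERRIDE_PREFIX = "genome_nexus_ignore_original_manual_override_"
# Note: OVERRIDE_PREFIX itself starts with ORIGINAL_PREFIX, so code that finds
# columns by prefix matching must test for the override prefix first.


def pick_value(row, column):
    """Return (value, source) for one coordinate column of a MAF record.

    Assumes the genome_nexus_ignore_original_ column is already present.
    """
    override = row.get(OVERRIDE_PREFIX + column, "")
    if override.strip():
        # A non-empty manual override wins, and the source label makes
        # overridden records easy to spot.
        return override, "manual_override"
    # Otherwise fall back to the immutable original, which is never modified.
    return row[ORIGINAL_PREFIX + column], "original"
```

Because the override lives in its own column, the original value stays untouched and a record can be reverted by clearing the override cell.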

sheridancbio commented 3 years ago

@inodb I mentioned in the PR that user as1000 is not yet registered in git as a member of the development team, so the Jenkins unit tests for this code base [which run inside our firewall] are refusing to run. If this PR needs to be merged at some point down the road, we will need to adjust the development team or re-open the PR from a trusted developer account.