Closed LeeTL1220 closed 9 years ago
Note that easy re-annotation of a TCGA MAF (second bullet point) could be done simply by cutting the columns inside oncotator.
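A rough sketch of what "cutting the columns" could look like, for discussion only. The function name `cut_maf_columns` and the `keep` argument are hypothetical and not part of oncotator; the sketch assumes a tab-separated MAF whose comment lines start with `#` and whose first non-comment line is the header.

```python
def cut_maf_columns(maf_text, keep):
    """Return the MAF text with only the columns named in `keep`.

    Hypothetical helper, not oncotator code. Comment lines (starting
    with '#') are passed through unchanged. The first non-comment line
    is treated as the header, and the original column order is kept.
    """
    out = []
    indices = None
    for line in maf_text.splitlines():
        if line.startswith("#"):
            out.append(line)
            continue
        fields = line.split("\t")
        if indices is None:
            # Header line: remember which column positions survive.
            indices = [i for i, name in enumerate(fields) if name in keep]
        out.append("\t".join(fields[i] for i in indices))
    return "\n".join(out) + "\n"


maf = (
    "#version 2.4\n"
    "Hugo_Symbol\tt_alt_count\tmy_annotation\n"
    "TP53\t12\tfoo\n"
)
print(cut_maf_columns(maf, {"Hugo_Symbol", "t_alt_count"}))
```

Which columns belong in `keep` is left open here; the pruned file could then be fed back in for re-annotation.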
FYI... @kcibul and @ldgauthier
I have already discussed with @kcibul and he seems to be on board.
After discussion with Chip (@chipstewart), the solution to support M2 in FH is as follows:
--infer-onps
(uses full datasource set per @kcibul request)
infer-onps
and NEW functionality that collapses xNP fields.
How to collapse fields for xNPs: Assume that a field comes in as "x|y|...".
For counts:
For other fields: just take the mean
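A minimal sketch of the collapsing step, to make the idea concrete. The name `collapse_xnp_field` and the `collapse_count` parameter are hypothetical. The rule for count fields is not spelled out above, so the sketch takes it as a caller-supplied function rather than guessing. Leaving non-numeric values untouched is also an assumption.

```python
def collapse_xnp_field(value, collapse_count=None):
    """Collapse a pipe-delimited xNP field such as "x|y|..." to one value.

    Hypothetical helper, not oncotator code.
    - Values without "|" are returned unchanged.
    - Non-numeric values are returned unchanged (assumption).
    - If `collapse_count` is given, the field is treated as a count and
      that function decides the result (the rule is not specified here).
    - Otherwise the mean of the values is used.
    """
    if "|" not in value:
        return value
    try:
        parts = [float(p) for p in value.split("|")]
    except ValueError:
        return value
    if collapse_count is not None:
        return str(collapse_count(parts))
    return str(sum(parts) / len(parts))


print(collapse_xnp_field("0.25|0.75"))  # mean of the two values
print(collapse_xnp_field("A|C"))        # non-numeric, left as-is
```

Passing the count rule in as a function keeps the mean logic separate from whatever is decided for counts.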
Sounds good.
For my own reference (unit tests still to be created). This covers only the TCGA MAF to TCGA MAF component. The backing code is mostly built as of this writing (not shown here), and all regression tests pass so far.
Still needed to be built:
--prune-tcga-maf-cols
is specified. This will only be in TcgaMafOutputRenderer. Requires refactoring of the header to annotation mapping paradigm.
--help
and throws warning if input is not TCGAMAF or SIMPLE_TSV)
Additional unit/automated tests to write:
MutationDataFactoryTest
RunSpecFactoryTest (if time permits)
TcgaMafOutputRendererTest
AnnotatorTest
When ready, I am going to cut a release with only this functionality for GDAC (@dheiman).
The rest of the M2 requirements can be found in issue #326
The OxoG filter cannot handle values containing "|". These are generated in the columns that the OxoG filter needs in the case of ONPs (e.g. t_alt_count), since there are different values for each base. Oncotator's behavior is correct, but we need an option that lets a TCGA MAF run through the OxoG filter with minimal changes to the OxoG filter code.
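To illustrate the problem in Python terms: a pipe-delimited value such as "7|9" cannot be parsed as a single number, so any consumer that expects one integer per cell will fail on ONP rows. The sketch below is an assumption about the failure mode, not the OxoG filter's actual code, and `find_pipe_rows` is a hypothetical helper for spotting affected rows.

```python
def find_pipe_rows(rows, columns):
    """Return the indices of rows with a "|" in any of the given columns.

    Hypothetical helper. `rows` is a list of dicts mapping column name
    to string value, e.g. as produced by csv.DictReader.
    """
    return [
        i for i, row in enumerate(rows)
        if any("|" in row.get(col, "") for col in columns)
    ]


rows = [
    {"Variant_Type": "SNP", "t_alt_count": "12"},
    {"Variant_Type": "DNP", "t_alt_count": "7|9"},
]

affected = find_pipe_rows(rows, ["t_alt_count"])
for i in affected:
    try:
        int(rows[i]["t_alt_count"])
    except ValueError as err:
        # A single-integer parse fails on the per-base values.
        print("row", i, "cannot be parsed:", err)
```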
There are several possible solutions for this issue:
If one of the latter two is chosen, this is not really an oncotator issue, but we can track it here. Note that only the third choice handles all edge cases (e.g. a DNP where one of the two mutations is an artifact).