jjgao opened 6 years ago
Sounds good. We could link to the official GDC MAF documentation (https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/) for an explanation of the column names, and focus this document on the columns relevant to cBioPortal functionality.
Which columns are required for Genome Nexus annotation? And will Genome Nexus annotation always be done, or only when specific fields are missing?
@sandertan Good idea to point to the GDC MAF document and focus our doc on cBioPortal functionality.
GN only requires 5 columns of genomic changes as minimal input (we should probably require NCBI_Build as well, since GN currently only supports GRCh37/hg19).
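A documented minimal MAF could come with a simple pre-flight check. The sketch below is hypothetical and rests on assumptions: the thread only says GN needs "5 columns of genomic changes", so the five names used here (Chromosome, Start_Position, End_Position, Reference_Allele, Tumor_Seq_Allele2) come from the standard MAF specification, and the accepted NCBI_Build spellings are likewise an assumption.

```python
import csv

# ASSUMPTION: the five genomic-change columns, taken from the standard MAF spec.
# The thread does not list the exact names GN requires.
GENOMIC_CHANGE_COLUMNS = (
    "Chromosome",
    "Start_Position",
    "End_Position",
    "Reference_Allele",
    "Tumor_Seq_Allele2",
)
# Per the thread, GN currently only supports GRCh37/hg19.
# ASSUMPTION: these are the spellings we accept in the NCBI_Build column.
SUPPORTED_BUILDS = {"GRCh37", "hg19", "37"}


def check_minimal_maf(lines):
    """Hypothetical helper: return a list of problems that would block
    annotation. An empty list means the MAF looks annotatable."""
    # Skip blank lines and '#' comment/version lines before the header.
    data_lines = [l for l in lines if l.strip() and not l.startswith("#")]
    reader = csv.DictReader(data_lines, delimiter="\t")
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in GENOMIC_CHANGE_COLUMNS if c not in header]
    if "NCBI_Build" not in header:
        problems.append("missing column: NCBI_Build (build cannot be verified)")
    else:
        # Check every data row's reference genome build.
        for row_number, row in enumerate(reader, start=1):
            build = (row.get("NCBI_Build") or "").strip()
            if build not in SUPPORTED_BUILDS:
                problems.append(f"row {row_number}: unsupported NCBI_Build {build!r}")
    return problems


maf = [
    "#version 2.4",
    "Hugo_Symbol\tChromosome\tStart_Position\tEnd_Position\tReference_Allele\tTumor_Seq_Allele2\tNCBI_Build",
    "BRAF\t7\t140453136\t140453136\tA\tT\tGRCh37",
]
print(check_minimal_maf(maf))  # []
```

Treating a missing NCBI_Build as a problem reflects the suggestion above that the build column should probably be required too.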
A MAF should be run through GN annotation at least once to normalize the annotations.
I think there might be a flag to skip annotation, so CMO can force their own annotations over Genome Nexus, but that might only be part of the pipelines code. @angelicaochoa do you know?
@inodb MAFs will be annotated with Genome Nexus on the fly if the HGVSp_Short column is not present in the file. The CMO does not pre-annotate MAFs, so unless someone does this manually, they will always undergo annotation as part of the import pipeline.
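The rule described above (annotate only when HGVSp_Short is absent) can be sketched as follows. This is a hypothetical illustration, not the import pipeline's actual code; the function names are made up for this example.

```python
def maf_header(lines):
    """Return the column names from the first non-comment, non-blank line."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            return line.rstrip("\r\n").split("\t")
    return []


def needs_genome_nexus_annotation(lines):
    # Rule as described in this thread: a MAF without an HGVSp_Short
    # column is sent through Genome Nexus during import.
    return "HGVSp_Short" not in maf_header(lines)


minimal = [
    "#version 2.4",
    "Chromosome\tStart_Position\tEnd_Position\tReference_Allele\tTumor_Seq_Allele2",
]
annotated = minimal[:1] + [minimal[1] + "\tHGVSp_Short"]
print(needs_genome_nexus_annotation(minimal))    # True
print(needs_genome_nexus_annotation(annotated))  # False
```

Note that this sketch only inspects the header, so a file with an HGVSp_Short column whose values are all empty would still skip annotation.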
What is CMO? So annotation will be done during the import process, not when the user requests data in the front-end?
@jjgao what do you mean by:
"A MAF should be run through GN annotation at least once to normalize the annotations."
And what does this mean for private installations?
@jjgao does this mean we should start recommending GN instead of VCF2MAF/VEP for the annotation step? If GN can be assumed to be a "given" at some point (i.e. we make it a dependency of cBioPortal), then I think the mutation data format could indeed be simplified, since the extra annotation will happen either at import time or on the fly in the cBioPortal platform itself.
@pieterlukasse @sandertan
CMO means Center for Molecular Oncology; it's our department. We have a lot of internal data coming from CMO.
Currently, annotation is done either when the data is prepared or when it is imported.
Our pipelines have switched to GN, but VCF2MAF is perfectly fine for annotating MAFs at the moment. At some point we should recommend GN, but maybe only after we refactor the GN annotation? @inodb
Also, to clarify: when I said "A MAF should be run through GN annotation at least once to normalize the annotations", I meant that all MAFs for a given instance of cBioPortal should go through the same annotation process (e.g. using the same canonical isoforms).
@jjgao thanks for the clarification. Coming back to the first point you mentioned in this ticket:
- Document the minimal MAF that is required for annotating through Genome Nexus (instead of importing into the database)
I don't think this would be correct unless the GN step can be assumed to always run during the import step.
@pieterlukasse the minimal MAF should be the same for either Genome Nexus or VCF2MAF. I think users should not be asked to prepare the full MAF, which is a non-trivial barrier. We should document more clearly that, starting from a minimal MAF, they can run Genome Nexus or VCF2MAF to get the fully annotated MAF for importing.
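To make "prepare a minimal MAF" concrete, here is a hypothetical sketch that writes one from a list of variant calls. The column set is an assumption: the five genomic-change columns follow the standard MAF specification, Tumor_Sample_Barcode is included on the assumption that the sample ID is needed for import, and NCBI_Build defaults to GRCh37 because GN currently only supports that build. The `Variant` class and `write_minimal_maf` function are illustrative names, not part of any existing tool.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record for one variant call."""
    sample_id: str
    chromosome: str
    start: int
    end: int
    ref: str
    alt: str


# ASSUMPTION: minimal column set; not confirmed by this thread.
COLUMNS = [
    "Tumor_Sample_Barcode",
    "Chromosome",
    "Start_Position",
    "End_Position",
    "Reference_Allele",
    "Tumor_Seq_Allele2",
    "NCBI_Build",
]


def write_minimal_maf(variants, build="GRCh37"):
    """Return a tab-separated minimal MAF as a string, ready to be annotated
    by Genome Nexus or VCF2MAF before import."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for v in variants:
        writer.writerow([v.sample_id, v.chromosome, v.start, v.end, v.ref, v.alt, build])
    return out.getvalue()


print(write_minimal_maf([Variant("SAMPLE-1", "7", 140453136, 140453136, "A", "T")]))
```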
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@inodb maybe we can move this under Genome Nexus and write better documentation about annotating variants with Genome Nexus?
The MAF format in the current documentation (https://cbioportal.readthedocs.io/en/latest/File-Formats.html) is too complex and not accurate. For example, I think we should always require the genomic change columns.
Maybe we could improve it by:
@inodb @pieterlukasse @sandertan