PoisonAlien / maftools

Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
http://bioconductor.org/packages/release/bioc/html/maftools.html
MIT License

readin error for gzipped maf #468

Closed ruolin closed 4 years ago

ruolin commented 4 years ago

I just started using the R package maftools. It is a great package; however, there seems to be a bug when reading a gzipped MAF file. For example, take this public file from TCGA: https://portal.gdc.cancer.gov/files/995c0111-d90b-4140-bee7-3845436c3b42 Version: maftools_2.0.16

Reading it with maftools yields 75811 records: nrow(tcga.brca.somatic.maf@data)

However, reading it with fread yields 120988 records. I have also seen some SNVs assigned to the wrong samples when using maftools. This is a critical problem since it would affect every downstream analysis.

PoisonAlien commented 4 years ago

Hi, thanks for using maftools; I am glad you find it useful. Regarding the issue: a MAF object stores only non-synonymous variants in the @data slot; the remaining synonymous (silent) variants go into the @maf.silent slot. If you add the row counts of both slots, the total should equal your original number of rows.

> nrow(brca@maf.silent) + nrow(brca@data)
120988
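To illustrate why the two counts differ, here is a minimal base R sketch of the kind of split described above. It does not call maftools at all. The toy rows are made up, and the `non_syn` vector is an assumption about which `Variant_Classification` values count as non-synonymous; check the `read.maf` documentation for the set your maftools version actually uses.

```r
# Toy MAF-like table. Column names follow the MAF format,
# but the rows are invented purely for illustration.
maf <- data.frame(
  Hugo_Symbol            = c("TP53", "PIK3CA", "TTN", "GATA3", "MUC16"),
  Tumor_Sample_Barcode   = c("S1", "S1", "S2", "S2", "S3"),
  Variant_Classification = c("Missense_Mutation", "Silent", "Silent",
                             "Frame_Shift_Ins", "Intron"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: classes treated as non-synonymous. Verify against the
# documentation of your maftools version before relying on this list.
non_syn <- c("Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site",
             "Translation_Start_Site", "Nonsense_Mutation",
             "Nonstop_Mutation", "In_Frame_Del", "In_Frame_Ins",
             "Missense_Mutation")

# Split the table the way the two slots are described in this thread:
# non-synonymous rows on one side, everything else on the other.
is_non_syn  <- maf$Variant_Classification %in% non_syn
data_part   <- maf[is_non_syn, ]   # analogous to the @data slot
silent_part <- maf[!is_non_syn, ]  # analogous to the @maf.silent slot

# No rows are lost: the two parts always add up to the input.
stopifnot(nrow(data_part) + nrow(silent_part) == nrow(maf))
```

With a real MAF object, the equivalent check is the sum of the two slot row counts shown above, compared against the row count from fread.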

ruolin commented 4 years ago

Great. Thanks.