bbglab / intogen-plus

A framework for automatic and comprehensive knowledge extraction based on mutational data from patients' sequenced tumor samples.
https://www.intogen.org/search

Redefinition of BoostDM region dataset #31

Closed: FedericaBrando closed this issue 2 months ago.

FedericaBrando commented 2 months ago

The BoostDM saturation VEP files have two main problems:

  1. tabix systematically skips the first position of the regions in cds_25bp.regions.tsv.gz.
  2. We are including entire exons that consist only of non-coding sequence.
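The issue does not state what causes the skipped first position. One common source of exactly this symptom, offered here only as an assumption, is a coordinate-convention mismatch: if a regions file stores 1-based inclusive starts but is indexed or queried as 0-based half-open (BED-style, e.g. with tabix's `-p bed` preset), the first base of every region falls outside the query. The hypothetical helper below shows the effect on a single region.

```python
def positions(start: int, end: int, zero_based_half_open: bool) -> list[int]:
    """Return the 1-based positions covered by an interval.

    If zero_based_half_open is True, (start, end) is read BED-style:
    0-based start, exclusive end. Otherwise it is read as 1-based inclusive.
    """
    if zero_based_half_open:
        return list(range(start + 1, end + 1))
    return list(range(start, end + 1))


# A region written as 1-based inclusive: positions 100..104 (5 bases).
start, end = 100, 104
intended = positions(start, end, zero_based_half_open=False)
# Misreading the same numbers as BED-style silently drops position 100.
misread = positions(start, end, zero_based_half_open=True)
missing = sorted(set(intended) - set(misread))
print(intended)  # [100, 101, 102, 103, 104]
print(misread)   # [101, 102, 103, 104]
print(missing)   # [100]
```

Because the end coordinate happens to coincide under both readings, the error only shows up at the start of each region, which matches the "first position" pattern described above.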

Tackling the problems:


BoostDM dataset --> cds-5spli.regions.gz

The genomic coordinates returned by the BioMart query we use in IntOGen will replace the exon coordinates currently used in the BoostDM regions. This aligns the region definitions of IntOGen and BoostDM.
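A minimal sketch of deriving coding regions from a BioMart export, assuming each row carries an exon's genomic coding start and end (such as Ensembl's "Genomic coding start" / "Genomic coding end" attributes), left empty when the exon has no coding bases. The `Exon` record, the `coding_regions` function and the sample rows are hypothetical. Skipping rows without coding coordinates is one way to avoid adding exons that are entirely non-coding, the second problem listed above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class Exon:
    chrom: str
    gene: str
    # Genomic coding coordinates (1-based, inclusive); None when the
    # exon contains no coding bases (e.g. a UTR-only exon).
    coding_start: Optional[int]
    coding_end: Optional[int]


def coding_regions(exons: Iterable[Exon]) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (chrom, start, end, gene) for the coding part of each exon,
    skipping exons that have no coding coordinates."""
    for exon in exons:
        if exon.coding_start is None or exon.coding_end is None:
            continue  # entirely non-coding exon: leave it out
        yield (exon.chrom, exon.coding_start, exon.coding_end, exon.gene)


# Hypothetical BioMart-style rows for one made-up gene.
rows = [
    Exon("1", "GENE_A", None, None),  # UTR-only exon
    Exon("1", "GENE_A", 1000, 1099),
    Exon("1", "GENE_A", 2000, 2049),
]
regions = list(coding_regions(rows))
print(regions)
```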

We then redefined the splicing region as 5 bp instead of 25 bp. Commit: https://github.com/bbglab/intogen-plus/commit/dc3c9cc974549cec8970f1dfda6878fd42c7e0a8
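The change from 25 bp to 5 bp can be pictured as a padding parameter applied to each coding segment. The sketch below is an assumption about how such padding might work, not the code in the linked commit: it extends every segment by `flank` bases on both sides (so it also pads CDS ends that border UTRs, which the real pipeline may not do) and merges segments that overlap or touch after padding. Coordinates are 1-based inclusive; `pad_and_merge` is a hypothetical name.

```python
def pad_and_merge(segments, flank=5):
    """Extend 1-based inclusive (chrom, start, end) segments by `flank` bp
    on each side, then merge segments on the same chromosome that overlap
    or are adjacent after padding."""
    padded = sorted(
        (chrom, max(1, start - flank), end + flank) for chrom, start, end in segments
    )
    merged = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + 1:
            # Overlapping or adjacent: extend the previous interval.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Two coding segments separated by a 30 bp intron, plus a distant one.
cds = [("1", 1000, 1099), ("1", 1130, 1200), ("1", 2000, 2049)]
print(pad_and_merge(cds, flank=5))   # segments stay separate
print(pad_and_merge(cds, flank=25))  # the 30 bp intron is fully covered
```

With a 25 bp flank, short introns are swallowed entirely, which is one reason a narrower splice window keeps the regions closer to the coding sequence.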

DriverSaturation step

We re-ran the saturation step with a fix for the first-position error.

Commit: https://github.com/bbglab/intogen-plus/commit/9580b1a437030ce35cb712c5020b8b027b8b93dc
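For context, a saturation step of this kind generally enumerates every possible single-nucleotide substitution at every position of the regions and passes them to VEP. The sketch below assumes 1-based inclusive regions and writes VEP's default whitespace-separated input (chromosome, start, end, REF/ALT, strand); `saturation_variants` is hypothetical, and the reference sequence is passed in as a string rather than read from a FASTA file. Counting the output is a quick check for the first-position error: a region of length L should yield exactly 3 × L variants, including three at its first base.

```python
BASES = "ACGT"


def saturation_variants(chrom, start, end, ref_seq):
    """Yield VEP default-format lines for every SNV in a 1-based inclusive
    region. ref_seq must hold the reference bases for start..end."""
    if len(ref_seq) != end - start + 1:
        raise ValueError("ref_seq length does not match the region")
    for offset, ref in enumerate(ref_seq.upper()):
        pos = start + offset
        for alt in BASES:
            if alt != ref:
                # Columns: chrom, start, end, REF/ALT, strand
                yield f"{chrom}\t{pos}\t{pos}\t{ref}/{alt}\t+"


lines = list(saturation_variants("1", 995, 999, "ACGTA"))
print(len(lines))  # 15
first_base = [l for l in lines if l.split("\t")[1] == "995"]
print(len(first_base))  # 3
```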

FedericaBrando commented 2 months ago

Waiting for Ferran's check before running BoostDM.