bbglab / boostdm-pipeline

Learning pipeline to identify somatic SNVs under positive selection.

BoostDM | `vep.tsv.gz` missing regions #3

Closed FedericaBrando closed 6 months ago

FedericaBrando commented 7 months ago

As of now, vep.tsv.gz is produced by:

This script calls another [script]() that runs:

python ${SCRIPT_FOLDER}/mutations.py -r ${CDS} -o ${tmpdir}/mutations.tsv

The reference CDS file is cds.regions.tsv.
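
For readers unfamiliar with this step, here is a minimal sketch of how a regions file could be turned into a list of mutations to annotate. It is not the actual mutations.py: it assumes the script enumerates every possible SNV in each region, that regions are (chromosome, start, end) with 1-based inclusive coordinates, and that a reference lookup (`ref_base`, hypothetical) is available.

```python
BASES = "ACGT"

def enumerate_snvs(regions, ref_base):
    """Yield (chrom, pos, ref, alt) for every possible SNV in the regions.

    regions: iterable of (chrom, start, end), 1-based, end inclusive (assumed layout).
    ref_base: callable (chrom, pos) -> reference nucleotide (hypothetical helper).
    """
    for chrom, start, end in regions:
        for pos in range(start, end + 1):
            ref = ref_base(chrom, pos).upper()
            if ref not in BASES:
                continue  # skip N or other ambiguous bases
            for alt in BASES:
                if alt != ref:
                    yield chrom, pos, ref, alt

# Toy reference sequence, for illustration only
toy_genome = {"1": "ACGTN"}

def toy_ref(chrom, pos):
    return toy_genome[chrom][pos - 1]

snvs = list(enumerate_snvs([("1", 1, 5)], toy_ref))
```

Under these assumptions, any position absent from the regions file never reaches VEP, which would explain regions missing from vep.tsv.gz.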

vep.tsv.gz is used by IntOGen only in the SaturationStep (the IntOGen-BoostDM connection) and by BoostDM in these steps:

We have two options:

  1. Use canonical.regions.gz as the reference instead of cds.regions.tsv. [easy-fix]

  2. Check whether the pipeline needs to re-run VEP on the mutations at all, since IntOGen already has a step that produces VEP-annotated mutations. [long-term discussion]

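
Before switching the reference, it may help to quantify what the current file leaves out. The sketch below lists positions covered by one set of regions but not by another. The tuple layout (chromosome, start, end, 1-based inclusive) and the toy coordinates are assumptions; the real files would first need to be parsed into this form.

```python
def covered_positions(regions):
    """Return the set of (chrom, pos) covered by the given regions."""
    covered = set()
    for chrom, start, end in regions:
        covered.update((chrom, pos) for pos in range(start, end + 1))
    return covered

def missing_positions(current, proposed):
    """Positions present in `proposed` regions but absent from `current`."""
    return covered_positions(proposed) - covered_positions(current)

# Toy coordinates, for illustration only
cds = [("1", 100, 110)]
canonical = [("1", 95, 110), ("1", 200, 202)]
missing = missing_positions(cds, canonical)
```

A per-position set is simple but memory-hungry; for whole-genome region files an interval-based comparison would scale better.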
FedericaBrando commented 7 months ago

Running build for vep.tsv.gz file with canonical.regions.gz as reference.

FedericaBrando commented 6 months ago

New vep.tsv.gz available. It now includes 25 bp for splicing regions.
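
One way to read the 25 bp extension is as padding on both sides of each region so that splice-site positions are covered. The sketch below shows that interpretation only; how the new file was actually built is not stated in this thread, and the function name and coordinates are hypothetical.

```python
def pad_and_merge(regions, pad=25):
    """Extend each region by `pad` bp on both sides and merge overlaps.

    regions: iterable of (chrom, start, end), 1-based, end inclusive (assumed layout).
    """
    padded = sorted(
        (chrom, max(1, start - pad), end + pad) for chrom, start, end in regions
    )
    merged = []
    for chrom, start, end in padded:
        # Merge with the previous region if it is on the same chromosome
        # and overlaps or is directly adjacent.
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + 1:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged

# Toy exons, for illustration only
exons = [("1", 100, 150), ("1", 180, 220), ("1", 500, 560)]
padded = pad_and_merge(exons)
```

Merging after padding avoids emitting the same position twice when neighbouring exons are closer than twice the padding.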