UCSF-Costello-Lab / LG3_Pipeline

The original LG3 pipeline
https://github.com/UCSF-Costello-Lab/LG3_Pipeline

RESOURCES: Resource files used #158

Open HenrikBengtsson opened 2 years ago

HenrikBengtsson commented 2 years ago

List of /resources files used by the pipeline:

$ grep -h /resources *.out | sed 's/^[ ]*-[ ]*//g' | sed 's/^[^=]*=[ ]*//g' | grep -vE "^(.Align|Arguments:|INFO|======)" | sort -u
/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/1000G_biallelic.indels.hg19.sorted.vcf
/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/All_exome_targets.extended_200bp.bed
/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/All_exome_targets.extended_200bp.interval_list
/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/all_human_kinases.txt
/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/bwa_indices/hg19.bwa
/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/CosmicMutantExport_v58_150312.tsv
/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/dbsnp_132.hg19.sorted.vcf
/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/RefSeq.Entrez.txt
/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/SangerCancerGeneCensus_2012-03-15.txt
/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/UCSC_HG19_Feb_2009/hg19.fa
/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-next-release/resources/UCSC_hg19/hg19.fa
/home/jocostello/shared/LG3_Pipeline_HIDE/resources/All_exome_targets.extended_200bp.interval_list

This search only matches paths containing /resources, so other reference files may be missing from the list.
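
As a follow-up to the list above, here is a minimal sketch for checking which of these files are actually present in a given installation. The helper name `check_resources` is hypothetical and not part of the pipeline. The BWA index is left out on the assumption that `hg19.bwa` may be an index prefix rather than a single file.

```shell
# check_resources DIR
# Reads file names (relative to DIR/resources) from stdin, one per line,
# prints OK or MISSING for each, and returns non-zero if any are missing.
# Hypothetical helper, not part of LG3_Pipeline.
check_resources() {
  dir=$1
  missing=0
  while IFS= read -r f; do
    [ -n "$f" ] || continue            # skip empty lines
    if [ -e "$dir/resources/$f" ]; then
      printf 'OK %s\n' "$f"
    else
      printf 'MISSING %s\n' "$f"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Only run the check when LG3_HOME is set in the environment.
if [ -n "${LG3_HOME:-}" ]; then
  check_resources "$LG3_HOME" <<'EOF' || echo "Some resource files are missing" >&2
1000G_biallelic.indels.hg19.sorted.vcf
All_exome_targets.extended_200bp.bed
All_exome_targets.extended_200bp.interval_list
all_human_kinases.txt
CosmicMutantExport_v58_150312.tsv
dbsnp_132.hg19.sorted.vcf
RefSeq.Entrez.txt
SangerCancerGeneCensus_2012-03-15.txt
UCSC_HG19_Feb_2009/hg19.fa
EOF
fi
```

The file names are taken from the grep output above; the check only tests for existence, not for content or version.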

Note regarding C4: These files are all accessible from C4.

See also Issues #10, #31, and #61.

HenrikBengtsson commented 2 years ago

Consolidating information on reference files from Issue #10:

Below are the software versions and various refs I found for the steps in the pipeline. Most came from logs and some came from tracing the scripts that were called, so it may not be complete.

[ ... ]

Align: BWA_INDEX=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-11/resources/bwa_indices/hg19.bwa

[ ... ]

Recal:

* REF=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-17/resources/`UCSC_HG19_Feb_2009/hg19.fa`

* THOUSAND=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-17/resources/`1000G_biallelic.indels.hg19.sorted.vcf`

* DBSNP=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-17/resources/`dbsnp_132.hg19.sorted.vcf`

[ ... ]

ILIST=/home/jocostello/shared/LG3_Pipeline_HIDE/resources/SeqCap_EZ_Exome_v3_capture.interval_list

MutDet:

[ ... ]

* REF=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-17-patched/resources/`UCSC_HG19_Feb_2009/hg19.fa`

* DBSNP=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-17-patched/resources/`dbsnp_132.hg19.sorted.vcf`

* REORDER=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-17-patched/scripts/vcf_reorder.py

* CONVERT=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-17-patched/resources/`RefSeq.Entrez.txt`

* KINASEDATA=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-17-patched/resources/`all_human_kinases.txt`

* COSMICDATA=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-17-patched/resources/`CosmicMutantExport_v58_150312.tsv`

* CANCERDATA=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-17-patched/resources/`SangerCancerGeneCensus_2012-03-15.txt`

Interval = /home/jocostello/shared/LG3_Pipeline_HIDE/resources/All_exome_targets.extended_200bp.interval_list

Pindel:

[ ... ]

* REF=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-17-patched/resources/`UCSC_HG19_Feb_2009/hg19.fa`

* TARGET=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-17-patched/resources/`All_exome_targets.extended_200bp.bed`

* KINASEDATA=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-17-patched/resources/`all_human_kinases.txt`

* COSMICDATA=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-17-patched/resources/`CosmicMutantExport_v58_150312.tsv`

* CANCERDATA=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-17-patched/resources/`SangerCancerGeneCensus_2012-03-15.txt`

* CONVERT=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-17-patched/resources/`RefSeq.Entrez.txt`

* ANNDB=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-17-patched/`AnnoVar/hg19db`/

HenrikBengtsson commented 2 years ago

Info from Issue #31:

I'll use this thread to note any other hardcoded paths I discover, both to reference files and to scripts (the example below includes both):

$ head -10 FilterMutations/mutationConfig.cfg
[General]
genomeBuild = hg19

[AnnoVarInputs]
executable = /home/jocostello/shared/LG3_Pipeline/AnnoVar/annotate_variation.pl
dbDir = /home/jocostello/shared/LG3_Pipeline/AnnoVar/hg19db/
snpDBs = snp132
1kgDBs = 1000g2010nov_all,1000g2011may_all

[SNPRemovalFilters]
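
Hardcoded paths like the two in the config above could be hunted down with a recursive grep. This is only a sketch: the function name `find_hardcoded_paths` is hypothetical, and the `/home/` and `/cbc2/` prefixes are simply the ones that appear in this thread, so other prefixes would go unnoticed.

```shell
# find_hardcoded_paths DIR
# Lists file:line:content for every line under DIR that contains an
# absolute path starting with /home/ or /cbc2/. Paths built from
# ${LG3_HOME} do not match because they do not contain those prefixes.
# Hypothetical helper, not part of LG3_Pipeline.
find_hardcoded_paths() {
  grep -rnE '/(home|cbc2)/[^[:space:]]+' "$1"
}

# Example: scan the current checkout (prints nothing if there are no matches).
# find_hardcoded_paths . || echo "No hardcoded paths found"
```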

HenrikBengtsson commented 2 years ago

Info from Issue #61:

Reference files that I found in the code:

## scripts/Align_fastq.sh
BWA_INDEX=${LG3_HOME}/resources/bwa_indices/hg19.bwa

## runs_demo/_run_Recal
ILIST=${ILIST:-${LG3_HOME}/resources/SeqCap_EZ_Exome_v3_capture.interval_list}

## scripts/Recal_bigmem.sh
REF=${LG3_HOME}/resources/UCSC_HG19_Feb_2009/hg19.fa
THOUSAND=${LG3_HOME}/resources/1000G_biallelic.indels.hg19.sorted.vcf
DBSNP=${LG3_HOME}/resources/dbsnp_132.hg19.sorted.vcf

## scripts/Germline.sh
REF=${LG3_HOME}/resources/UCSC_HG19_Feb_2009/hg19.fa
DBSNP=${LG3_HOME}/resources/dbsnp_132.hg19.sorted.vcf

## pindel_all.pbs
REF=${LG3_HOME}/resources/UCSC_HG19_Feb_2009/hg19.fa
TARGET=${LG3_HOME}/resources/All_exome_targets.extended_200bp.bed

## scripts/mutdet_submit.sh
CONFIG=${LG3_HOME}/FilterMutations/mutationConfig.cfg
INTERVAL=${LG3_HOME}/resources/All_exome_targets.extended_200bp.interval_list

## scripts/MutDet.sh
REF="${LG3_HOME}/resources/UCSC_HG19_Feb_2009/hg19.fa"
DBSNP="${LG3_HOME}/resources/dbsnp_132.hg19.sorted.vcf"
CONVERT="${LG3_HOME}/resources/RefSeq.Entrez.txt"
KINASEDATA="${LG3_HOME}/resources/all_human_kinases.txt"
COSMICDATA="${LG3_HOME}/resources/CosmicMutantExport_v58_150312.tsv"
CANCERDATA="${LG3_HOME}/resources/SangerCancerGeneCensus_2012-03-15.txt"

Did I miss something?
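
Of the assignments listed above, only the ILIST line in runs_demo/_run_Recal can be overridden from the environment. A sketch of how the same default-if-unset pattern could be applied to other variables follows. It is not how the scripts are currently written, and the function name `set_resource_defaults` is hypothetical.

```shell
# set_resource_defaults HOME_DIR
# Assigns each variable its default under HOME_DIR/resources only if the
# caller has not already set it to a non-empty value.
# Hypothetical helper, not part of LG3_Pipeline.
set_resource_defaults() {
  home=$1
  : "${REF:=${home}/resources/UCSC_HG19_Feb_2009/hg19.fa}"
  : "${DBSNP:=${home}/resources/dbsnp_132.hg19.sorted.vcf}"
  : "${THOUSAND:=${home}/resources/1000G_biallelic.indels.hg19.sorted.vcf}"
  : "${INTERVAL:=${home}/resources/All_exome_targets.extended_200bp.interval_list}"
}

# Example: a caller-supplied DBSNP would survive, the rest get defaults.
# DBSNP=/path/to/other.vcf
# set_resource_defaults "$LG3_HOME"
```

With this pattern a site-specific file can be swapped in per run without editing the scripts, while an unset variable still resolves to the bundled resource.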