Open jylee-bcm opened 2 months ago
Test000 (s3://aim-test-data/test000) failed with
Caused by:
Process `ANNOTATE_BY_MODULES (chr1)` terminated with an error exit status (1)
Command executed:
feature.py \
-patientHPOsimiOMIM omim_sim.tsv \
-patientHPOsimiHGMD hgmd_sim.tsv \
-varFile chr1.vcf-vep.txt \
-inFileType vepAnnotTab \
-patientFileType one \
-genomeRef hg19 \
-diseaseInh AD \
-modules curate,conserve
mv scores.csv chr1.vcf-vep_scores.csv
Command exit status:
1
Command output:
input file: chr1.vcf-vep.txt
type of input file: vepAnnotTab
modules: curate,conserve
modules list: ['curate', 'conserve']
patientHPOsimi-OMIM dimension: (6393, 7)
patientHPOsimi-HGMD dimension: (346526, 6)
reading DGV flat file
finsihed reading DGV
reading Decipher flat file
finsihed reading DECIPHER
input annoatated varFile: chr1.vcf-vep.txt
shape: (0, 735)
found GERP++RS
found GERP++NR
pipeline time: 5.037638425827026
log file name: log.txt
input read time: 0.09162592887878418
input num rows: 0
m: ['curate', 'conserve']
Score re-calculation:
Command error:
input file: chr1.vcf-vep.txt
type of input file: vepAnnotTab
modules: curate,conserve
modules list: ['curate', 'conserve']
patientHPOsimi-OMIM dimension: (6393, 7)
patientHPOsimi-HGMD dimension: (346526, 6)
reading DGV flat file
finsihed reading DGV
reading Decipher flat file
finsihed reading DECIPHER
input annoatated varFile: chr1.vcf-vep.txt
shape: (0, 735)
found GERP++RS
found GERP++NR
pipeline time: 5.037638425827026
log file name: log.txt
input read time: 0.09162592887878418
input num rows: 0
m: ['curate', 'conserve']
Score re-calculation:
/home/sunyoung/tmp/tmpf9kb21es/bin/feature.py:194: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
dgvDf = pd.read_csv(fileName, sep=",")
/home/sunyoung/tmp/tmpf9kb21es/bin/feature.py:260: FutureWarning: The error_bad_lines argument has been deprecated and will be removed in a future version. Use on_bad_lines in the future.
varDf = pd.read_csv(
Traceback (most recent call last):
File "/home/sunyoung/tmp/tmpf9kb21es/bin/feature.py", line 423, in <module>
main()
File "/home/sunyoung/tmp/tmpf9kb21es/bin/feature.py", line 411, in main
score = load_raw_matrix(annotateInfoDf)
File "/home/sunyoung/tmp/tmpf9kb21es/bin/annotation/marrvel_score_recalc.py", line 90, in load_raw_matrix
return score.loc[:, raw_features].copy()
File "/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py", line 961, in __getitem__
return self._getitem_tuple(key)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py", line 1149, in _getitem_tuple
return self._getitem_tuple_same_dim(tup)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py", line 827, in _getitem_tuple_same_dim
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py", line 1191, in _getitem_axis
return self._getitem_iterable(key, axis=axis)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py", line 1132, in _getitem_iterable
keyarr, indexer = self._get_listlike_indexer(key, axis)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py", line 1327, in _get_listlike_indexer
keyarr, indexer = ax._get_indexer_strict(key, axis_name)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py", line 5782, in _get_indexer_strict
self._raise_if_missing(keyarr, indexer, axis_name)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py", line 5845, in _raise_if_missing
raise KeyError(f"{not_found} not in index")
KeyError: "['chrom', 'pos', 'varId', 'varId_dash', 'zyg', 'geneSymbol', 'geneEnsId', 'gnomadAF', 'gnomadAFg', 'symptomName', 'omimSymptomSimScore', 'omimSymMatchFlag', 'hgmdSymptomScore', 'hgmdSymptomSimScore', 'hgmdSymMathchFlag', 'clinVarSymMatchFlag', 'gnomadGeneZscore', 'gnomadGenePLI', 'gnomadGeneOELof', 'gnomadGeneOELofUpper', 'omimGeneFound', 'omimVarFound', 'hgmdGeneFound', 'hgmdVarFound', 'clinVarVarFound', 'clinVarGeneFound', 'clinvarTotalNumVars', 'clinvarNumP', 'clinvarNumLP', 'clinvarNumLB', 'clinvarNumB', 'clinvarSignDesc', 'clinvarCondition', 'dgvVarFound', 'decipherVarFound', 'curationScoreHGMD', 'curationScoreOMIM', 'curationScoreClinVar', 'conservationScoreDGV', 'conservationScoreGnomad', 'conservationScoreOELof', 'hom', 'hgmd_rs', 'clin_dict', 'clin_PLP', 'clin_PLP_perc', 'spliceAImax', 'clin_code', 'hgmd_id', 'rsId', 'phenoList', 'phenoInhList'] not in index"
Work dir:
/home/sunyoung/tmp/tmpf9kb21es/work/40/de07ad4c4753d655e16b6a9e874c98
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details
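For context, the `KeyError` in the traceback is what pandas raises when `.loc` is given a list of column labels and some of them are missing. Since the log reports `input num rows: 0`, the annotation table presumably reached `load_raw_matrix` without those columns. The snippet below is only a minimal sketch of that failure mode and one possible guard; `RAW_FEATURES` is a shortened, hypothetical subset of the names in the error, and `select_features` is a hypothetical helper, not the code in `marrvel_score_recalc.py`.

```python
import pandas as pd

# Hypothetical, shortened subset of the feature names listed in the KeyError.
RAW_FEATURES = ["chrom", "pos", "varId", "zyg", "geneSymbol"]


def select_features(score: pd.DataFrame, features=RAW_FEATURES) -> pd.DataFrame:
    """Return the requested columns; missing ones are added as empty (NaN) columns.

    Hypothetical helper: reindex() does not raise on absent labels, unlike
    score.loc[:, features].
    """
    return score.reindex(columns=features).copy()


# Stand-in for an annotation table built from a chromosome with 0 variants:
# no rows, and only some of the expected columns.
empty = pd.DataFrame(columns=["chrom", "pos"])

# Strict selection, as in the traceback: fails because labels are missing.
try:
    empty.loc[:, RAW_FEATURES]
    strict_failed = False
except KeyError:
    strict_failed = True

# Tolerant selection: same columns, zero rows, no exception.
safe = select_features(empty)
```

Whether padding the missing columns or skipping an empty chromosome altogether is the better behaviour depends on what the downstream scoring step expects.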
@arine Thanks for testing! I also found that the error exists, so I just fixed it and pushed.
The goal is to avoid the situation where the workflow fails on VCF files that are in a wrong format, especially in the header information, which our workflow mostly does not use at all.
Specifically, our workflow makes extensive use of both `bcftools` and `tabix`, but when the header information is in the wrong format, most `bcftools` and `tabix` commands used to fail.
I would like to ask for your review of: