WGLab / CancerVar

Clinical interpretation of somatic mutations in cancer

Error of opai_predictor.py using evidence-based model #23

Open kakyungkim opened 7 months ago

kakyungkim commented 7 months ago

I got the following error when running opai_predictor.py with the evidence-based model.


Traceback (most recent call last):
  File "OPAI/scripts/opai_predictor.py", line 92, in <module>
    main()
  File "OPAI/scripts/opai_predictor.py", line 58, in main
    disNet.load_state_dict( torch.load( model_path, map_location=torch.device(device) ) )
  File "../anaconda3/envs/opai/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for evs_CNN1d_5layer:
	size mismatch for linear1.0.weight: copying a param with shape torch.Size([2, 13000]) from checkpoint, the shape in current model is torch.Size([2, 8400]).

This looks like a size-mismatch runtime error raised while loading a PyTorch model, so I tried changing strict: bool = True to strict: bool = False in the 'load_state_dict' function; however, I got the same error. Can you help me solve the problem? Thanks!
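For context, in PyTorch `strict=False` only relaxes the check for missing or unexpected keys; a parameter that exists on both sides with a different shape still raises, which matches what is reported here. A minimal sketch of how one might list such shape differences before loading is shown below. The helper name `shape_mismatches` is hypothetical (it is not part of OPAI or PyTorch), and it works on plain name-to-shape dictionaries so it runs without PyTorch installed; the shapes used are the ones from the traceback above.

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters that
    exist on both sides but have different shapes.

    Hypothetical helper for illustration; not part of OPAI or PyTorch.
    """
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# With PyTorch, the two inputs could be built like this (not executed here;
# assumes the checkpoint file stores a plain state_dict):
#   ckpt = torch.load(model_path, map_location="cpu")
#   checkpoint_shapes = {k: tuple(v.shape) for k, v in ckpt.items()}
#   model_shapes = {k: tuple(v.shape) for k, v in model.state_dict().items()}

# Shapes copied from the traceback in this thread.
checkpoint_shapes = {"linear1.0.weight": (2, 13000)}
model_shapes = {"linear1.0.weight": (2, 8400)}

result = shape_mismatches(checkpoint_shapes, model_shapes)
for name, (ckpt_shape, model_shape) in result.items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

A non-empty result usually means the checkpoint was trained for a different architecture or input size than the model being constructed, so the fix is to pair the right checkpoint with the right model rather than to loosen `strict`.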

quanliustc commented 7 months ago

When you run with the default example files, do you see the same issue? Also, please provide the commands you ran.

kakyungkim commented 7 months ago

I have the same issue with the default example files.

My commands are as follows:

# Run CancerVar
python CancerVar.py -c config.ini

# Run OPAI using Ensemble-based model
python OPAI/scripts/feature_preprocess.py -a example/FDA.hg19_multianno.txt.grl_p -c example/FDA.hg19_multianno.txt.cancervar -m ensemble -n 5 -d OPAI/saves/nonmissing_db.npy -o example/FDA.hg19_multianno.txt.cancervar.ensemble.csv
python OPAI/scripts/opai_predictor.py -i  example/FDA.hg19_multianno.txt.cancervar.ensemble.csv -m ensemble -c OPAI/saves/ensemble.pt -d cpu -v example/FDA.hg19_multianno.txt.cancervar -o example/FDA.hg19_multianno.txt.cancervar.ensemble.pred

# Run OPAI using Evidence-based model
python OPAI/scripts/feature_preprocess.py -a example/FDA.hg19_multianno.txt.grl_p -c example/FDA.hg19_multianno.txt.cancervar -m evs -n 5 -d OPAI/saves/nonmissing_db.npy -o example/FDA.hg19_multianno.txt.cancervar.evs.csv
python OPAI/scripts/opai_predictor.py -i  example/FDA.hg19_multianno.txt.cancervar.ensemble.csv -m evs -c OPAI/saves/ensemble.pt -d cpu -v example/FDA.hg19_multianno.txt.cancervar -o example/FDA.hg19_multianno.txt.cancervar.evs.pred

The error I mentioned occurs when running the last command.
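One thing that stands out in the last command: it selects `-m evs` but passes the ensemble CSV to `-i` and `OPAI/saves/ensemble.pt` to `-c`. That is a plausible cause of the shape mismatch in the traceback, though the thread does not confirm it. The sketch below flags this kind of inconsistency. It assumes that file names carry the mode name as a dot-separated component, a convention inferred only from the commands above; `check_mode_consistency` is a hypothetical helper, not part of OPAI, and whether an evs-specific checkpoint ships under `OPAI/saves/` is not stated in this thread.

```python
import re
import shlex

KNOWN_MODES = {"ensemble", "evs"}  # the two -m values used in this thread


def check_mode_consistency(command):
    """Return (flag, path) pairs whose file name mentions a mode other than -m.

    Hypothetical helper; assumes every flag takes exactly one value.
    """
    tokens = shlex.split(command)
    opts = {
        tokens[i]: tokens[i + 1]
        for i in range(len(tokens) - 1)
        if tokens[i].startswith("-")
    }
    mode = opts.get("-m")
    problems = []
    for flag in ("-i", "-c", "-o"):
        path = opts.get(flag, "")
        parts = set(re.split(r"[./]", path))  # path and extension components
        for other in sorted(KNOWN_MODES - {mode}):
            if other in parts:
                problems.append((flag, path))
    return problems


# The last command from the post above, copied verbatim.
last_command = (
    "python OPAI/scripts/opai_predictor.py "
    "-i  example/FDA.hg19_multianno.txt.cancervar.ensemble.csv -m evs "
    "-c OPAI/saves/ensemble.pt -d cpu "
    "-v example/FDA.hg19_multianno.txt.cancervar "
    "-o example/FDA.hg19_multianno.txt.cancervar.evs.pred"
)

problems = check_mode_consistency(last_command)
for flag, path in problems:
    print(f"{flag} {path} does not match the selected mode")
```

If the check reports `-i` and `-c`, a reasonable next step would be to rerun the predictor with the `.evs.csv` file produced by the preceding `feature_preprocess.py` step and with a checkpoint trained for the evs model.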

My config.ini file is as follows:

[CancerVar]
buildver = hg19
# hg19
inputfile = example/FDA_hg19.av
# the inputfile and the path  example/test.av hg19_clinvar_20151201.avinput
# tab-delimited will be better for including the other information
inputfile_type = AVinput
# the input file type VCF(vcf file with single sample),AVinput,VCF_m(vcf file with multiple samples)
outfile = example/FDA
# the output file location and prefix of output file
database_cancervar = cancervardb
# the database location/dir for Intervar
lof_genes = cancervardb/LOF.genes.exac_me_cancers
mim2gene = cancervardb/mim2gene.txt
mim_pheno = cancervardb/mim_pheno.txt
mim_orpha = cancervardb/mim_orpha.txt
orpha = cancervardb/orpha.txt
knowngenecanonical = cancervardb/knownGeneCanonical.txt
exclude_snps = cancervardb/ext.variants
cancervar_markers=cancervardb/cancervar.out.txt
cancer_pathway=cancervardb/cancers_genes.list_kegg.txt
cancers_genes=cancervardb/cancer_census.genes
cancers_types=cancervardb/cancervar.cancer.types
evidence_file = None
# add your own Evidence file for each Variant:
# evidence file as tab-delimited,format like this:
# Chr Pos Ref_allele Alt_allele  Evidence_list
disorder_cutoff = 0.01
#Allele frequency is greater than expected for disorder
[CancerVar_Bool]
onetranscript = FALSE
# TRUE or FALSE: print out only one transcript for exonic variants (default: FALSE/all transcripts)
otherinfo = TRUE
# TRUE or FALSE: print out otherinfo (information in the fifth column of the query file; default: TRUE)
# The fifth column is used to provide the cancer type.
# This option only works well with AVinput files, and the other information can only be put in the fifth column. Information in columns after the fifth will be lost.
# When the input is a VCF or VCF_m file with the otherinfo option, only het/hom is kept; depth and qual are lost, and the cancer type should be provided via the command-line option.
[Annovar]
annovar_tool = annovar
convert2annovar = ../annovar/convert2annovar.pl
#convert input file to annovar format
table_annovar = ../annovar/table_annovar.pl
#
annotate_variation=  ../annovar/annotate_variation.pl
#
database_locat = humandb
# the database location/dir from ANNOVAR; check that the database files exist
database_names = refGene esp6500siv2_all 1000g2015aug avsnp147 dbnsfp30a clinvar_20190305 exac03 dbscsnv11 dbnsfp31a_interpro ensGene knownGene cosmic70 icgc21 gnomad_genome
# specify the database_names from ANNOVAR or UCSC
[Other]
current_version = CancerVar_20200119
# pipeline version
public_dev = https://github.com/WGLab/CancerVar/releases
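When debugging a setup like this, it can help to confirm that the paths named in config.ini resolve on disk. The following is a rough sketch using Python's standard `configparser`. The helper `missing_paths` is hypothetical, and the rule it uses (any value containing `/` that is not a URL is treated as a path) is a simplifying assumption, so directory entries such as `database_cancervar = cancervardb` are not checked.

```python
import configparser
import os
import tempfile


def missing_paths(config_text, base_dir="."):
    """Return (section, key, value) for path-like values that do not exist.

    Hypothetical helper. Heuristic: a value is treated as a path if it
    contains "/" and is not a URL.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    missing = []
    for section in parser.sections():
        for key, value in parser[section].items():
            if "/" in value and "://" not in value:
                if not os.path.exists(os.path.join(base_dir, value)):
                    missing.append((section, key, value))
    return missing


# A small excerpt of the config above, checked against an empty directory
# so that every path-like entry is reported as missing.
sample = """
[CancerVar]
buildver = hg19
inputfile = example/FDA_hg19.av
# comment lines like this one are ignored by configparser
[Other]
public_dev = https://github.com/WGLab/CancerVar/releases
"""

with tempfile.TemporaryDirectory() as empty_dir:
    report = missing_paths(sample, base_dir=empty_dir)

for section, key, value in report:
    print(f"[{section}] {key} -> {value} not found")
```

In practice `base_dir` would be the directory from which `CancerVar.py` is launched, since the paths in the config are relative.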