MathOnco / NeoPredPipe

Neoantigens prediction pipeline for multi- or single-region vcf files using ANNOVAR and netMHCpan.
GNU Lesser General Public License v3.0

IndexError in MakeTempFastas #17

Closed wangyqhaha closed 4 years ago

wangyqhaha commented 4 years ago

I'm so glad to have found your great software, NeoPredPipe, for easily predicting tumor neoantigens. However, I got an error when running NeoPredPipe, shown below. I have checked my reformatted FASTA file, but I still don't know why the list index is out of range. Can you help me?

INFO: Annovar reference files of build hg19 were given, using this build for all analysis.
INFO: Begin.
INFO: Running convert2annovar.py on /company/Liver/input_vcf/Liver.vcf
INFO: ANNOVAR VCF Conversion Process complete /company/Liver/input_vcf/Liver.vcf
INFO: Running annotate_variation.pl on ./avready/Liver.avinput
INFO: ANNOVAR annotation Process complete for ./avready/Liver.avinput
INFO: Running coding_change.pl on ./avannotated/Liver.avannotated.exonic_variant_function
INFO: Coding predictions complete for ./avannotated/Liver.avannotated.exonic_variant_function
Traceback (most recent call last):
  File "/software/NeoPredPipe/NeoPredPipe.py", line 524, in <module>
    main()
  File "/software/NeoPredPipe/NeoPredPipe.py", line 505, in main
    t.append(Sample(localpath, patname, patFile, hlas[patname], annPaths, netMHCpanPaths, pepmatchPaths, Options))
  File "/software/NeoPredPipe/NeoPredPipe.py", line 106, in __init__
    self.callNeoantigens(FilePath, netmhcpan, Options)
  File "/software/NeoPredPipe/NeoPredPipe.py", line 151, in callNeoantigens
    self.peptideFastas = MakeTempFastas(self.fastaChangeFormat, Options.epitopes)
  File "/software/NeoPredPipe/vcf_manipulate.py", line 171, in MakeTempFastas
    pos = int(seq_record.id.replace(";;",";").split(";")[6].split('-')[0])-1
IndexError: list index out of range

DrBlanco-Heredia commented 4 years ago

I have exactly the same problem, but when running with hg38 (3 out of 21 samples gave me this error). Please help!

INFO: Annovar reference files of build hg38 were given, using this build for all analysis.
INFO: Begin.
INFO: Running convert2annovar.py on VCFs/302_2015_somatic_filtered_PASS.vcf
INFO: VCF Conversion Process complete VCFs/302_2015_somatic_filtered_PASS.vcf
INFO: Running annotate_variation.pl on 302_2015/avready/302_2015_somatic_filtered_PASS.avinput
INFO: ANNOVAR annotation Process complete for 302_2015/avready/302_2015_somatic_filtered_PASS.avinput
INFO: Running coding_change.pl on 302_2015/avannotated/302_2015_somatic_filtered_PASS.avannotated.exonic_variant_function
INFO: Coding predictions complete for 302_2015/avannotated/302_2015_somatic_filtered_PASS.avannotated.exonic_variant_function
Traceback (most recent call last):
  File "/apps/NEOPREDPIPE/1.1/NeoPredPipe.py", line 523, in <module>
    main()
  File "/apps/NEOPREDPIPE/1.1/NeoPredPipe.py", line 504, in main
    t.append(Sample(localpath, patname, patFile, hlas[patname], annPaths, netMHCpanPaths, pepmatchPaths, Options))
  File "/apps/NEOPREDPIPE/1.1/NeoPredPipe.py", line 105, in __init__
    self.callNeoantigens(FilePath, netmhcpan, Options)
  File "/apps/NEOPREDPIPE/1.1/NeoPredPipe.py", line 150, in callNeoantigens
    self.peptideFastas = MakeTempFastas(self.fastaChangeFormat, Options.epitopes)
  File "/.statelite/tmpfs/gpfs/apps/MN3/NEOPREDPIPE/1.1/vcf_manipulate.py", line 147, in MakeTempFastas
    pos = int(seq_record.id.replace(";;",";").split(";")[6].split('-')[0])-1
IndexError: list index out of range

elakatos commented 4 years ago

Dear all, sorry for the delayed reply. I have not encountered this issue before, but it seems there must be one or more variants that ANNOVAR processes differently from the example we used for testing. Can either of you share a FASTA file, or the line (produced by the coding change prediction step), that triggers this error? Alternatively, with the hg19 reference, can you run "python neopredpipe_tests.py" to check whether the code runs correctly on our small example? If the error comes up there too, it would suggest an issue with ANNOVAR or its version. Thanks!
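One way to locate such a line in a .reformat.fasta file is to apply the expression from the traceback to every header and report the ones that fail. The following is only a minimal sketch and not part of NeoPredPipe: the function name `find_unparseable_headers` is hypothetical, the record id is assumed to be the first whitespace-delimited token of the header line, and the first example header (including its accession) is an invented illustration of a header that carries a position field. The five-field header mirrors an annotation reported further down in this thread, written in the semicolon-separated form that the parsing expression implies.

```python
def find_unparseable_headers(lines):
    """Return (line_number, header) pairs whose position field cannot be read."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a header
        tokens = line[1:].split()
        record_id = tokens[0] if tokens else ""  # assumed equivalent of seq_record.id
        try:
            # Same expression as in the traceback.
            int(record_id.replace(";;", ";").split(";")[6].split("-")[0]) - 1
        except (IndexError, ValueError):
            bad.append((number, line.rstrip("\n")))
    return bad


if __name__ == "__main__":
    example = [
        # Hypothetical header with a position field (assumed layout).
        ">line1;NM_000001;c.A121G;p.K41R;protein-altering;;(position;41;changed;from;K;to;R)\n",
        "MKT\n",
        # Header with only five fields, so index 6 does not exist.
        ">line95564;NM_001009931;c.1delA;p.M1?;startloss\n",
        "MKT\n",
    ]
    for number, header in find_unparseable_headers(example):
        print("line %d: %s" % (number, header))
    # prints: line 3: >line95564;NM_001009931;c.1delA;p.M1?;startloss
```

To check a real file, the same function can be given an open file handle, for example `find_unparseable_headers(open(path))`, where `path` points at the .reformat.fasta file in question.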

DrBlanco-Heredia commented 4 years ago

Hello, thanks for answering!

I'm attaching the .fasta and .reformat.fasta files that give me the error. (I had to add the .txt extension so that GitHub would let me upload them.)

Best,
302_018_somatic_filtered_PASS.fasta.txt
302_018_somatic_filtered_PASS.reformat.fasta.txt

elakatos commented 4 years ago

Thanks for the file. I managed to find the issue: one mutation was annotated as "startloss" by ANNOVAR, and it had no meaningful "position" information to read in for neoantigen prediction.

I have now added this case to be handled in vcf_manipulate.py, so if you pull the most recent version from here, the error should be resolved. Can you please let me know if it works? I've also added some diagnostic printing to the same function, so if there is still an issue, it should print out the problematic line.
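For readers who cannot update right away, the idea can be sketched as follows. This is not the actual change made to vcf_manipulate.py, only a hedged illustration of handling a header without a position: the helper name `parse_position` is hypothetical, and the first example id (including its accession) is an invented header that is assumed to follow the semicolon-separated layout implied by the traceback.

```python
def parse_position(record_id):
    """Return the 0-based position from a reformatted header, or None if absent."""
    fields = record_id.replace(";;", ";").split(";")
    try:
        # Field 6 holds the position, possibly as a range such as "41-43".
        return int(fields[6].split("-")[0]) - 1
    except (IndexError, ValueError):
        # Diagnostic output so the problematic header can be identified.
        print("WARNING: no usable position in header: %s" % record_id)
        return None


if __name__ == "__main__":
    ids = [
        # Hypothetical header with a position field (assumed layout).
        "line1;NM_000001;c.A121G;p.K41R;protein-altering;;(position;41;changed;from;K;to;R)",
        # Startloss-style header with only five fields.
        "line95564;NM_001009931;c.1delA;p.M1?;startloss",
    ]
    positions = [parse_position(record_id) for record_id in ids]
    print(positions)  # prints [40, None] after the warning line
```

A caller would then skip any record for which `parse_position` returns `None` instead of letting the `IndexError` stop the whole run.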

DrBlanco-Heredia commented 4 years ago

It's working very nicely now, thank you very much!

renyongzhe commented 4 years ago

How were the .fasta and .reformat.fasta files attached above generated? Which script produces them?

XiangweiZhai commented 2 years ago

Hi, I processed the test file without any error message, but when I processed my own files I hit the same problem as DrBlanco-Heredia:

INFO: Annovar reference files of build hg19 were given, using this build for all analysis.
INFO: Begin.
INFO: Running convert2annovar.py on /data/xiangwei/multiregion_neos/ullah/neos/vcf_files/combine_n12/n12.vcf
INFO: VCF Conversion Process complete /data/xiangwei/multiregion_neos/ullah/neos/vcf_files/combine_n12/n12.vcf
INFO: Running annotate_variation.pl on /data/xiangwei/multiregion_neos/ullah/neos/n12/avready/n12.avinput
INFO: ANNOVAR annotation Process complete for /data/xiangwei/multiregion_neos/ullah/neos/n12/avready/n12.avinput
INFO: Running coding_change.pl on /data/xiangwei/multiregion_neos/ullah/neos/n12/avannotated/n12.avannotated.exonic_variant_function
INFO: Coding predictions complete for /data/xiangwei/multiregion_neos/ullah/neos/n12/avannotated/n12.avannotated.exonic_variant_function
Traceback (most recent call last):
  File "/opt/NeoPredPipe-1.1/NeoPredPipe.py", line 523, in <module>
    main()
  File "/opt/NeoPredPipe-1.1/NeoPredPipe.py", line 504, in main
    t.append(Sample(localpath, patname, patFile, hlas[patname], annPaths, netMHCpanPaths, pepmatchPaths, Options))
  File "/opt/NeoPredPipe-1.1/NeoPredPipe.py", line 105, in __init__
    self.callNeoantigens(FilePath, netmhcpan, Options)
  File "/opt/NeoPredPipe-1.1/NeoPredPipe.py", line 150, in callNeoantigens
    self.peptideFastas = MakeTempFastas(self.fastaChangeFormat, Options.epitopes)
  File "/opt/NeoPredPipe-1.1/vcf_manipulate.py", line 147, in MakeTempFastas
    pos = int(seq_record.id.replace(";;",";").split(";")[6].split('-')[0])-1
IndexError: list index out of range

Here are the FASTA files produced by the coding change prediction step. I reduced the size of the attachments (keeping only the top 100000 rows) and changed their format (.txt suffix):

top_n12.fasta.txt
top_n12.reformat.fasta.txt

I also found startloss mutations in my files:

cat top_n12.fasta | grep "startloss"

line95564 NM_001009931 c.1delA p.M1? startloss
line98342 NM_001350784 c.A1G p.M1? startloss
line215619 NM_015700 c.T2A p.M1? startloss
line216703 NM_001692 c.T2C p.M1? startloss
line289519 NM_198047 c.T2C p.M1? startloss
line521044 NM_001300785 c.G3A p.M1? startloss

Software: NeoPredPipe-1.1 (latest version), netMHCpan-4.0

How do I fix it?