MathOnco / NeoPredPipe

Neoantigens prediction pipeline for multi- or single-region vcf files using ANNOVAR and netMHCpan.
GNU Lesser General Public License v3.0

Neoantigen burden in mouse #25

Open Francesc-Muyas opened 3 years ago

Francesc-Muyas commented 3 years ago

Hi, Does your tool work with mm10 (mouse) data? As far as I know, NetMHCpan-4.0 supports it, but I did not find this option in your documentation. Thanks

elakatos commented 3 years ago

Hi, We don't currently support mouse data, but it should be straightforward to add the option.

I believe there are two parts of the pipeline where the difference matters: the annovar annotation database and the MHC allele input processing. I can add the mouse database to the accepted annovar databases - can you please let me know the name of the annovar database file you would use? For the MHC types, the supplementary file netHMCpanAlleles contains all the alleles accepted by the standalone version of netMHCpan. To be processed correctly, your alleles must be in the same format as in this list, because for mouse data we don't support the alternative syntaxes that we accept for human data. I will make sure that if you supply a mouse allele as it appears in the list, it gets processed correctly.

Francesc-Muyas commented 3 years ago

Hi, It would be really nice.

The annotation database that I am using was obtained as follows:

perl annotate_variation.pl -buildver mm10 -downdb -webfrom annovar refGene mm10db/
perl annotate_variation.pl --buildver mm10 --downdb seq mm10db/mm10_seq
perl retrieve_seq_from_fasta.pl mm10db/mm10_refGene.txt -seqdir mm10db/mm10_seq -format refGene -outfile mm10db/mm10_refGeneMrna.fa
perl annotate_variation.pl -buildver mm10 -downdb cytoBand mm10db/

Thanks,

elakatos commented 3 years ago

Hi, I've just pushed a couple of new commits that should allow you to use the mm10 database version and mouse MHC alleles. I believe that if you supply the correct paths in the usr_paths.ini file, it should work fine, but let me know if it throws an "Unexpected genome build detected" warning. As for the MHC alleles, I have only checked alleles formatted as "H-2-Db" (from the allele list supported by netMHCpan 4.0), so I would advise formatting your alleles like this.
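
For anyone who wants to check their allele names before running the pipeline, here is a minimal sketch of an exact-match check against an accepted-allele list. It is not part of NeoPredPipe: the function name `check_alleles` and the in-memory list are hypothetical, and the only allele taken from this thread is "H-2-Db". In practice the accepted names would come from the netMHCpan allele list mentioned above.

```python
def check_alleles(alleles, accepted):
    """Split alleles into (recognised, unrecognised) by exact match.

    `accepted` is any iterable of accepted allele names, e.g. the lines
    of an allele list file; surrounding whitespace is ignored.
    """
    accepted_set = {name.strip() for name in accepted if name.strip()}
    recognised = [a for a in alleles if a in accepted_set]
    unrecognised = [a for a in alleles if a not in accepted_set]
    return recognised, unrecognised


# Hypothetical stand-in for the allele list file; only "H-2-Db" is from the thread.
accepted_names = ["H-2-Db\n"]

# "H2-Db" is a deliberately misformatted name used to show a rejection.
ok, bad = check_alleles(["H-2-Db", "H2-Db"], accepted_names)
print("recognised:", ok)
print("unrecognised:", bad)
```

Anything reported as unrecognised would need to be rewritten to match the list exactly, since no syntax conversion is applied to mouse alleles.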

Best, Eszter

Francesc-Muyas commented 3 years ago

Thanks Eszter, Apparently, the script recognises the genome build :).

However, I seem to get an error when running one sample:

Traceback (most recent call last):
  File "/hps/research1/icortes/fmuyas/programs/NeoPredPipe/NeoPredPipe.py", line 524, in <module>
    main()
  File "/hps/research1/icortes/fmuyas/programs/NeoPredPipe/NeoPredPipe.py", line 505, in main
    t.append(Sample(localpath, patname, patFile, hlas[patname], annPaths, netMHCpanPaths, pepmatchPaths, Options))
  File "/hps/research1/icortes/fmuyas/programs/NeoPredPipe/NeoPredPipe.py", line 101, in __init__
    self.hlasnormed = ConstructAlleles(self.hla, FilePath, self.patID)
  File "/hps/research1/icortes/fmuyas/programs/NeoPredPipe/hla_preprocess.py", line 187, in ConstructAlleles
    if len(hlasWithSuffix)>0:
UnboundLocalError: local variable 'hlasWithSuffix' referenced before assignment

I don't know whether I created the usr_paths.ini file incorrectly or whether the h2 file is wrongly specified.

This is what my h2 file looks like:

Patient H-2
DC1246T1 H-2-Db

Could you post an example of what the usr_paths.ini should look like for mouse, and likewise for the h2 (hla) file?

Thanks, Fran

elakatos commented 3 years ago

Hi Fran,

The h2 file looks good at first glance, just one question: are the fields separated by tabs? If so, it should work fine, and the error message doesn't look related to usr_paths.ini. I'll have a look: based on the error message, my guess is that I introduced a small bug in the code when trying to handle mouse data.
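
A quick way to answer the tab question is to scan the file for lines that do not split into at least two tab-separated fields. The sketch below is a standalone helper, not part of NeoPredPipe; the function name `find_non_tab_lines` and the two in-memory samples are assumptions made for illustration, with the sample content copied from the h2 file quoted above.

```python
def find_non_tab_lines(lines, min_fields=2):
    """Return (line_number, text) for non-empty lines that have fewer
    than `min_fields` tab-separated fields."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text.strip():
            continue  # ignore blank lines
        if len(text.split("\t")) < min_fields:
            problems.append((number, text))
    return problems


# The same content, once tab-separated and once space-separated.
sample_tabs = ["Patient\tH-2\n", "DC1246T1\tH-2-Db\n"]
sample_spaces = ["Patient H-2\n", "DC1246T1 H-2-Db\n"]

tab_problems = find_non_tab_lines(sample_tabs)
space_problems = find_non_tab_lines(sample_spaces)
print(tab_problems)    # empty list: every line has two tab-separated fields
print(space_problems)  # both lines are reported
```

To check a real file, pass an open file handle instead of a list, since iterating over a file yields its lines.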

Eszter

wt12318 commented 3 years ago

Hi,

I get the same error when handling mouse data:

NeoPredPipe.py -I ./samples/ -H hla/mouse_mhc.txt -o ./ -n test -E 8 9 10 -a -m
INFO: Annovar reference files of build mm10 were given, using this build for all analysis.
INFO: Begin.
INFO: Running convert2annovar.py on ./samples/SRR5133399.vcf
INFO: ANNOVAR VCF Conversion Process complete ./samples/SRR5133399.vcf
INFO: Running annotate_variation.pl on ./avready/SRR5133399.avinput
INFO: ANNOVAR annotation Process complete for ./avready/SRR5133399.avinput
INFO: Running coding_change.pl on ./avannotated/SRR5133399.avannotated.exonic_variant_function
Died at /public/slst/home/wutao2/software/annovar/coding_change.pl line 677.
INFO: Coding predictions complete for ./avannotated/SRR5133399.avannotated.exonic_variant_function
Traceback (most recent call last):
  File "/public/slst/home/wutao2/software/NeoPredPipe/NeoPredPipe.py", line 524, in <module>
    main()
  File "/public/slst/home/wutao2/software/NeoPredPipe/NeoPredPipe.py", line 505, in main
    t.append(Sample(localpath, patname, patFile, hlas[patname], annPaths, netMHCpanPaths, pepmatchPaths, Options))
  File "/public/slst/home/wutao2/software/NeoPredPipe/NeoPredPipe.py", line 101, in __init__
    self.hlasnormed = ConstructAlleles(self.hla, FilePath, self.patID)
  File "/slst/home/wutao2/software/NeoPredPipe/hla_preprocess.py", line 185, in ConstructAlleles
    if len(hlasWithSuffix)>0:
UnboundLocalError: local variable 'hlasWithSuffix' referenced before assignment

The log file: logforannovarNeoPredPipe.txt
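
For context, the UnboundLocalError in both tracebacks above is the usual symptom of a local variable that is assigned only inside a conditional branch and then read unconditionally. The sketch below reproduces that pattern and one common fix. It is a hypothetical reconstruction, not the actual code of hla_preprocess.py: the function names, the "HLA" prefix test and the "N" suffix test are all assumptions chosen only to make the example run.

```python
def construct_alleles_buggy(alleles):
    # Hypothetical logic: the variable is bound only on the human (HLA) path.
    if all(a.startswith("HLA") for a in alleles):
        with_suffix = [a for a in alleles if a.endswith("N")]
    # For mouse alleles the branch above is skipped, so `with_suffix`
    # was never assigned and the next line raises UnboundLocalError.
    if len(with_suffix) > 0:
        return [a for a in alleles if a not in with_suffix]
    return alleles


def construct_alleles_fixed(alleles):
    # Fix: give the variable a default before any branch can skip it.
    with_suffix = []
    if all(a.startswith("HLA") for a in alleles):
        with_suffix = [a for a in alleles if a.endswith("N")]
    if len(with_suffix) > 0:
        return [a for a in alleles if a not in with_suffix]
    return alleles


try:
    construct_alleles_buggy(["H-2-Db"])
except UnboundLocalError as err:
    print("buggy version failed with", type(err).__name__)

print(construct_alleles_fixed(["H-2-Db"]))
```

If the real function follows a similar shape, initialising the variable before the branch (or handling the mouse case explicitly) would likely avoid the crash, but that can only be confirmed against the actual source.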