SofiaOtero commented 2 years ago

Hi, I have been trying to run antigen garnish for a while with your testdata and now it seems to run fine with parallel and netMHC. The issue is that in the folder as e.g. ag_f236b988a09e438ea2 it does not seem like all netMHC's have been run as I only get following amount of files: netMHC_2222f95f-7e2c-4c43_o.csv netMHCpan_5bf2b41a-85f4-453e_o.csv netMHC_29f106ef-a12f-488c_o.csv netMHCpan_71ec0664-1374-4786_o.csv netMHC_60c9a80d-91cf-4c85_o.csv netMHCpan_756ff183-22ac-453e_o.csv netMHC_88957201-b040-4b1c_o.csv netMHCpan_881141cd-e83d-49ab_o.csv netMHC_992253ab-0c44-4986_o.csv netMHCpan_a1a0dfef-606a-4942_o.csv netMHC_b61f7b8f-a6b5-45cc_o.csv netMHCpan_b3d9c7ad-7535-45f9_o.csv netMHC_c6123f47-736a-44f1_o.csv netMHCpan_bff23973-51cc-41e7_o.csv netMHCIIpan_eb376fad-1533-4079_o.csv netMHCpan_cd147a8a-c683-4d5b_o.csv netMHCpan_2ec1102a-b2ce-4c07_o.csv netMHCpan_e0771477-971b-4a0b_o.csv netMHCpan_35980300-2a74-49f9_o.csv netMHCpan_f962f91a-ce37-4784_o.csv netMHCpan_56cb499b-ace7-453e_o.csv netMHCpan_fd56749a-9714-4d38_o.csv

In the netMHC files there are not results from all the lengths and neither all the HLA's, so it seems like not all files needed for antigen garnish have been created as I get following error after 'Running netMHC in parallel.':

Collating netMHC output... Read 74 items Read 79 items Read 84 items Read 89 items Read 94 items Read 99 items Read 103 items Read 83 items Error in data.table::setnames(., dt %>% names(), dtn) : 'old' is length 14 but 'new' is length 1 Calls: %>% ... collate_netMHC -> lapply -> FUN -> %>% -> In addition: Warning message: In .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet], : last 2 bases were ignored Removing temporary files. Execution halted

Can you help me to resolve this issue? Thanks in advance.

andrewrech commented 2 years ago

Hi Sofia,

I am sorry for the delayed response. Thanks for trying the software. I'm happy to help you sort this out. NetMHC tool configuration is indeed often an issue - and that could be the issue here - but based on this error message:

Error in data.table::setnames(., dt %>% names(), dtn)

I cannot be certain what is occurring. I agree that the mostly likely explanation is that the netMHC output is not as expected.

In my opinion, the best path forward would be for you to try our Docker image. If it works in Docker, we know it is a configuration issue. If it does not, it could be a bug in our R package that we need to address.

if you cannot use Docker, can you pare down the test data to a minimal set of predictions that works?

SofiaOtero commented 2 years ago

Hi Andrew,

Thank you very much for your response. I just tried to run the test data on peptide length 9 and only with the HLA-A*02:01 allele. Now I get an output file from hence netMHC and netMHCpan that look correct (netMHC_1245669d-3789-4c5d_o.csv and netMHCpan_fa954704-9be2-4f0d_o.csv). Though I still get the same error: Collating netMHC output... Read 79 items Read 88 items Error in data.table::setnames(., dt %>% names(), dtn) : 'old' is length 14 but 'new' is length 1 Calls: %>% ... collate_netMHC -> lapply -> FUN -> %>% -> In addition: Warning message: In .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet], : last 2 bases were ignored Removing temporary files. Execution halted

I have never tried Docker before, but do you think it will be the best solution then?

Kind regards Sofia

andrewrech commented 2 years ago

Hi Sofia,

Are you using these versions of NetMHC listed in the README?

NetMHC 4.0
NetMHCpan 4.1b
NetMHCII 2.3
NetMHCIIpan 4.0

SofiaOtero commented 2 years ago

Hi Andrew,

Yes I have downloaded these versions:

NetMHCpan 4.1b: https://services.healthtech.dtu.dk/cgi-bin/sw_request NetMHCIIpan 4.0: https://services.healthtech.dtu.dk/cgi-bin/sw_request NetMHCII 2.3: https://services.healthtech.dtu.dk/cgi-bin/sw_request NetMHC 4.0: https://services.healthtech.dtu.dk/cgi-bin/sw_request

Kind regards Sofia

andrewrech commented 2 years ago

This error is occurring here because the results from netMHC are not in the expected format.

Could you take a look at the results tables netMHC_[id]_o.csv and netMHCpan_[id]_o.csv? These should be a table with a the netMHC header and peptide binding results in columns.

Could you also test that you are able to run the netMHC tools from the command line and that they work normally?

SofiaOtero commented 2 years ago

The output from netMHC looks normal:

/home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64/bin/netMHC -p -l 9 -a HLA-A0201 -f netMHC_1245669d-3789-4c5d.csv

Thu May 26 10:37:54 2022

User: sofote

PWD : /home/projects/SRHgroup/projects/MuPeXI_project/scripts/antigen_garnish/ag_848f53c57a5744e6a7

Host: Linux g-01-c0024 3.10.0-1062.4.1.el7.x86_64 x86_64

-p 1 Switch on if input is a list of peptides (Peptide format)

-l 9 Peptide length (multiple lengths separated by comma e.g. 8,9,10)

-a HLA-A0201 HLA allele name

-f netMHC_1245669d-3789-4c5d.csv Input file (by default in FASTA format)

Command line parameters set to:

[-a line] HLA-A0201 HLA allele name

[-f filename] netMHC_1245669d-3789-4c5d.csv Input file (by default in FASTA format)

[-p] 1 Switch on if input is a list of peptides (Peptide format)

[-l string] 9 Peptide length (multiple lengths separated by comma e.g. 8,9,10)

[-s] 0 Sort output on decreasing affinity

[-rth float] 0.500000 Threshold for high binding peptides (%Rank)

[-rlt float] 2.000000 Threshold for low binding peptides (%Rank)

[-listMHC] 0 Print list of alleles included in netMHC

[-xls] 0 Save output to xls file

[-xlsfile filename] NetMHC_out.xls File name for xls output

[-t float] -99.900002 Threshold for output

[-thrfmt filename] /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64/data/threshold/%s.thr Format for threshold filenames

[-hlalist filename] /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64/data/allelelist File with covered HLA names

[-rdir filename] /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64 Home directory for NetMHC

[-tdir filename] /scratch/35872875 Temporary directory (Default $$)

[-syn filename] /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64/data/synlists/%s.synlist Format of synlist file

[-v] 0 Verbose mode

[-dirty] 0 Dirty mode, leave tmp dir+files

[-inptype int] 0 Input type [0] FASTA [1] Peptide

[-version filename] /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64/data/version File with version information

[-w] 0 w option for webface

NetMHC version 4.0

Input is in PEPTIDE format

Rank Threshold for Strong binding peptides 0.500

Rank Threshold for Weak binding peptides 2.000

pos HLA peptide Core Offset I_pos I_len D_pos D_len iCore Identity 1-log50k(aff) Affinity(nM) %Rank BindLevel

0    HLA-A0201    AAAAAAAGT    AAAAAAAGT      0      0      0      0      0    AAAAAAAGT         PEPLIST         0.068     24075.65    34.00
0    HLA-A0201    PEKRITANL    PEKRITANL      0      0      0      0      0    PEKRITANL         PEPLIST         0.024     38592.25    80.00
0    HLA-A0201    WMTCCLLGL    WMTCCLLGL      0      0      0      0      0    WMTCCLLGL         PEPLIST         0.706        24.01     0.40 <= SB
0    HLA-A0201    CLLACDRDL    CLLACDRDL      0      0      0      0      0    CLLACDRDL         PEPLIST         0.304      1854.38     5.50
0    HLA-A0201    AAAAAAAAG    AAAAAAAAG      0      0      0      0      0    AAAAAAAAG         PEPLIST         0.039     32861.51    55.00
0    HLA-A0201    LLACDCDLC    LLACDCDLC      0      0      0      0      0    LLACDCDLC         PEPLIST         0.247      3456.07     8.00
0    HLA-A0201    QHAAAAAAA    QHAAAAAAA      0      0      0      0      0    QHAAAAAAA         PEPLIST         0.035     34165.29    60.00
0    HLA-A0201    FCLLACDCD    FCLLACDCD      0      0      0      0      0    FCLLACDCD         PEPLIST         0.067     24326.76    34.00
0    HLA-A0201    TCCLLGLAP    TCCLLGLAP      0      0      0      0      0    TCCLLGLAP         PEPLIST         0.068     24036.36    34.00
0    HLA-A0201    MTCCLLGLA    MTCCLLGLA      0      0      0      0      0    MTCCLLGLA         PEPLIST         0.261      2979.35     7.00

or is it because antigen garnish is not expecting the header that my files get?

I can see that they all can be run except NetMHCII 2.3 so it makes sense that I didn't get these output files before, but this should not have an effect on the prediction of the minimal set of predictions for MHC I that I am using now should it?

Kind regards Sofia

SofiaOtero commented 2 years ago

Sorry the beginning of the file got printed very big..

andrewrech commented 2 years ago

should not have an effect

yes

I am not sure. Could you upload the output files netMHC_[id]_o.csv and netMHCpan_[id]_o.csv here?

SofiaOtero commented 2 years ago

netMHC_1245669d-3789-4c5d_o.csv netMHCpan_fa954704-9be2-4f0d_o.csv

andrewrech commented 2 years ago

This error is occurring because the first two lines of your netMHCpan output file

/scratch/35872875
/home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHCpan-4.1/Linux_x86_64

Screen Shot 2022-05-26 at 09 41 13

are not "commented out" like the rest of the header with the # character, hence they are being interpreted by R as being part of the results table, breaking downstream steps.

I am not sure why these lines exist. This output file seems to be prepended with these temporary and working directory paths for some reason? I've never seen this.

Unfortunately, parsing these minimally formatted stdout plain text files is subject to strange OS/CLI environment formatting issues such as this. You could attempt to fix this by determining what is causing this. Alternatively, I suggest you use our Docker container where we are able to have control over factors such as this and where I am certain all our package tests, including those covering these input files, pass.

Repro below

Screen Shot 2022-05-26 at 09 47 43

Screen Shot 2022-05-26 at 09 48 06

And then ultimately "/scratch/35872875" is being taken as the header, incorrectly.

SofiaOtero commented 2 years ago

Thank you very very much for your help, I think we will first try to fix it with the netMHC output files and then try the Docker version if it does not work.

Kind regards Sofia

andrewrech commented 2 years ago

If I had to bet, I would say launching R with --vanilla from a sh or bash shell with only default configuration might do the trick.

SofiaOtero commented 2 years ago

Hi Andrew,

Thanks for your help, I managed to remove the first lines from the file as they where echo'ed into the output file which I hadn't noticed. I get an output file from the test data with all peptide lengths and all HLA's but I am not sure that the output from antigen garnish is correct and if netMHCII even is run. Can you confirm if the output looks correct? I just included the head of the file as it is too big. head_ag_output_30_05_22_13_12_04.txt

Kind regards Sofia

andrewrech commented 2 years ago

Looks good. You can see the affinity predictions in the column affinity(nM)_netMHC, for instance 35493.51, and the command tht was run in the column command_netMHC, for instance netMHC -p -l 8 -a HLA-A0201 -f netMHC_135e8ad1-3be6-45e3.csv

SofiaOtero commented 2 years ago

Thank you for all your help, then I will proceed with my own data.

Kind regards Sofia :)

SofiaOtero commented 2 years ago

Hi again, sorry for all the questions, but I have a few more.

I am trying to read both the test output file and my own output files into R where I comma separate the header and rows. The problem I get here is that there are more columns than column names. If I count how many commas there are in the header I get 123 and in the rows there are 148. Do you have a better way to load in the output files?
I have some trouble when trying to add MHC II alleles to the prediction, if I use e.g. HLA-DRB111:01 HLA-DRB115:01 then program crashes and I get no netmhcii output even though I can see they are both available in netmhcii (DRB1_1101,DRB1_1501) and netmhciipan (DRB1_1101,DRB1_1501). I have written the alleles ("HLA-DRB111:01 HLA-DRB115:01") exactly as here which is the same way as you did in your test script: HLA-DRB1*14:67.
Additionally there are some DQ alleles I would like to run that are only available in netmhciipan and not in netmhcii e.g. HLA-DQA10505-DQB10602. Is there a way I can do this where the program does skip the netmhcii prediction or should I filter them out before prediction?

Thank you for your time.

Kind regards Sofia

andrewrech commented 2 years ago

Hi Sofia,

I believe you have intra-field commas in your VCF file which is breaking this table. Normally when this is the case the fields would be quoted in the output to prevent intra-field separators from breaking the table structure but this is not the case in your output. I am not sure why. What version of data.table are you using?
Do these alleles work from the command line?
If an allele is only available for netmhciipan then only that tool will run, so you should not need to change anything.

SofiaOtero commented 2 years ago

Hi Andrew, thanks for the quick answer.

I went back to the test example and separated the file with tabs instead of comma and now I get the correct number of columns. When I look through the variables now I can see that some of them as e.g. TUMOR and NORMAL contain commas which is probably why I had the previous problem.
If I change the alleles in the test example to e.g. dt[, MHC := c("HLA-DRB1*11:01")] I get following error: /bin/sh: netMHCII: command not found Collating netMHC output... Read 0 items Error in data.table::setnames(., dt %>% names(), dtn) : NA in 'new' at positions [1] Calls: %>% ... collate_netMHC -> lapply -> FUN -> %>% -> In addition: Warning message: In .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet], : last 2 bases were ignored Removing temporary files. Execution halted

Are you able to run your test example with this HLA, to see if I still have a problem with netmhcii or netmhciipan on our server. Though this seems weird as the test example with your suggested alleles (dt[, MHC := c("HLA-A01:47 HLA-A02:01 HLA-DRB1*14:67")]) works fine.

Good to know that it is possible to run with all HLAs.

Kind regards Sofia

andrewrech commented 2 years ago

Could you check the appropriate notation using the netMHCII command line tool? I think it has a function to list all alleles and the correct format. Maybe the required nomenclature is different than expected? I can check it out further tomorrow if that doesn't solve the issue.

SofiaOtero commented 2 years ago

Hi Andrew,

Both netmhcii and netmhciipan takes the MHC II alleles in the same way, for DRB it is DRB1_1101 and for DQ it is HLA-DQA10102-DQB10501. So I tried to input it to antigen garnish in different ways on your test data:

dt[, MHC := c("HLA-DRB1_1101")] , getting following error: Generating prediction commands. Error in paste(type, "-p", "-l", nmer_l, "-a", allele, "-f", filename) : object 'type' not found Calls: %>% ... get_pred_commands -> [ -> [.data.table -> eval -> eval -> paste In addition: Warning message: In .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet], : last 2 bases were ignored Removing temporary files. Execution halted
dt[, MHC := c("DRB1_1101")] , getting following error: Filtering WT peptide matches. Error in garnish_affinity(.) : MHC do not contain "HLA-" or "H-2" as a pattern. Alleles must be correctly formatted, see list_mhc(). Calls: %>% -> garnish_affinity In addition: Warning message: In .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet], : last 2 bases were ignored Removing temporary files. Execution halted
I also tried the original one (dt[, MHC := c("HLA-DRB1*11:01")]) in the message I sent above and still get the same error.

If you have time to try it off I would very much appreciate it!

I have managed to run my data on MHC I so it is only the MHC II that is missing now.

Kind regards Sofia

andrewrech commented 2 years ago

Hi Sofia,

Sorry for the delay. This was an issue in our codebase in which the unique format of this allele name broke our creation of the netMHC commands. I think you are the first person to test this allele. Sorry for the trouble. It is fixed on master in Github in the commit linked below.

Please use HLA-DRB1_1101.

# load an example VCF
dir <- system.file(package = "antigen.garnish") %>%
       file.path(., "extdata/testdata")

file <- file.path(dir, "TUMOR.vcf")

# extract variants
dt <-  garnish_variants(file)

# add space separated MHC types
# see list_mhc() for nomenclature of supported alleles
# MHC may also be set to "all_human" or "all_mouse" to use all supported alleles

dt[, MHC := c("HLA-DQA10102-DQB10501 HLA-DRB1_1101")]

# predict neoantigens
result <- dt %>% garnish_affinity(.)

Generating metadata.
Reading local transcript metadata.
Checking netMHC scripts in antigen.garnish data directory.
Extracting cDNA.
Make cDNA.
Generating mutant peptide index.
Generating variants
Generating nmers
Filtering WT peptide matches.
Checking netMHC scripts in antigen.garnish data directory.
Running blastp-short to find close matches for differential agretopicity calculation.
blastp -query Hu_ag_nmer_fasta.fa -task blastp-short -db /root/antigen.garnish/human.bdb -out Hu4854db3f77214542_blastpout.csv -num_threads 16 -outfmt '10 qseqid sseqid qseq qstart qend sseq sstart send length mismatch pident evalue bitscore'
Calculating local alignment to WT peptides for proteome-wide differential agretopicity predictions.
[1] "Alignment subset 1 of 2"
[1] "Alignment subset 2 of 2"
Removing temporary fasta files.
Generating prediction commands.
Checking netMHC scripts in antigen.garnish data directory.
Running netMHC in parallel.
Collating netMHC output...
Read 78 items
Running mhcflurry in parallel.
Merging output.
Reading mhcflurry output.
Calculating netMHC consensus score.
Calculating overall consensus affinity score.
No ensemble prediction scores.
BLAST did not run.
Removing temporary files.

result$`%Rank_EL_netMHCIIpan` %>%
  stats::na.omit() %>%
  as.numeric()

  [1] 71.80 13.58 78.25 19.34 73.63 29.33 84.99 52.69 93.42 80.47 84.82 76.82
 [13] 51.54 61.11 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00
 [25] 95.00 95.00 95.00 95.00 95.00 95.00 93.27 93.27 93.27 93.27 92.88 92.88
 [37] 92.88 92.88 95.00 95.00 95.00 95.00 95.00 95.00 90.87 90.87 95.00 95.00
 [49] 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00
 [61] 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 89.05 89.05
 [73] 89.05 89.05 88.08 88.08 88.08 88.08 95.00 95.00 94.74 94.74 63.72 63.72
 [85] 63.72 63.72 63.72 48.82 48.82 48.82 48.82 48.82 95.00 95.00 92.71 92.71
 [97] 90.46 90.46 90.46 90.46 90.46 90.46 90.46 90.46 90.46 90.54 90.54 90.54
[109] 90.54 90.54 90.54 90.54 90.54 90.54 95.00 95.00 95.00 95.00 88.45 88.45
[121] 88.45 88.45 92.97 92.97 92.97 92.97 70.80 70.80 70.80 70.80 70.80 70.80
[133] 59.49 59.49 59.49 59.49 59.49 59.49 95.00 95.00 67.99 11.11 51.61 60.64
[145] 13.52  3.31 12.99  3.07 95.00 95.00 95.00 95.00 86.75 86.75 86.75 86.75
[157] 92.02 92.02 92.02 92.02 17.46  2.68 18.19  3.43 95.00 95.00 95.00 95.00
[169] 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 86.91 86.91 86.91 86.91
[181] 86.91 86.91 86.91 86.91 86.91 86.91 86.91 86.91 95.00 95.00 95.00 95.00
[193] 95.00 95.00 95.00 95.00 95.00 95.00 91.25 91.25 91.25 91.25 91.25 91.25
[205] 91.25 91.25 91.25 91.25 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00
[217] 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 77.83 77.83 77.83 77.83
[229] 92.01 92.01 92.01 92.01 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00
[241] 95.00 95.00 95.00 95.00 53.65  6.72 73.78 39.97 95.00 95.00 86.72 86.72
[253] 82.91 82.91 70.53 70.53 95.00 95.00 91.85 91.85 91.25 91.25 79.92 79.92
[265] 79.16 79.16 79.16 79.16 70.20 70.20 70.20 70.20 83.28 83.28 83.28 83.28
[277] 83.28 83.28 83.28 83.28 84.82 84.82 84.82 84.82 84.82 84.82 84.82 84.82
[289] 92.06 92.06 92.06 95.00 95.00 95.00 78.52 78.52 78.52 78.52 78.52 78.52
[301] 78.52 74.41 74.41 74.41 74.41 74.41 74.41 74.41 95.00 95.00 95.00 95.00
[313] 64.69  9.79 84.54 74.01 58.73  6.74 86.73 65.18 95.00 95.00 89.55 89.55
[325] 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 87.48
[337] 87.48 87.48 87.48 87.48 87.48 87.48 87.48 87.48 87.48 87.48 90.46 90.46
[349] 90.46 90.46 90.46 90.46 90.46 90.46 90.46 90.46 90.46 90.46 90.46 82.82
[361] 82.82 82.82 82.82 82.82 82.82 82.82 82.82 82.82 82.82 82.82 82.82 82.82
[373] 95.00 95.00 88.44 88.44 82.60 82.60 82.60 82.60 74.06 74.06 74.06 74.06
[385] 30.83  5.29 33.06 10.00 48.96  5.43 54.68 18.01

https://github.com/andrewrech/antigen.garnish/commit/fb9458b462c2f6d164b301eb14503f6979f8910e

SofiaOtero commented 2 years ago

Hi Andrew, thank you very much, I was finally able to run it again after waiting for them to update it on the server. I cannot run your command: dt[, MHC := c("HLA-DQA10102-DQB10501 HLA-DRB1_1101")], because I get the error: Calculating netMHC consensus score. Calculating overall consensus affinity score. Error in get(cols) : invalid first argument Calls: %>% ... merge_predictions -> [ -> [.data.table -> eval -> eval -> get In addition: Warning message: In .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet], : last 2 bases were ignored Removing temporary files. Execution halted

Though if I run it with an MHC I allele first like: dt[, MHC := c("HLA-A*02:01 HLA-DQA10102-DQB10501 HLA-DRB1_1101")], then it works fine. I want to run them separately as I have huge files, do you know why it crashes doing that?

Kind regards Sofia

andrewrech commented 2 years ago

Sorry - just to be clear you tried the latest commit on Github?

On Jun 16, 2022, at 04:51, SofiaOtero @.***> wrote:

Hi Andrew, thank you very much, I was finally able to run it again after waiting for them to update it on the server. I cannot run your command: dt[, MHC := c("HLA-DQA10102-DQB10501 HLA-DRB1_1101")], because I get the error: Calculating netMHC consensus score. Calculating overall consensus affinity score. Error in get(cols) : invalid first argument Calls: %>% ... merge_predictions -> [ -> [.data.table -> eval -> eval -> get In addition: Warning message: In .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet], : last 2 bases were ignored Removing temporary files. Execution halted

Though if I run it with an MHC I allele first like: dt[, MHC := c("HLA-A*02:01 HLA-DQA10102-DQB10501 HLA-DRB1_1101")], then it works fine. I want to run them separately as I have huge files, do you know why it crashes doing that?

Kind regards Sofia

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.

SofiaOtero commented 2 years ago

Sorry I didn't see your response.

Yes we tried the latest commit and it does work in comparison to before, but it crashes if I don't add an HLA-I in the beginning. I have 26 patient files with different HLA's and I can see that it crashes for most of them with the same error as before "netMHCII: command not found", do you think it is because no one has run with those HLA II types before too?

Kind regards Sofia

SofiaOtero commented 2 years ago

Hi again,

I have a cohort of 26 patients with different HLA II alleles and many of them crash when I run antigen garnish, I know that they are all available in netMHCiipan. I don't know if it is too much to ask, but could you check if some of them crash when you run it? I have attached a txt file with them all in the correct format to run in antigen garnish.

unique_HLA_II.txt

Kind regards Sofia

andrewrech commented 2 years ago

I’m happy to check this, please give me a few days.

On Jun 23, 2022, at 09:20, SofiaOtero @.***> wrote:

Hi again,

I have a cohort of 26 patients with different HLA II alleles and many of them crash when I run antigen garnish, I know that they are all available in netMHCiipan. I don't know if it is too much to ask, but could you check if some of them crash when you run it? I have attached a txt file with them all in the correct format to run in antigen garnish.

unique_HLA_II.txt

Kind regards Sofia

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.

SofiaOtero commented 2 years ago

Thanks, do you have any updates?

Kind regards Sofia

andrewrech commented 2 years ago

Sorry @SofiaOtero for the delay.

I fixed the error that was preventing correct parsing of some of the rare alleles for netMHCIIpan from the canonical format.

Every allele on your list now works, which you can test as I did:

  library(data.table)
  library(magrittr)
  library(antigen.garnish)
  library(parallel)

  HLA <- readLines("unique_HLA_II.txt") %>%
    stringr::str_split(" ") %>%
    unlist()

  dir <- system.file(package = "antigen.garnish") %>%
    file.path(., "extdata/testdata")
  file <- file.path(dir, "TUMOR.vcf")

  # test each allele
  ret <- HLA %>% lapply(function(i) {
    print(i)
    dt %<>% data.table::copy()
    dt[, MHC := i]
    ret <- try(garnish_affinity(dt))
    out <- list(
      name = i,
      result = ret
    )
    return(ret)
  })

  classes <- ret %>% lapply(function(i) {
    any(i %>%
      class() == "try-error")
  }) %>% unlist()

The table of HLA alleles is now printed on stdout also.

https://github.com/andrewrech/antigen.garnish/commit/2dd99ae7def9e80571c9dec6bcb0b8136dd8f62f https://github.com/andrewrech/antigen.garnish/commit/a3dd311db4b09972c66329e4c103bdae8d4480b2

SofiaOtero commented 2 years ago

Thank you very much, I have updated the new commit on the Github.

When I run e.g. following HLA's: HLA-DQA10102-DQB10301 HLA-DQA10102-DQB10602 HLA-DQA10505-DQB10301 HLA-DQA10505-DQB10602 HLA-DRB111:01 HLA-DRB115:01

I get this after all variants processed:

And I can see in the ag output directory that both netMHCII and netMHCIIpan have been run but then antigen garnish crashes with following error:

Checking netMHC scripts in antigen.garnish data directory. Running netMHC in parallel. /bin/sh: netMHCII: command not found /bin/sh: netMHCII: command not found <=> .. (comes several times) Collating netMHC output... Read 0 items Error in data.table::setnames(., dt %>% names(), dtn) : NA in 'new' at positions [1] Calls: %>% ... collate_netMHC -> lapply -> FUN -> %>% -> In addition: There were 22 warnings (use warnings() to see them) Removing temporary files. Execution halted

So it seems lige it cannot run when there are NA values in the netMHCII alleles even though it should just proceed and run netMHCIIpan. Is this also an error you get?

Kind regards Sofia

andrewrech commented 2 years ago

Hi Sofia,

No, I do not get an error.

Are you sure the paths are configured correctly?

/bin/sh: netMHCII: command not found
/bin/sh: netMHCII: command not found

Seems to indicate that they are not.

So it seems lige it cannot run when there are NA values in the netMHCII alleles even though it should just proceed and run netMHCIIpan.

By design, this should be fine and not generate any errors.

SofiaOtero commented 2 years ago

I have looked in your code on how the path should look and my path is: /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHCII-2.3

The folder contains:

and I have tested the tool and it works fine when predicting netMHCII inside the folder.

The path to the bin folder is: /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHCII-2.3/Linux_x86_64/bin

And the bin folder contains the netMHCII as it should:

Can you tell me if some of the paths are wrong or what the problem then could be?

Kind regards Sofia

andrewrech / antigen.garnish

Running netMHC #148

/home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64/bin/netMHC -p -l 9 -a HLA-A0201 -f netMHC_1245669d-3789-4c5d.csv

Thu May 26 10:37:54 2022

User: sofote

PWD : /home/projects/SRHgroup/projects/MuPeXI_project/scripts/antigen_garnish/ag_848f53c57a5744e6a7

Host: Linux g-01-c0024 3.10.0-1062.4.1.el7.x86_64 x86_64

-p 1 Switch on if input is a list of peptides (Peptide format)

-l 9 Peptide length (multiple lengths separated by comma e.g. 8,9,10)

-a HLA-A0201 HLA allele name

-f netMHC_1245669d-3789-4c5d.csv Input file (by default in FASTA format)

Command line parameters set to:

[-a line] HLA-A0201 HLA allele name

[-f filename] netMHC_1245669d-3789-4c5d.csv Input file (by default in FASTA format)

[-p] 1 Switch on if input is a list of peptides (Peptide format)

[-l string] 9 Peptide length (multiple lengths separated by comma e.g. 8,9,10)

[-s] 0 Sort output on decreasing affinity

[-rth float] 0.500000 Threshold for high binding peptides (%Rank)

[-rlt float] 2.000000 Threshold for low binding peptides (%Rank)

[-listMHC] 0 Print list of alleles included in netMHC

[-xls] 0 Save output to xls file

[-xlsfile filename] NetMHC_out.xls File name for xls output

[-t float] -99.900002 Threshold for output

[-thrfmt filename] /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64/data/threshold/%s.thr Format for threshold filenames

[-hlalist filename] /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64/data/allelelist File with covered HLA names

[-rdir filename] /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64 Home directory for NetMHC

[-tdir filename] /scratch/35872875 Temporary directory (Default $$)

[-syn filename] /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64/data/synlists/%s.synlist Format of synlist file

[-v] 0 Verbose mode

[-dirty] 0 Dirty mode, leave tmp dir+files

[-inptype int] 0 Input type [0] FASTA [1] Peptide

[-version filename] /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64/data/version File with version information

[-w] 0 w option for webface

NetMHC version 4.0

Input is in PEPTIDE format

Rank Threshold for Strong binding peptides 0.500

Rank Threshold for Weak binding peptides 2.000

pos HLA peptide Core Offset I_pos I_len D_pos D_len iCore Identity 1-log50k(aff) Affinity(nM) %Rank BindLevel