Closed SofiaOtero closed 3 months ago
Hi Sofia,
I am sorry for the delayed response. Thanks for trying the software. I'm happy to help you sort this out. NetMHC tool configuration is indeed often an issue - and that could be the issue here - but based on this error message:
Error in data.table::setnames(., dt %>% names(), dtn)
I cannot be certain what is occurring. I agree that the most likely explanation is that the netMHC output is not as expected.
In my opinion, the best path forward would be for you to try our Docker image. If it works in Docker, we know it is a configuration issue. If it does not, it could be a bug in our R package that we need to address.
If you cannot use Docker, can you pare down the test data to a minimal set of predictions that works?
Hi Andrew,
Thank you very much for your response. I just tried to run the test data on peptide length 9 and only with the HLA-A*02:01 allele. Now I get an output file from both netMHC and netMHCpan that looks correct (netMHC_1245669d-3789-4c5d_o.csv and netMHCpan_fa954704-9be2-4f0d_o.csv). However, I still get the same error:
Collating netMHC output...
Read 79 items
Read 88 items
Error in data.table::setnames(., dt %>% names(), dtn) :
'old' is length 14 but 'new' is length 1
Calls: %>% ... collate_netMHC -> lapply -> FUN -> %>% ->
I have never tried Docker before, but do you think it will be the best solution then?
Kind regards Sofia
Hi Sofia,
Are you using the versions of NetMHC listed in the README?
Hi Andrew,
Yes I have downloaded these versions:
NetMHCpan 4.1b: https://services.healthtech.dtu.dk/cgi-bin/sw_request
NetMHCIIpan 4.0: https://services.healthtech.dtu.dk/cgi-bin/sw_request
NetMHCII 2.3: https://services.healthtech.dtu.dk/cgi-bin/sw_request
NetMHC 4.0: https://services.healthtech.dtu.dk/cgi-bin/sw_request
Kind regards Sofia
This error is occurring here because the results from netMHC are not in the expected format.
Could you take a look at the results tables netMHC_[id]_o.csv and netMHCpan_[id]_o.csv? Each should be a table with the netMHC header and peptide binding results in columns.
Could you also test that you are able to run the netMHC tools from the command line and that they work normally?
The output from netMHC looks normal:
0 HLA-A0201 AAAAAAAGT AAAAAAAGT 0 0 0 0 0 AAAAAAAGT PEPLIST 0.068 24075.65 34.00
0 HLA-A0201 PEKRITANL PEKRITANL 0 0 0 0 0 PEKRITANL PEPLIST 0.024 38592.25 80.00
0 HLA-A0201 WMTCCLLGL WMTCCLLGL 0 0 0 0 0 WMTCCLLGL PEPLIST 0.706 24.01 0.40 <= SB
0 HLA-A0201 CLLACDRDL CLLACDRDL 0 0 0 0 0 CLLACDRDL PEPLIST 0.304 1854.38 5.50
0 HLA-A0201 AAAAAAAAG AAAAAAAAG 0 0 0 0 0 AAAAAAAAG PEPLIST 0.039 32861.51 55.00
0 HLA-A0201 LLACDCDLC LLACDCDLC 0 0 0 0 0 LLACDCDLC PEPLIST 0.247 3456.07 8.00
0 HLA-A0201 QHAAAAAAA QHAAAAAAA 0 0 0 0 0 QHAAAAAAA PEPLIST 0.035 34165.29 60.00
0 HLA-A0201 FCLLACDCD FCLLACDCD 0 0 0 0 0 FCLLACDCD PEPLIST 0.067 24326.76 34.00
0 HLA-A0201 TCCLLGLAP TCCLLGLAP 0 0 0 0 0 TCCLLGLAP PEPLIST 0.068 24036.36 34.00
0 HLA-A0201 MTCCLLGLA MTCCLLGLA 0 0 0 0 0 MTCCLLGLA PEPLIST 0.261 2979.35 7.00
Or is it because antigen garnish is not expecting the header that my files get?
I can see that they all can be run except NetMHCII 2.3, so it makes sense that I didn't get these output files before, but this should not have an effect on the minimal set of MHC I predictions that I am using now, should it?
Kind regards Sofia
Sorry the beginning of the file got printed very big..
should not have an effect
yes
I am not sure. Could you upload the output files netMHC_[id]_o.csv and netMHCpan_[id]_o.csv here?
This error is occurring because the first two lines of your netMHCpan output file
/scratch/35872875
/home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHCpan-4.1/Linux_x86_64
are not "commented out" like the rest of the header with the # character, hence they are being interpreted by R as being part of the results table, breaking downstream steps.
I am not sure why these lines exist. This output file seems to be prepended with these temporary and working directory paths for some reason? I've never seen this.
Unfortunately, parsing these minimally formatted stdout plain text files is subject to strange OS/CLI environment formatting issues such as this. You could attempt to fix this by determining what is causing this. Alternatively, I suggest you use our Docker container where we are able to have control over factors such as this and where I am certain all our package tests, including those covering these input files, pass.
Repro below
And then ultimately "/scratch/35872875" is being taken as the header, incorrectly.
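The failure mode described here can be reproduced without netMHC. What follows is a minimal sketch in base R, not the parser that antigen.garnish uses: the file contents are a simplified, made-up stand-in for real netMHCpan output (only the two stray path lines are taken from this thread), and all variable names are hypothetical.

```r
# Simplified stand-in for a netMHCpan output file (made-up columns).
# The first two lines are the stray paths reported in this thread.
lines <- c(
  "/scratch/35872875",
  "/home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHCpan-4.1/Linux_x86_64",
  "# comment line belonging to the real header",
  "Pos MHC Peptide Score",
  "0 HLA-A0201 AAAAAAAGT 0.068"
)

# Drop "#" comments, as a reader that honours the comment character would.
body <- lines[!grepl("^#", lines)]

# The first surviving line is treated as the header: one field,
# while the data rows have several fields.
header_fields <- strsplit(body[1], " +")[[1]]
row_fields <- strsplit(body[length(body)], " +")[[1]]
length(header_fields)
length(row_fields)

# One possible workaround (an assumption, not the package's behaviour):
# discard lines that look like absolute paths before parsing.
cleaned <- body[!grepl("^/", body)]
cleaned_header <- strsplit(cleaned[1], " +")[[1]]
```

The one-field "header" against multi-field rows is the same kind of length mismatch that setnames reports in the error message.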
Thank you very very much for your help, I think we will first try to fix it with the netMHC output files and then try the Docker version if it does not work.
Kind regards Sofia
If I had to bet, I would say launching R with --vanilla from a sh or bash shell with only default configuration might do the trick.
Hi Andrew,
Thanks for your help. I managed to remove the first lines from the file, as they were echoed into the output file, which I hadn't noticed. I now get an output file from the test data with all peptide lengths and all HLAs, but I am not sure that the output from antigen garnish is correct or whether netMHCII is even run. Can you confirm if the output looks correct? I just included the head of the file as it is too big. head_ag_output_30_05_22_13_12_04.txt
Kind regards Sofia
Looks good. You can see the affinity predictions in the column affinity(nM)_netMHC, for instance 35493.51, and the command that was run in the column command_netMHC, for instance netMHC -p -l 8 -a HLA-A0201 -f netMHC_135e8ad1-3be6-45e3.csv
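Column names such as affinity(nM)_netMHC are not syntactic R names, so they need backticks or [[ ]] when accessed. A minimal sketch, assuming a toy one-row table built only from the two values quoted in this reply (real output has many more columns):

```r
# Toy table; check.names = FALSE keeps the non-syntactic column name intact.
out <- data.frame(
  `affinity(nM)_netMHC` = 35493.51,
  `command_netMHC` = "netMHC -p -l 8 -a HLA-A0201 -f netMHC_135e8ad1-3be6-45e3.csv",
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Two equivalent ways to reach a column whose name contains parentheses.
affinity <- out[["affinity(nM)_netMHC"]]
command <- out$`command_netMHC`
```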
Thank you for all your help, then I will proceed with my own data.
Kind regards Sofia :)
Hi again, sorry for all the questions, but I have a few more.
I am trying to read both the test output file and my own output files into R where I comma separate the header and rows. The problem I get here is that there are more columns than column names. If I count how many commas there are in the header I get 123 and in the rows there are 148. Do you have a better way to load in the output files?
I have some trouble when trying to add MHC II alleles to the prediction. If I use e.g. HLA-DRB1*11:01 HLA-DRB1*15:01, then the program crashes and I get no netmhcii output, even though I can see they are both available in netmhcii (DRB1_1101,DRB1_1501) and netmhciipan (DRB1_1101,DRB1_1501). I have written the alleles ("HLA-DRB1*11:01 HLA-DRB1*15:01") exactly as here, which is the same way as you did in your test script: HLA-DRB1*14:67.
Additionally, there are some DQ alleles I would like to run that are only available in netmhciipan and not in netmhcii, e.g. HLA-DQA10505-DQB10602. Is there a way I can do this so that the program skips the netmhcii prediction, or should I filter them out before prediction?
Thank you for your time.
Kind regards Sofia
Hi Sofia,
I believe you have intra-field commas in your VCF file, which are breaking this table. Normally when this is the case, the fields would be quoted in the output to prevent intra-field separators from breaking the table structure, but this is not the case in your output. I am not sure why. What version of data.table are you using?
Do these alleles work from the command line?
If an allele is only available for netmhciipan then only that tool will run, so you should not need to change anything.
Hi Andrew, thanks for the quick answer.
I went back to the test example and separated the file with tabs instead of commas, and now I get the correct number of columns. When I look through the variables now, I can see that some of them, e.g. TUMOR and NORMAL, contain commas, which is probably why I had the previous problem.
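The difference between splitting on commas and splitting on tabs can be checked before loading a file. A minimal sketch in base R, assuming a tab-delimited file whose fields may contain commas; the file contents and column values here are made up for illustration and are not real antigen.garnish output:

```r
# Write a small tab-delimited file in which two fields contain commas.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "sample_id\tTUMOR\tNORMAL",
  "s1\t0/1:10,5:15\t0/0:20,0:20"
), tmp)

# Field counts per line: inconsistent when split on commas,
# consistent when split on tabs.
n_comma <- count.fields(tmp, sep = ",")
n_tab <- count.fields(tmp, sep = "\t")

# read.delim() uses tab as the separator, so intra-field commas are kept.
tab <- read.delim(tmp, stringsAsFactors = FALSE)
```

Comparing count.fields() for the header and the rows is a quick way to see whether the chosen separator matches the file.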
If I change the alleles in the test example to e.g. dt[, MHC := c("HLA-DRB1*11:01")] I get the following error:
/bin/sh: netMHCII: command not found
Collating netMHC output...
Read 0 items
Error in data.table::setnames(., dt %>% names(), dtn) :
NA in 'new' at positions [1]
Calls: %>% ... collate_netMHC -> lapply -> FUN -> %>% ->
Are you able to run your test example with this HLA, to see if I still have a problem with netmhcii or netmhciipan on our server? This seems weird, though, as the test example with your suggested alleles (dt[, MHC := c("HLA-A*01:47 HLA-A*02:01 HLA-DRB1*14:67")]) works fine.
Kind regards Sofia
Could you check the appropriate notation using the netMHCII command line tool? I think it has a function to list all alleles and the correct format. Maybe the required nomenclature is different than expected? I can check it out further tomorrow if that doesn't solve the issue.
Hi Andrew,
Both netmhcii and netmhciipan take the MHC II alleles in the same way: for DRB it is DRB1_1101 and for DQ it is HLA-DQA10102-DQB10501. So I tried to input it to antigen garnish in different ways on your test data:
dt[, MHC := c("HLA-DRB1_1101")], getting the following error:
Generating prediction commands.
Error in paste(type, "-p", "-l", nmer_l, "-a", allele, "-f", filename) :
object 'type' not found
Calls: %>% ... get_pred_commands -> [ -> [.data.table -> eval -> eval -> paste
In addition: Warning message:
In .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet], :
last 2 bases were ignored
Removing temporary files.
Execution halted
dt[, MHC := c("DRB1_1101")], getting the following error:
Filtering WT peptide matches.
Error in garnish_affinity(.) :
MHC do not contain "HLA-" or "H-2" as a pattern. Alleles must be correctly formatted, see list_mhc().
Calls: %>% -> garnish_affinity
In addition: Warning message:
In .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet], :
last 2 bases were ignored
Removing temporary files.
Execution halted
I also tried the original one (dt[, MHC := c("HLA-DRB1*11:01")]) in the message I sent above and still get the same error.
If you have time to try it out I would very much appreciate it!
I have managed to run my data on MHC I so it is only the MHC II that is missing now.
Kind regards Sofia
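The relationship between the canonical DRB1 notation and the notation the netMHCII command-line tools list can be sketched as a string conversion. This is a hypothetical helper written for illustration, not a function of antigen.garnish, and it covers only the DRB1 pattern discussed in this thread; which spelling antigen.garnish itself accepts is a separate question.

```r
# Hypothetical helper (not part of antigen.garnish): convert a canonical
# DRB1 allele name to the form listed by the netMHCII tools.
to_netmhcii_drb <- function(allele) {
  x <- sub("^HLA-", "", allele)    # drop the "HLA-" prefix
  x <- sub("\\*", "_", x)          # DRB1*11:01 -> DRB1_11:01
  gsub(":", "", x, fixed = TRUE)   # DRB1_11:01 -> DRB1_1101
}

converted <- to_netmhcii_drb(c("HLA-DRB1*11:01", "HLA-DRB1*15:01"))
converted
```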
Hi Sofia,
Sorry for the delay. This was an issue in our codebase in which the unique format of this allele name broke our creation of the netMHC commands. I think you are the first person to test this allele. Sorry for the trouble. It is fixed on master in Github in the commit linked below.
Please use HLA-DRB1_1101.
# load an example VCF
dir <- system.file(package = "antigen.garnish") %>%
file.path(., "extdata/testdata")
file <- file.path(dir, "TUMOR.vcf")
# extract variants
dt <- garnish_variants(file)
# add space separated MHC types
# see list_mhc() for nomenclature of supported alleles
# MHC may also be set to "all_human" or "all_mouse" to use all supported alleles
dt[, MHC := c("HLA-DQA10102-DQB10501 HLA-DRB1_1101")]
# predict neoantigens
result <- dt %>% garnish_affinity(.)
Generating metadata.
Reading local transcript metadata.
Checking netMHC scripts in antigen.garnish data directory.
Extracting cDNA.
Make cDNA.
Generating mutant peptide index.
Generating variants
Generating nmers
Filtering WT peptide matches.
Checking netMHC scripts in antigen.garnish data directory.
Running blastp-short to find close matches for differential agretopicity calculation.
blastp -query Hu_ag_nmer_fasta.fa -task blastp-short -db /root/antigen.garnish/human.bdb -out Hu4854db3f77214542_blastpout.csv -num_threads 16 -outfmt '10 qseqid sseqid qseq qstart qend sseq sstart send length mismatch pident evalue bitscore'
Calculating local alignment to WT peptides for proteome-wide differential agretopicity predictions.
[1] "Alignment subset 1 of 2"
[1] "Alignment subset 2 of 2"
Removing temporary fasta files.
Generating prediction commands.
Checking netMHC scripts in antigen.garnish data directory.
Running netMHC in parallel.
Collating netMHC output...
Read 78 items
Running mhcflurry in parallel.
Merging output.
Reading mhcflurry output.
Calculating netMHC consensus score.
Calculating overall consensus affinity score.
No ensemble prediction scores.
BLAST did not run.
Removing temporary files.
result$`%Rank_EL_netMHCIIpan` %>%
stats::na.omit() %>%
as.numeric()
[1] 71.80 13.58 78.25 19.34 73.63 29.33 84.99 52.69 93.42 80.47 84.82 76.82
[13] 51.54 61.11 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00
[25] 95.00 95.00 95.00 95.00 95.00 95.00 93.27 93.27 93.27 93.27 92.88 92.88
[37] 92.88 92.88 95.00 95.00 95.00 95.00 95.00 95.00 90.87 90.87 95.00 95.00
[49] 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00
[61] 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 89.05 89.05
[73] 89.05 89.05 88.08 88.08 88.08 88.08 95.00 95.00 94.74 94.74 63.72 63.72
[85] 63.72 63.72 63.72 48.82 48.82 48.82 48.82 48.82 95.00 95.00 92.71 92.71
[97] 90.46 90.46 90.46 90.46 90.46 90.46 90.46 90.46 90.46 90.54 90.54 90.54
[109] 90.54 90.54 90.54 90.54 90.54 90.54 95.00 95.00 95.00 95.00 88.45 88.45
[121] 88.45 88.45 92.97 92.97 92.97 92.97 70.80 70.80 70.80 70.80 70.80 70.80
[133] 59.49 59.49 59.49 59.49 59.49 59.49 95.00 95.00 67.99 11.11 51.61 60.64
[145] 13.52 3.31 12.99 3.07 95.00 95.00 95.00 95.00 86.75 86.75 86.75 86.75
[157] 92.02 92.02 92.02 92.02 17.46 2.68 18.19 3.43 95.00 95.00 95.00 95.00
[169] 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 86.91 86.91 86.91 86.91
[181] 86.91 86.91 86.91 86.91 86.91 86.91 86.91 86.91 95.00 95.00 95.00 95.00
[193] 95.00 95.00 95.00 95.00 95.00 95.00 91.25 91.25 91.25 91.25 91.25 91.25
[205] 91.25 91.25 91.25 91.25 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00
[217] 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 77.83 77.83 77.83 77.83
[229] 92.01 92.01 92.01 92.01 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00
[241] 95.00 95.00 95.00 95.00 53.65 6.72 73.78 39.97 95.00 95.00 86.72 86.72
[253] 82.91 82.91 70.53 70.53 95.00 95.00 91.85 91.85 91.25 91.25 79.92 79.92
[265] 79.16 79.16 79.16 79.16 70.20 70.20 70.20 70.20 83.28 83.28 83.28 83.28
[277] 83.28 83.28 83.28 83.28 84.82 84.82 84.82 84.82 84.82 84.82 84.82 84.82
[289] 92.06 92.06 92.06 95.00 95.00 95.00 78.52 78.52 78.52 78.52 78.52 78.52
[301] 78.52 74.41 74.41 74.41 74.41 74.41 74.41 74.41 95.00 95.00 95.00 95.00
[313] 64.69 9.79 84.54 74.01 58.73 6.74 86.73 65.18 95.00 95.00 89.55 89.55
[325] 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 87.48
[337] 87.48 87.48 87.48 87.48 87.48 87.48 87.48 87.48 87.48 87.48 90.46 90.46
[349] 90.46 90.46 90.46 90.46 90.46 90.46 90.46 90.46 90.46 90.46 90.46 82.82
[361] 82.82 82.82 82.82 82.82 82.82 82.82 82.82 82.82 82.82 82.82 82.82 82.82
[373] 95.00 95.00 88.44 88.44 82.60 82.60 82.60 82.60 74.06 74.06 74.06 74.06
[385] 30.83 5.29 33.06 10.00 48.96 5.43 54.68 18.01
https://github.com/andrewrech/antigen.garnish/commit/fb9458b462c2f6d164b301eb14503f6979f8910e
Hi Andrew, thank you very much, I was finally able to run it again after waiting for them to update it on the server. I cannot run your command: dt[, MHC := c("HLA-DQA10102-DQB10501 HLA-DRB1_1101")], because I get the error:
Calculating netMHC consensus score.
Calculating overall consensus affinity score.
Error in get(cols) : invalid first argument
Calls: %>% ... merge_predictions -> [ -> [.data.table -> eval -> eval -> get
In addition: Warning message:
In .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet], :
last 2 bases were ignored
Removing temporary files.
Execution halted
Though if I run it with an MHC I allele first like: dt[, MHC := c("HLA-A*02:01 HLA-DQA10102-DQB10501 HLA-DRB1_1101")], then it works fine. I want to run them separately as I have huge files, do you know why it crashes doing that?
Kind regards Sofia
Sorry - just to be clear you tried the latest commit on Github?
Sorry I didn't see your response.
Yes, we tried the latest commit and it does work in comparison to before, but it crashes if I don't add an HLA-I allele at the beginning. I have 26 patient files with different HLAs and I can see that it crashes for most of them with the same error as before, "netMHCII: command not found". Do you think it is because no one has run with those HLA II types before either?
Kind regards Sofia
Hi again,
I have a cohort of 26 patients with different HLA II alleles and many of them crash when I run antigen garnish. I know that they are all available in netMHCIIpan. I don't know if it is too much to ask, but could you check if some of them crash when you run it? I have attached a txt file (unique_HLA_II.txt) with them all in the correct format to run in antigen garnish.
Kind regards Sofia
I’m happy to check this, please give me a few days.
Thanks, do you have any updates?
Kind regards Sofia
Sorry @SofiaOtero for the delay.
I fixed the error that was preventing correct parsing of some of the rare alleles for netMHCIIpan from the canonical format.
Every allele on your list now works, which you can test as I did:
library(data.table)
library(magrittr)
library(antigen.garnish)
library(parallel)
# read the space separated alleles from the attached file
HLA <- readLines("unique_HLA_II.txt") %>%
  stringr::str_split(" ") %>%
  unlist()
dir <- system.file(package = "antigen.garnish") %>%
  file.path(., "extdata/testdata")
file <- file.path(dir, "TUMOR.vcf")
# extract variants
dt <- garnish_variants(file)
# test each allele
ret <- HLA %>% lapply(function(i) {
  print(i)
  # work on a copy so that := does not modify dt by reference
  d <- data.table::copy(dt)
  d[, MHC := i]
  try(garnish_affinity(d))
})
names(ret) <- HLA
# TRUE for every allele whose run raised an error
classes <- ret %>%
  lapply(function(i) inherits(i, "try-error")) %>%
  unlist()
The table of HLA alleles is now printed on stdout also.
https://github.com/andrewrech/antigen.garnish/commit/2dd99ae7def9e80571c9dec6bcb0b8136dd8f62f
https://github.com/andrewrech/antigen.garnish/commit/a3dd311db4b09972c66329e4c103bdae8d4480b2
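The per-allele test pattern used in this reply (run each allele inside try() and then look for "try-error" results) can be tried without netMHC installed. A minimal sketch, assuming a stand-in function fake_predict() that is hypothetical and only mimics the "HLA-" / "H-2" prefix check quoted earlier in the thread; it is not garnish_affinity():

```r
# Hypothetical stand-in for the real prediction call.
fake_predict <- function(allele) {
  if (!grepl("^(HLA-|H-2)", allele)) {
    stop("MHC do not contain \"HLA-\" or \"H-2\" as a pattern.")
  }
  paste("ok:", allele)
}

alleles <- c("HLA-DRB1_1101", "not-an-allele")

# Run every allele; try() turns an error into a "try-error" object
# instead of aborting the whole loop.
res <- lapply(alleles, function(a) try(fake_predict(a), silent = TRUE))
names(res) <- alleles

# Named logical vector: TRUE where the call failed.
failed <- vapply(res, inherits, logical(1), what = "try-error")
failed_alleles <- names(failed)[failed]
failed_alleles
```

Naming the result list by allele makes it straightforward to report exactly which alleles need attention.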
Thank you very much, I have updated to the new commit from GitHub.
When I run e.g. the following HLAs: HLA-DQA10102-DQB10301 HLA-DQA10102-DQB10602 HLA-DQA10505-DQB10301 HLA-DQA10505-DQB10602 HLA-DRB1*11:01 HLA-DRB1*15:01
I get this after all variants processed:
And I can see in the ag output directory that both netMHCII and netMHCIIpan have been run but then antigen garnish crashes with following error:
Checking netMHC scripts in antigen.garnish data directory.
Running netMHC in parallel.
/bin/sh: netMHCII: command not found
/bin/sh: netMHCII: command not found
<=> .. (comes several times)
Collating netMHC output...
Read 0 items
Error in data.table::setnames(., dt %>% names(), dtn) :
NA in 'new' at positions [1]
Calls: %>% ... collate_netMHC -> lapply -> FUN -> %>% ->
So it seems like it cannot run when there are NA values in the netMHCII alleles, even though it should just proceed and run netMHCIIpan. Is this also an error you get?
Kind regards Sofia
Hi Sofia,
No, I do not get an error.
Are you sure the paths are configured correctly?
/bin/sh: netMHCII: command not found
/bin/sh: netMHCII: command not found
Seems to indicate that they are not.
So it seems like it cannot run when there are NA values in the netMHCII alleles even though it should just proceed and run netMHCIIpan.
By design, this should be fine and not generate any errors.
I have looked in your code on how the path should look and my path is: /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHCII-2.3
The folder contains:
and I have tested the tool and it works fine when predicting netMHCII inside the folder.
The path to the bin folder is: /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHCII-2.3/Linux_x86_64/bin
And the bin folder contains the netMHCII as it should:
Can you tell me if some of the paths are wrong or what the problem then could be?
Kind regards Sofia
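The message "/bin/sh: netMHCII: command not found" means the shell spawned by R could not find an executable called netMHCII on its PATH. A minimal sketch in base R for checking this from inside the R session; the tool names come from this thread, and the directory is the one quoted in the message here, so whether it is the correct directory for a given installation is an assumption:

```r
# Which of the tools can the current R session find on PATH?
tools <- c("netMHC", "netMHCpan", "netMHCII", "netMHCIIpan")
found <- Sys.which(tools)                     # "" means not found
missing_tools <- names(found)[found == ""]
print(found)

# Directories searched by the shell that R spawns.
path_dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]

# Prepend a directory to PATH for the current R session only.
# Directory taken from this thread; adjust it to the local installation.
tool_dir <- "/home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHCII-2.3"
Sys.setenv(PATH = paste(tool_dir, Sys.getenv("PATH"), sep = .Platform$path.sep))
```

A tool that works when called from inside its own folder can still be missing from PATH, which would be consistent with the behaviour described in this message.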
Hi, I have been trying to run antigen garnish for a while with your test data, and now it seems to run fine with parallel and netMHC. The issue is that in the output folder, e.g. ag_f236b988a09e438ea2, it does not seem like all the netMHC tools have been run, as I only get the following files:
netMHC_2222f95f-7e2c-4c43_o.csv
netMHCpan_5bf2b41a-85f4-453e_o.csv
netMHC_29f106ef-a12f-488c_o.csv
netMHCpan_71ec0664-1374-4786_o.csv
netMHC_60c9a80d-91cf-4c85_o.csv
netMHCpan_756ff183-22ac-453e_o.csv
netMHC_88957201-b040-4b1c_o.csv
netMHCpan_881141cd-e83d-49ab_o.csv
netMHC_992253ab-0c44-4986_o.csv
netMHCpan_a1a0dfef-606a-4942_o.csv
netMHC_b61f7b8f-a6b5-45cc_o.csv
netMHCpan_b3d9c7ad-7535-45f9_o.csv
netMHC_c6123f47-736a-44f1_o.csv
netMHCpan_bff23973-51cc-41e7_o.csv
netMHCIIpan_eb376fad-1533-4079_o.csv
netMHCpan_cd147a8a-c683-4d5b_o.csv
netMHCpan_2ec1102a-b2ce-4c07_o.csv
netMHCpan_e0771477-971b-4a0b_o.csv
netMHCpan_35980300-2a74-49f9_o.csv
netMHCpan_f962f91a-ce37-4784_o.csv
netMHCpan_56cb499b-ace7-453e_o.csv
netMHCpan_fd56749a-9714-4d38_o.csv
The netMHC files do not contain results for all the lengths or all the HLAs, so it seems that not all files needed for antigen garnish have been created, as I get the following error after 'Running netMHC in parallel.':
Collating netMHC output...
Read 74 items
Read 79 items
Read 84 items
Read 89 items
Read 94 items
Read 99 items
Read 103 items
Read 83 items
Error in data.table::setnames(., dt %>% names(), dtn) :
'old' is length 14 but 'new' is length 1
Calls: %>% ... collate_netMHC -> lapply -> FUN -> %>% ->
In addition: Warning message:
In .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet], :
last 2 bases were ignored
Removing temporary files.
Execution halted
Can you help me to resolve this issue? Thanks in advance.