andrewrech / antigen.garnish

running just foreignness leads to warnings/errors #138

Closed boyangzhao closed 3 years ago

boyangzhao commented 3 years ago

I'm running your Docker image and trying to run just the foreignness and dissimilarity scores on a list of sequences, but I seem to be getting warnings (or errors). I'm expecting just a table as output, but that doesn't appear to be what I get.

Docker image used: andrewrech/antigen.garnish:latest. After starting the container with docker run -it andrewrech/antigen.garnish /bin/bash and starting R, I follow the example in the README:

library(magrittr)
library(data.table)
library(antigen.garnish)
v <- c("SIINFEKL", "ILAKFLHWL", "GILGFVFTL")
v %>% foreignness_score(db = "human") %>% print

The response I get is,

Checking netMHC scripts in antigen.garnish data directory.
/root/antigen.garnish/netMHC/netMHC-4.0/netMHC does not exist; cannot configure netMHC tools; see README: https://github.com/andrewrech/antigen.garnish
/root/antigen.garnish/netMHC/netMHCII-2.3/netMHCII-2.3 does not exist; cannot configure netMHC tools; see README: https://github.com/andrewrech/antigen.garnish
/root/antigen.garnish/netMHC/netMHCIIpan-4.0/netMHCIIpan does not exist; cannot configure netMHC tools; see README: https://github.com/andrewrech/antigen.garnish
/root/antigen.garnish/netMHC/netMHCpan-4.1/netMHCpan does not exist; cannot configure netMHC tools; see README: https://github.com/andrewrech/antigen.garnish
netMHC  is not in PATH
       Download: http://www.cbs.dtu.dk/services/
netMHCII-2.3  is not in PATH
       Download: http://www.cbs.dtu.dk/services/
netMHCIIpan  is not in PATH
       Download: http://www.cbs.dtu.dk/services/
netMHCpan  is not in PATH
       Download: http://www.cbs.dtu.dk/services/
Generating FASTA to query.
Running blastp for homology to IEDB antigens.
Summing IEDB local alignments...
Removing temporary fasta files.
        nmer foreignness_score
1: GILGFVFTL                 1
                                                                                                                                                       IEDB_anno
1: 20355|Matrix protein 1|P03485.1|Influenza A virus (A/Puerto Rico/8/1934(H1N1))|211044 GILGFVFTL|20354|M1 protein|CAA30882.1|Influenza A virus|11320 GILGFVFTL

If I run result <- foreignness_score(v, db = "human"), the result looks like,

        nmer foreignness_score
1: GILGFVFTL                 1
                                                                                                                                                       IEDB_anno
1: 20355|Matrix protein 1|P03485.1|Influenza A virus (A/Puerto Rico/8/1934(H1N1))|211044 GILGFVFTL|20354|M1 protein|CAA30882.1|Influenza A virus|11320 GILGFVFTL

instead of a table of two columns (nmer and foreignness_score).

I don't wish to run with a VCF and predict binding. If I just want to run the foreignness/dissimilarity scores, do I still need to install all the netMHC tools? Would that resolve the issue above?

andrewrech commented 3 years ago

Thank you for your interest in the software.

The warning is unrelated to this issue you have raised, but I agree that those tools should not be checked in this function and I have just removed the check on main.

Can you please try our test case?

library(magrittr)
library(data.table)
library(antigen.garnish)
v <- c("SIINFEKL", "ILAKFLHWL", "GILGFVFTL")
v %>% foreignness_score(db = "mouse") %>% print

I believe the software is working correctly. Peptides that return scores of NA are dropped, and SIINFEKL, for instance, would not be expected to return a score in humans.

@boyangzhao

boyangzhao commented 3 years ago

Thanks. If I run this test case within the Docker container (docker into bash), I can confirm it's working, with three columns (nmer, foreignness_score, and IEDB_anno) and the three peptides. However, if I run this locally (using the latest antigen.garnish and an installation of BLAST), or using the Docker image but called via a cwl-engine, the result I get is

Generating FASTA to query.
Running blastp for homology to IEDB antigens.
Removing temporary fasta files.
        nmer
1:  SIINFEKL
2: ILAKFLHWL
3: GILGFVFTL

There are no warning/error messages, so I'm not sure whether something else is preventing it from outputting the other columns. It looks like it's missing the Summing IEDB local alignments... step.

andrewrech commented 3 years ago

Sorry this is still causing trouble. @leeprichman and I were just chatting about this. This code path can exit here with a warning, here with a warning, or here with an error due to the column length being wrong. None of which seems to be occurring for you.

Is it possible a warning is being suppressed? On local, could you please

debug(foreignness_score)

and then step through the function? Please paste the output here.

I am not sure what is going on under cwl-engine but let's start with the simpler case.
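As a complement to stepping through with debug(), one way to check whether a warning is being emitted at all is to collect conditions explicitly. The following is a minimal sketch in base R; capture_conditions and toy_score are hypothetical names used only for illustration, and toy_score is a stand-in rather than the package's function.

```r
# Run an expression and collect every warning and message it signals,
# so that nothing is lost in console output.
capture_conditions <- function(expr) {
  warns <- character(0)
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,  # evaluated lazily here, inside the handlers
    warning = function(w) {
      warns <<- c(warns, conditionMessage(w))
      invokeRestart("muffleWarning")
    },
    message = function(m) {
      msgs <<- c(msgs, conditionMessage(m))
      invokeRestart("muffleMessage")
    }
  )
  list(value = value, warnings = warns, messages = msgs)
}

# Hypothetical stand-in for a function that warns and returns early.
toy_score <- function(v) {
  message("Running blastp.")
  warning("BLAST database not found")
  data.frame(nmer = v, stringsAsFactors = FALSE)
}

res <- capture_conditions(toy_score(c("SIINFEKL", "GILGFVFTL")))
print(res$warnings)
```

Under the assumption that the real call is wrapped the same way, for example capture_conditions(foreignness_score(v, db = "human")), an empty res$warnings would suggest the warning is never signalled to the caller. Note that this only sees conditions that reach the caller: a warning muffled inside the function would not appear. It may also be worth checking getOption("warn"), since a negative value causes warnings to be ignored.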

boyangzhao commented 3 years ago

Thanks for the speedy response. I've tried the debug, and figured out that I hadn't downloaded your http://get.rech.io/antigen.garnish-2.2.0.tar.gz to install the BLAST databases. I downloaded that and defined AG_DATA_DIR, and it's now working!

It is strange that the warning that the BLAST database cannot be found was never displayed.
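For anyone hitting the same silent failure, a quick pre-flight check of AG_DATA_DIR before scoring may help. This is only a sketch in base R: check_ag_data_dir is a hypothetical helper, not part of antigen.garnish, and it makes no assumption about what the directory must contain beyond being present and readable.

```r
# Hypothetical helper: fail loudly if AG_DATA_DIR is unset,
# missing, or unreadable by the current user.
check_ag_data_dir <- function(path = Sys.getenv("AG_DATA_DIR")) {
  if (!nzchar(path)) {
    stop("AG_DATA_DIR is not set")
  }
  if (!dir.exists(path)) {
    stop("AG_DATA_DIR does not exist: ", path)
  }
  # file.access() returns 0 on success; mode 4 tests read permission.
  if (file.access(path, mode = 4) != 0) {
    stop("AG_DATA_DIR is not readable by the current user: ", path)
  }
  invisible(normalizePath(path))
}

# Demonstration with a temporary directory standing in for the
# unpacked antigen.garnish data directory.
demo_dir <- tempfile("ag_data_")
dir.create(demo_dir)
Sys.setenv(AG_DATA_DIR = demo_dir)
checked <- check_ag_data_dir()
```

The read-permission test is included because a directory can exist and still be unusable by the user running R.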

andrewrech commented 3 years ago

Yea that is odd. I'm not sure. I am glad it is working. Please re-open if you run into other issues!

boyangzhao commented 3 years ago

When you mentioned "Peptides that return scores of NA are dropped", in what instances would the result be NA for foreignness and dissimilarity? And how should we interpret this?

One last thing, FYI: for the errors related to cwl-engine, the containers are run as non-root, while the Docker image made available was created with root, so it would face permission issues accessing the /root/antigen.garnish folder. I found a workaround in the meantime. Thanks! And yes, the issue is considered resolved.

leeprichman commented 3 years ago

Hi Boyang! If you are directly passing a vector of peptides to foreignness_score or dissimilarity and a peptide is not returned, it is because that sequence did not have any suitable BLAST alignments. In both cases, this is equivalent to a score of 0.

When run as part of the whole prediction function, this would get merged back to the table so the peptide would not be lost and NAs converted to 0s. Does that make sense?
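When scoring a bare vector of peptides, the same merge-back can be done by hand. The sketch below uses base R data frames to stay dependency-free (the package itself works with data.table), and the scores table is a stand-in built from the output shown earlier in this thread rather than a live call. It is an illustration of the idea, not the package's internal implementation.

```r
peptides <- c("SIINFEKL", "ILAKFLHWL", "GILGFVFTL")

# Stand-in for a result in which peptides without suitable
# alignments were dropped (values taken from the thread above).
scores <- data.frame(
  nmer = "GILGFVFTL",
  foreignness_score = 1,
  stringsAsFactors = FALSE
)

# Left join against the full input so that no peptide is lost.
full <- merge(
  data.frame(nmer = peptides, stringsAsFactors = FALSE),
  scores,
  by = "nmer",
  all.x = TRUE
)

# A missing score means no suitable alignment; treat it as 0.
full$foreignness_score[is.na(full$foreignness_score)] <- 0

# merge() sorts by the key, so restore the original input order.
full <- full[match(peptides, full$nmer), ]
rownames(full) <- NULL

print(full)
```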

boyangzhao commented 3 years ago

Ok! Makes sense. Thanks!

boyangzhao commented 3 years ago

Actually, sorry Lee, one last question regarding the NAs. For foreignness I get it, but for dissimilarity, if the peptide does not have any suitable BLAST alignments, wouldn't it be so dissimilar that it should have a score of 1? Otherwise, for example, an out-of-frame indel that generates novel peptides not matching any reference proteome would have a dissimilarity of 0. For poor alignments, I understand that the result is a higher dissimilarity.

leeprichman commented 3 years ago

No problem! Yes, that's a very astute observation. That said, this shouldn't happen in the setting of SNVs because the entire rest of the sequence should align. It's possible that a non-human source or frameshift could create a sequence with no alignments; however, given the size of the BLAST database and the permissiveness of the BLAST parameters, I have not yet seen this happen. It's also possible that a low-complexity sequence such as AAAAAAAAA could fall into this category, but such sequences would be unlikely to be immunogenic. Essentially, this is an unvalidated edge case, and we erred on the side of not calling them dissimilar.