ypriverol closed this issue 5 years ago
@yafeng @husensofteng I have added the decoy tool. The way to test it is using the command:
proteomicsdevmbpr1:pypgatk yperez$ python3.7 pypgatk_cli.py generate-decoy -h
Usage: pypgatk_cli.py generate-decoy [OPTIONS]

Options:
  -c, --config_file TEXT          Configuration file for the protein database
                                  decoy generation
  -o, --output TEXT               Output file for decoy database
  -i, --input TEXT                FASTA file of target proteins sequences for
                                  which to create decoys (*.fasta|*.fa)
  -s, --cleavage_sites TEXT       A list of amino acids at which to cleave
                                  during digestion. Default = KR
  -a, --anti_cleavage_sites TEXT  A list of amino acids at which not to cleave
                                  if following cleavage site ie. Proline.
                                  Default = none
  -p, --cleavage_position TEXT    Set cleavage to be c or n terminal of
                                  specified cleavage sites. Options [c, n],
                                  Default = c
  -l, --min_peptide_length INTEGER
                                  Set minimum length of peptides to compare
                                  between target and decoy. Default = 5
  -n, --max_iterations INTEGER    Set maximum number of times to shuffle a
                                  peptide to make it non-target before
                                  failing. Default=100
  -x, --do_not_shuffle TEXT       Turn OFF shuffling of decoy peptides that
                                  are in the target database. Default=false
  -w, --do_not_switch TEXT        Turn OFF switching of cleavage site with
                                  preceding amino acid. Default=false
  -d, --decoy_prefix TEXT         Set accession prefix for decoy proteins in
                                  output. Default=DECOY_
  -t, --temp_file TEXT            Set temporary file to write decoys prior to
                                  shuffling. Default=protein-decoy.fa
  -b, --no_isobaric TEXT          Do not make decoy peptides isobaric.
                                  Default=false
  -m, --memory_save TEXT          Slower but uses less memory (does not store
                                  decoy peptide list). Default=false
  -h, --help                      Show this message and exit.
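For anyone unfamiliar with what the `--cleavage_sites` and `--do_not_switch` options control, here is a minimal sketch of the reverse-and-switch step, assuming the default cleavage sites (KR). The function name `reverse_and_switch` is hypothetical and this is not pypgatk's actual implementation; it also leaves out the shuffling of decoy peptides that collide with target peptides.

```python
def reverse_and_switch(sequence, cleavage_sites="KR", switch=True):
    """Reverse a protein sequence; optionally swap each cleavage site
    with the residue that precedes it in the reversed sequence.

    Hypothetical helper for illustration only, not pypgatk's API.
    """
    residues = list(sequence[::-1])
    if switch:
        # Start at 1: the first residue has no preceding amino acid.
        for i in range(1, len(residues)):
            if residues[i] in cleavage_sites:
                residues[i - 1], residues[i] = residues[i], residues[i - 1]
    return "".join(residues)


if __name__ == "__main__":
    target = "MKTAYR"  # made-up toy sequence
    print(reverse_and_switch(target))                # RYAKTM
    print(reverse_and_switch(target, switch=False))  # RYATKM
```

The decoy keeps the length and amino acid composition of the target, since residues are only reordered.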
@yafeng We need to make a decision about the DECOY IDs. The tool generates the IDs by creating a new protein with accessions of the form DECOY_1, DECOY_2, ... Can you check that this works in our pipelines? The second problem I see is that we lose all the information related to the proteins. My guess is that we will recover that information in the re-mapping after the protein identification step?
@ypriverol I have modified the script so that the output decoy sequences contain the original protein ID; this is required to distinguish decoys from different classes.
I will check the code because it doesn't look like it will work now. Let me check.
@yafeng I just updated the code and it works fine now. I also removed the separator from the concatenation, which means the decoy prefix should contain the separator itself (as in the default DECOY_). Can you test the current version now? If you are happy with that, I can close this issue.
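The naming convention discussed here, a decoy prefix concatenated directly with the original protein ID, can be sketched as below. Both helper names and the example accession `PROT1` are hypothetical; the only detail taken from the tool is the default prefix `DECOY_`.

```python
def decoy_accession(accession, prefix="DECOY_"):
    """Build a decoy accession by plain concatenation.

    Nothing is inserted between prefix and accession, so the prefix
    must carry its own separator. Hypothetical helper, not pypgatk code.
    """
    return prefix + accession


def is_decoy(accession, prefix="DECOY_"):
    """Return True if the accession starts with the decoy prefix."""
    return accession.startswith(prefix)


if __name__ == "__main__":
    acc = decoy_accession("PROT1")  # made-up accession
    print(acc, is_decoy(acc))       # DECOY_PROT1 True
```

Keeping the original ID in the decoy accession lets downstream steps trace each decoy back to its target protein.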
@ypriverol this is the first time I have tried it, and I got this error. Did I forget to set something up?
Traceback (most recent call last):
File "pypgatk/pypgatk_cli.py", line 11, in
It works here. @yafeng that is weird; maybe you should run the setup.py script with the install command.
To use the library now, you need to install it first:
python3.7 setup.py install
The current tool is more a library with a command-line tool than a set of simple scripts, which is why you need to install it. @enriquea if you succeed, can you improve the README?
Please update the README; the following Python packages need to be installed:
pip install click
pip install PyVCF
pip install gffutils
pip install pyyaml
pip install biopython
These packages are listed in requirements.txt, so you should be able to install them using pip.
pip install -r requirements.txt did not give any error.
The protocol to build the package should be:
1. pip install -r requirements.txt
2. python3.7 setup.py install
Then you should be able to run the script.
@yafeng @enriquea did you manage to build the package?
I'm getting the following error:
enrique$ python3.6 pypgatk_cli.py --help
Traceback (most recent call last):
File "pypgatk_cli.py", line 14, in <module>
from pypgatk.commands import cosmic_to_proteindb as cosmic_to_proteindb_cmd
File "/anaconda3/lib/python3.6/site-packages/pypgatk-0.0.1-py3.6.egg/pypgatk/commands/cosmic_to_proteindb.py", line 3, in <module>
from pypgatk.cgenomes.cgenomes_proteindb import CancerGenomesService
File "/anaconda3/lib/python3.6/site-packages/pypgatk-0.0.1-py3.6.egg/pypgatk/cgenomes/cgenomes_proteindb.py", line 3, in <module>
from Bio import SeqIO
ModuleNotFoundError: No module named 'Bio'
@enriquea you need to install the biopython package, as noted above. I will update the docs to list the requirements.
@yafeng If the decoy tool works for you, please close this issue.
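One way to catch a missing dependency like this before running the CLI is to check that each required module can be imported. A minimal sketch: the pip package names come from the list earlier in this thread, the import names are the standard ones for those packages (they differ in some cases, e.g. biopython is imported as `Bio`), and `missing_packages` is a hypothetical helper, not part of pypgatk.

```python
import importlib.util

# pip package name -> top-level module name used in import statements
REQUIRED = {
    "click": "click",
    "PyVCF": "vcf",
    "gffutils": "gffutils",
    "pyyaml": "yaml",
    "biopython": "Bio",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, try: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```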
We need to add the decoy Sanger tool to the library. The tool is the following:
https://www.sanger.ac.uk/science/tools/decoypyrat