Open brianjohnhaas opened 4 years ago
Dear Brian, dear Trinotate users,
I am using SignalP 5.0 with Trinotate v3.2.0. Here are some issues I encountered, in case this helps others. I am not a programmer, so there are probably more elegant ways to solve them:
First, I needed to shorten the headers of the protein fasta file (TransDecoder output; e.g., protein.fasta) so that they contain only the protein ID. Otherwise, there were errors due to invalid characters:
awk -F " " '/^>/ {print $1; next} 1' protein.fasta > sig_v5.input.protein.fasta
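As a rough sketch of how one might check that the header shortening worked before starting a long run: the function names below (shorten_headers, count_long_headers) are hypothetical helpers, not part of Trinotate or SignalP, and the sketch assumes that the "invalid characters" all sit after the first space or tab of the header.

```shell
# Hypothetical helper: keep only the first whitespace-delimited token
# of each FASTA header line; sequence lines pass through unchanged.
shorten_headers() {
    awk '/^>/ {print $1; next} {print}' "$1"
}

# Hypothetical helper: count header lines that still contain a space
# or tab. A result of 0 means every header is a bare ID.
count_long_headers() {
    grep -c '^>.*[[:space:]]' "$1" || true
}

# Usage (file names taken from the post above):
#   shorten_headers protein.fasta > sig_v5.input.protein.fasta
#   count_long_headers sig_v5.input.protein.fasta
```

If the count is not 0, some headers still carry extra text and may trigger the same errors.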
Second, I was not able to analyze the whole transcriptome at once, despite trying different values for the new --batch parameter. The solution was to split the transcriptome with the fasta-splitter script developed by Kirill Kryukov http://kirill-kryukov.com/study/tools/fasta-splitter/. Splitting into files of 100000 sequences worked for me:
perl fasta-splitter.pl sig_v5.input.protein.fasta --part-size 100000 --measure count
I used the same value, 100000, for the --batch parameter of SignalP. The results of the separate runs can easily be concatenated afterwards.
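For the concatenation step, a minimal sketch could look like the following. It assumes that each per-part result file is a plain text table whose header lines start with '#', so that the header is kept only once. The function name merge_tables and the file names in the usage comment are hypothetical; adjust them to whatever your runs actually produced.

```shell
# Hypothetical helper: merge several result tables into one file.
# Assumption: header lines start with '#'. They are kept from the first
# non-empty input file and skipped in all later files.
# Usage: merge_tables OUTPUT PART1 PART2 ...
merge_tables() {
    out=$1
    shift
    awk 'FNR == 1 { nfile++ } /^#/ && nfile > 1 { next } { print }' "$@" > "$out"
}

# Usage (hypothetical file names):
#   merge_tables all_parts.signalp.out part1.out part2.out part3.out
```

If the result files have no '#' header lines, a plain cat of the parts gives the same outcome.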
terrific! thanks for contributing this!
On Fri, Nov 15, 2019 at 4:21 AM LuciaPita notifications@github.com wrote:
Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas
do it