TaoCheng98 closed this issue 2 years ago.
Hi Cheng Tao,
thank you for your message and for using the phylostratigraphy script. Based on your description, I cannot reproduce your error.
Did you look at the XML files (*.xml.tbz) that BLAST generates? And have you extended the header information of each sequence in your FASTA files?
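One way to check the second point is to scan the FASTA file for headers that lack the extended fields. The Python sketch below assumes the layout `ID | [organism] | [lineage]`, inferred from the BLAST query definitions quoted further down this thread rather than from the script's documentation. The function name `check_headers` is hypothetical.

```python
import re
import sys

# Assumed header layout, inferred from the BLAST query definitions in this thread:
# >ID | [Organism name] | [Lineage;separated;by;semicolons]
HEADER_RE = re.compile(r"^>(\S+) \| \[([^\]]+)\] \| \[([^\]]+)\]\s*$")


def check_headers(lines):
    """Return (line_number, header) pairs for FASTA headers that do not match HEADER_RE."""
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith(">") and not HEADER_RE.match(line):
            bad.append((number, line))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python check_headers.py proteome.fasta
    with open(sys.argv[1]) as handle:
        for number, header in check_headers(handle):
            print(f"line {number}: {header}")
```

An empty result means every header matches the assumed layout; any reported line is a candidate for the missing header information.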
Best
Alex
Hi,
Alexander Gabel,
Thank you for your reply.
I checked the files you mentioned.
There are 80 XML files (*.xml.tbz) in my folder. Is that the expected number?
By the way, I noticed that the file "phyloBlastDB.fa_BLAST_PS_tables.tbz" is only 33 KB.
It seems that only one protein was recorded in it: the file contains information about a single protein, YP_178027.
That protein, YP_178027, is also the only line in the output file 1_phyloBlastDB.fa_final_ps_map.csv.
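To check the number of archives and see what the small PS tables archive actually contains, a short Python sketch can count the `*.xml.tbz` files in the output folder and list the members of a `.tbz` archive. It assumes that `.tbz` files are bzip2-compressed tar archives (the usual meaning of the extension); `count_xml_archives` and `list_archive` are hypothetical helper names.

```python
import sys
import tarfile
from pathlib import Path


def count_xml_archives(directory):
    """Count files ending in .xml.tbz directly inside `directory`."""
    return len(list(Path(directory).glob("*.xml.tbz")))


def list_archive(path):
    """Return (member name, size in bytes) for each regular file in a bzip2 tar archive."""
    with tarfile.open(path, mode="r:bz2") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical): python inspect_tbz.py <output_dir> phyloBlastDB.fa_BLAST_PS_tables.tbz
    print("xml.tbz archives:", count_xml_archives(sys.argv[1]))
    for name, size in list_archive(sys.argv[2]):
        print(f"{size:>12}  {name}")
```

If the tables archive holds only one small member, or one member per chunk with a single entry each, that narrows down where the other proteins were lost.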
Best,
Chengtao
Hi,
Alexander Gabel,
The following is part of one of the XML files you mentioned; I hope it helps.
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
<BlastOutput>
<BlastOutput_program>blastp</BlastOutput_program>
<BlastOutput_version>BLASTP 2.12.0+</BlastOutput_version>
<BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Sch&auml;ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
<BlastOutput_db>/home/data/t010208/Chengtao/Phylostratigraphic_analysis/phyloBlastDB/phyloBlastDB.fa</BlastOutput_db>
<BlastOutput_query-ID>Query_1</BlastOutput_query-ID>
<BlastOutput_query-def>YP_178027.1 | [Mycobacterium tuberculosis H3933Rv] | [Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Mycobacteriaceae;Mycobacterium;Mycobacterium tuberculosis]</BlastOutput_query-def>
<BlastOutput_query-len>406</BlastOutput_query-len>
<BlastOutput_param>
<Parameters>
<Parameters_matrix>BLOSUM62</Parameters_matrix>
<Parameters_expect>0.001</Parameters_expect>
<Parameters_gap-open>11</Parameters_gap-open>
<Parameters_gap-extend>1</Parameters_gap-extend>
<Parameters_filter>F</Parameters_filter>
</Parameters>
</BlastOutput_param>
<BlastOutput_iterations>
<Iteration>
<Iteration_iter-num>1</Iteration_iter-num>
<Iteration_query-ID>Query_1</Iteration_query-ID>
<Iteration_query-def>YP_178027.1 | [Mycobacterium tuberculosis H3933Rv] | [Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Mycobacteriaceae;Mycobacterium;Mycobacterium tuberculosis]</Iteration_query-def>
<Iteration_query-len>406</Iteration_query-len>
<Iteration_hits>
<Hit>
<Hit_num>1</Hit_num>
<Hit_id>gnl|BL_ORD_ID|5243865</Hit_id>
<Hit_def>YP_007353723.1 | [Mycobacterium tuberculosis 7199-99] | [Bacteria; Actinobacteria; Actinobacteria; Actinobacteridae; Actinomycetales; Corynebacterineae; Mycobacteriaceae; Mycobacterium; Mycobacterium tuberculosis complex; Mycobacterium tuberculosis]</Hit_def>
<Hit_accession>5243865</Hit_accession>
<Hit_len>406</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>821.617</Hsp_bit-score>
<Hsp_score>2121</Hsp_score>
<Hsp_evalue>0</Hsp_evalue>
<Hsp_query-from>1</Hsp_query-from>
<Hsp_query-to>406</Hsp_query-to>
<Hsp_hit-from>1</Hsp_hit-from>
<Hsp_hit-to>406</Hsp_hit-to>
<Hsp_query-frame>0</Hsp_query-frame>
<Hsp_hit-frame>0</Hsp_hit-frame>
<Hsp_identity>406</Hsp_identity>
<Hsp_positive>406</Hsp_positive>
<Hsp_gaps>0</Hsp_gaps>
<Hsp_align-len>406</Hsp_align-len>
<Hsp_qseq>MPSPRREDGDALRCGDRSAAVTEIRAALTALGMLDHQEEDLTTGRNVALELFDAQLDQAVRAFQQHRGLLVDGIVGEATYRALKEASYRLGARTLYHQFGAPLYGDDVATLQARLQDLGFYTGLVDGHFGLQTHNALMSYQREYGLAADGICGPETLRSLYFLSSRVSGGSPHAIREEELVRSSGPKLSGKRIIIDPGRGGVDHGLIAQGPAGPISEADLLWDLASRLEGRMAAIGMETHLSRPTNRSPSDAERAATANAVGADLMISLRCETQTSLAANGVASFHFGNSHGSVSTIGRNLADFIQREVVARTGLRDCRVHGRTWDLLRLTRMPTVQVDIGYITNPHDRGMLVSTQTRDAIAEGILAAVKRLYLLGKNDRPTGTFTFAELLAHELSVERAGRLGGS</Hsp_qseq>
<Hsp_hseq>MPSPRREDGDALRCGDRSAAVTEIRAALTALGMLDHQEEDLTTGRNVALELFDAQLDQAVRAFQQHRGLLVDGIVGEATYRALKEASYRLGARTLYHQFGAPLYGDDVATLQARLQDLGFYTGLVDGHFGLQTHNALMSYQREYGLAADGICGPETLRSLYFLSSRVSGGSPHAIREEELVRSSGPKLSGKRIIIDPGRGGVDHGLIAQGPAGPISEADLLWDLASRLEGRMAAIGMETHLSRPTNRSPSDAERAATANAVGADLMISLRCETQTSLAANGVASFHFGNSHGSVSTIGRNLADFIQREVVARTGLRDCRVHGRTWDLLRLTRMPTVQVDIGYITNPHDRGMLVSTQTRDAIAEGILAAVKRLYLLGKNDRPTGTFTFAELLAHELSVERAGRLGGS</Hsp_hseq>
<Hsp_midline>MPSPRREDGDALRCGDRSAAVTEIRAALTALGMLDHQEEDLTTGRNVALELFDAQLDQAVRAFQQHRGLLVDGIVGEATYRALKEASYRLGARTLYHQFGAPLYGDDVATLQARLQDLGFYTGLVDGHFGLQTHNALMSYQREYGLAADGICGPETLRSLYFLSSRVSGGSPHAIREEELVRSSGPKLSGKRIIIDPGRGGVDHGLIAQGPAGPISEADLLWDLASRLEGRMAAIGMETHLSRPTNRSPSDAERAATANAVGADLMISLRCETQTSLAANGVASFHFGNSHGSVSTIGRNLADFIQREVVARTGLRDCRVHGRTWDLLRLTRMPTVQVDIGYITNPHDRGMLVSTQTRDAIAEGILAAVKRLYLLGKNDRPTGTFTFAELLAHELSVERAGRLGGS</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
<Hit>
<Hit_num>2</Hit_num>
<Hit_id>gnl|BL_ORD_ID|5239873</Hit_id>
<Hit_def>YP_005924984.1 | [Mycobacterium tuberculosis RGTB423] | [Bacteria; Actinobacteria; Actinobacteria; Actinobacteridae; Actinomycetales; Corynebacterineae; Mycobacteriaceae; Mycobacterium; Mycobacterium tuberculosis complex; Mycobacterium tuberculosis]</Hit_def>
<Hit_accession>5239873</Hit_accession>
Thank you for your attention to this matter.
Chengtao
Hi,
Alexander Gabel,
I have checked the XML files as you suggested and found that the file contains BLAST results for only one protein.
So the problem might lie with BLAST.
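To count the queries in each report systematically, the following Python sketch opens every `*.xml.tbz` archive and collects the `<Iteration_query-def>` values, since BLAST+ XML output contains one `<Iteration>` block per query sequence. It assumes each archive is a bzip2-compressed tar file whose regular members are XML reports; `queries_per_report` is a hypothetical name. A regular expression is used instead of an XML parser, partly because the `&auml;` entity in the reference line is not a predefined XML entity and may make a strict parser fail.

```python
import re
import sys
import tarfile

QUERY_DEF_RE = re.compile(r"<Iteration_query-def>(.*?)</Iteration_query-def>")


def queries_per_report(archive_path):
    """Map each regular member of a bzip2 tar archive to the query definitions it contains."""
    result = {}
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        for member in archive.getmembers():
            if member.isfile():
                text = archive.extractfile(member).read().decode("utf-8", errors="replace")
                # BLAST+ XML (-outfmt 5) holds one <Iteration> block per query sequence.
                result[member.name] = QUERY_DEF_RE.findall(text)
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python count_queries.py run_dir/*.xml.tbz
    for path in sys.argv[1:]:
        for name, queries in queries_per_report(path).items():
            print(f"{path}:{name}: {len(queries)} queries")
```

Summing the counts over all archives and comparing the total with the number of sequences in the input FASTA file shows whether BLAST itself skipped queries.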
By the way, my BLAST+ was installed via conda.
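To test the conda-installed BLAST+ independently of the pipeline, one could run `blastp` directly on the first few sequences of the proteome with XML output and count the `<Iteration>` elements. This Python sketch assumes `blastp` is on the `PATH` and that the database path used by the pipeline (`phyloBlastDB.fa`, as seen in `BlastOutput_db` above) is already formatted for BLAST. The helper names and the file names `subset.fasta` and `subset.xml` are hypothetical.

```python
import shutil
import subprocess
import sys


def blastp_command(query, db, out, evalue="1e-5", threads=4):
    """Build a BLAST+ blastp command line that writes XML output (-outfmt 5)."""
    return [
        "blastp",
        "-query", query,
        "-db", db,
        "-evalue", str(evalue),
        "-outfmt", "5",
        "-num_threads", str(threads),
        "-out", out,
    ]


def first_sequences(lines, n):
    """Return the lines that make up the first n records of a FASTA file."""
    kept, count = [], 0
    for line in lines:
        if line.startswith(">"):
            count += 1
            if count > n:
                break
        kept.append(line)
    return kept


if __name__ == "__main__" and len(sys.argv) > 2 and shutil.which("blastp"):
    # Usage (hypothetical): python blast_subset.py proteome.fasta /path/to/phyloBlastDB.fa
    with open(sys.argv[1]) as handle:
        subset = first_sequences(handle, 5)
    with open("subset.fasta", "w") as handle:
        handle.writelines(subset)
    subprocess.run(blastp_command("subset.fasta", sys.argv[2], "subset.xml"), check=True)
    with open("subset.xml") as handle:
        # One <Iteration> element per query is expected (up to 5 here).
        print("queries in XML:", handle.read().count("<Iteration>"))
```

If this reports one iteration per input sequence, BLAST+ handles multi-query input correctly and the loss more likely happens later in the pipeline.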
Best,
Chengtao
Hi,
Alexander Gabel,
It is a great honor to use the phylostratigraphy pipeline you shared.
However, I have recently run into one problem with it.
No errors were reported during the run, but my output file contains only one line of results.
I started with the proteome of Mycobacterium tuberculosis, which is the organism I focus on, and ran into the problem described above.
I then tried "Acaryochloris marina MBIC11017" as an example, but the same problem occurred.
My guess is that only the last processed protein is recorded in the output file.
This is the header of my FASTA file for the organism:
This is the command:
perl createPSmap.pl --organism /home/data/t010208/Chengtao/Phylostratigraphic_analysis/rowdata/Acaryochloris_marina_MBIC11017.fasta --database /home/data/t010208/Chengtao/Phylostratigraphic_analysis/phyloBlastDB/phyloBlastDB.fa --prefix phyloBlastDB.fa --seqOffset 50 --evalue 1e-5 --threads 96 --blastPlus
This is the output file:
PS;GeneID
1;NP_214523.1
There is just one line of results, and there does not seem to be anything special about this protein, except that it is the last protein in my FASTA file.
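To list exactly which input proteins are missing from the map, one can compare the sequence IDs in the input FASTA file with the `GeneID` column of the output. This Python sketch assumes the semicolon-separated `PS;GeneID` layout shown above and takes the first whitespace-delimited token of each FASTA header as the ID; the function names are hypothetical.

```python
import csv
import sys


def fasta_ids(lines):
    """Return the first whitespace-delimited token of every FASTA header, in file order."""
    return [line[1:].split()[0] for line in lines if line.startswith(">") and line[1:].split()]


def mapped_ids(lines):
    """Return the GeneID column of a semicolon-separated map with a 'PS;GeneID' header."""
    reader = csv.DictReader(lines, delimiter=";")
    return [row["GeneID"] for row in reader]


def missing_ids(fasta_lines, map_lines):
    """FASTA IDs that have no phylostratum in the map, in FASTA order."""
    mapped = set(mapped_ids(map_lines))
    return [gene_id for gene_id in fasta_ids(fasta_lines) if gene_id not in mapped]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical): python missing_ids.py proteome.fasta 1_phyloBlastDB.fa_final_ps_map.csv
    with open(sys.argv[1]) as fasta, open(sys.argv[2], newline="") as ps_map:
        missing = missing_ids(fasta, ps_map)
    print(f"{len(missing)} input proteins have no phylostratum")
    for gene_id in missing[:20]:
        print(gene_id)
```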
The script files are all up to date; the last modification date is 27 Jan 2021.
Your reply is greatly appreciated!
Kind regards,
Cheng Tao