Closed JAnu1291 closed 1 day ago
I don't know the reason for this problem, but something doesn't seem right with your 2024-11-07-decoys-contam-combined_protein_database_allspecies.fasta.fas. Could you share both FASTA files with us?
Thanks,
Fengchao
Here are the fasta files: database.zip
The files you shared have different file names from those in the log file.
Best,
Fengchao
Sorry for that.
I am sharing a link to the Dropbox folder containing both databases.
In the bigger database, there are 26 proteins without the PE= keyword. I am not sure whether this triggers a bug, but could you add the PE values and try again?
>rev_sp|P00452-2|RIR1_ECOLI Isoform Alpha' of Ribonucleoside-diphosphate reductase 1 subunit alpha OS=Escherichia coli (strain K12) OX=83333 GN=nrdA
>rev_sp|P02919-2|PBPB_ECOLI Isoform Gamma of Penicillin-binding protein 1B OS=Escherichia coli (strain K12) OX=83333 GN=mrcB
>rev_sp|P06710-2|DPO3X_ECOLI Isoform gamma of DNA polymerase III subunit tau OS=Escherichia coli (strain K12) OX=83333 GN=dnaX
>rev_sp|P07363-2|CHEA_ECOLI Isoform cheA(S) of Chemotaxis protein CheA OS=Escherichia coli (strain K12) OX=83333 GN=cheA
>rev_sp|P0A705-2|IF2_ECOLI Isoform Beta of Translation initiation factor IF-2 OS=Escherichia coli (strain K12) OX=83333 GN=infB
>rev_sp|P0A705-3|IF2_ECOLI Isoform Beta' of Translation initiation factor IF-2 OS=Escherichia coli (strain K12) OX=83333 GN=infB
>rev_sp|P0A988-2|DPO3B_ECOLI Isoform Beta* of Beta sliding clamp OS=Escherichia coli (strain K12) OX=83333 GN=dnaN
>rev_sp|P0DSH3-2|YIBX_ECOLI Isoform Ybix-S of Protein YibX OS=Escherichia coli (strain K12) OX=83333 GN=yibX
>rev_sp|P15005-2|MCRB_ECOLI Isoform 33 kDa of Type IV methyl-directed restriction enzyme EcoKMcrB subunit OS=Escherichia coli (strain K12) OX=83333 GN=mcrB
>rev_sp|P40695-2|PLCR_PSEAE Isoform plcR2 of Phospholipase C accessory protein PlcR OS=Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) OX=208964 GN=plcR
>rev_sp|P63284-2|CLPB_ECOLI Isoform ClpB-3 of Chaperone protein ClpB OS=Escherichia coli (strain K12) OX=83333 GN=clpB
>rev_sp|P75960-2|NPD_ECOLI Isoform CobB-Short of NAD-dependent protein deacylase OS=Escherichia coli (strain K12) OX=83333 GN=cobB
>rev_sp|Q59385-2|COPA_ECOLI Isoform Soluble copper chaperone CopA(Z) of Copper-exporting P-type ATPase OS=Escherichia coli (strain K12) OX=83333 GN=copA
>sp|P00452-2|RIR1_ECOLI Isoform Alpha' of Ribonucleoside-diphosphate reductase 1 subunit alpha OS=Escherichia coli (strain K12) OX=83333 GN=nrdA
>sp|P02919-2|PBPB_ECOLI Isoform Gamma of Penicillin-binding protein 1B OS=Escherichia coli (strain K12) OX=83333 GN=mrcB
>sp|P06710-2|DPO3X_ECOLI Isoform gamma of DNA polymerase III subunit tau OS=Escherichia coli (strain K12) OX=83333 GN=dnaX
>sp|P07363-2|CHEA_ECOLI Isoform cheA(S) of Chemotaxis protein CheA OS=Escherichia coli (strain K12) OX=83333 GN=cheA
>sp|P0A705-2|IF2_ECOLI Isoform Beta of Translation initiation factor IF-2 OS=Escherichia coli (strain K12) OX=83333 GN=infB
>sp|P0A705-3|IF2_ECOLI Isoform Beta' of Translation initiation factor IF-2 OS=Escherichia coli (strain K12) OX=83333 GN=infB
>sp|P0A988-2|DPO3B_ECOLI Isoform Beta* of Beta sliding clamp OS=Escherichia coli (strain K12) OX=83333 GN=dnaN
>sp|P0DSH3-2|YIBX_ECOLI Isoform Ybix-S of Protein YibX OS=Escherichia coli (strain K12) OX=83333 GN=yibX
>sp|P15005-2|MCRB_ECOLI Isoform 33 kDa of Type IV methyl-directed restriction enzyme EcoKMcrB subunit OS=Escherichia coli (strain K12) OX=83333 GN=mcrB
>sp|P40695-2|PLCR_PSEAE Isoform plcR2 of Phospholipase C accessory protein PlcR OS=Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) OX=208964 GN=plcR
>sp|P63284-2|CLPB_ECOLI Isoform ClpB-3 of Chaperone protein ClpB OS=Escherichia coli (strain K12) OX=83333 GN=clpB
>sp|P75960-2|NPD_ECOLI Isoform CobB-Short of NAD-dependent protein deacylase OS=Escherichia coli (strain K12) OX=83333 GN=cobB
>sp|Q59385-2|COPA_ECOLI Isoform Soluble copper chaperone CopA(Z) of Copper-exporting P-type ATPase OS=Escherichia coli (strain K12) OX=83333 GN=copA
Thanks,
Fengchao
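A check like the one described above can be scripted. The following is only a minimal sketch: the helper name `headers_missing_pe` and the `PE=<digit>` pattern are assumptions made for illustration, and this is not how FragPipe or MSFragger parse headers internally. The two sample headers are copied from this thread.

```python
import re
from typing import Iterable, List

# Assumption: a valid header carries a UniProt-style "PE=<digit>" field.
PE_FIELD = re.compile(r"\bPE=\d")


def headers_missing_pe(lines: Iterable[str]) -> List[str]:
    """Return FASTA header lines (those starting with '>') that lack a PE= field."""
    return [
        line.rstrip("\n")
        for line in lines
        if line.startswith(">") and not PE_FIELD.search(line)
    ]


# Two headers taken from this thread: the first has PE=, the second does not.
example = [
    ">sp|A6NM66|CU054_HUMAN Uncharacterized protein encoded by LINC01548 "
    "OS=Homo sapiens OX=9606 GN=LINC01548 PE=1 SV=1\n",
    ">sp|P00452-2|RIR1_ECOLI Isoform Alpha' of Ribonucleoside-diphosphate "
    "reductase 1 subunit alpha OS=Escherichia coli (strain K12) OX=83333 GN=nrdA\n",
]

for header in headers_missing_pe(example):
    print(header)
```

To scan a real database, pass an open file handle instead of the example list, for instance `headers_missing_pe(open(path))`, where `path` is a placeholder for your own FASTA file.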
I corrected this, but the number of identifications is still low.
Then, could you share your mzML or raw files, workflow file, and fasta file with me to debug?
Thanks,
Fengchao
Thanks for sharing the files. Now, I am almost sure that something is wrong with your large database, 2024-10-30-decoys-contam-combined_protein_database_allspecies.fasta.fas.
For example, the small database has
>sp|A6NM66|CU054_HUMAN Uncharacterized protein encoded by LINC01548 OS=Homo sapiens OX=9606 GN=LINC01548 PE=1 SV=1
MLAKGAEEGRSGGPRPAITLPGSLHFTCDLKTSPYCLTRAELMEHLPLRVAVHSMSPCHRSCFCGELKRGHPWNTPQVSSFPSSTTSLSHSCTTSHLDCSQQVESGSK
but the large database has the same protein with a different sequence
>sp|A6NM66|CU054_HUMAN Uncharacterized protein encoded by LINC01548 OS=Homo sapiens OX=9606 GN=LINC01548 PE=1 SV=1
MAKGAGRSGGRATGSHTCDKTSYCTRAMHRVAVHSMSCHRSCCGKRGHWNTVSSSSTTSSHSCTTSHDCSVSGSK
I also did another test. I searched your data against the combination of the small and large databases, and got numbers of PSMs, peptides, and proteins similar to those from searching against the small database alone. This means that the difference between the results from the small and large databases is not due to the difference in database size, but to the large database itself.
Best,
Fengchao
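The comparison above (same entry, different sequence) can be repeated across the whole database. This is a minimal sketch, not the procedure actually used in this thread: `read_fasta` and `mismatched_entries` are hypothetical helper names, and entries are keyed by the first whitespace-delimited token of the header. The two records are the ones quoted above.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each entry ID (first token after '>') to its concatenated sequence.

    If an ID occurs more than once, the last occurrence wins.
    """
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def mismatched_entries(small: Dict[str, str], large: Dict[str, str]) -> List[str]:
    """Return IDs present in both databases whose sequences differ."""
    return sorted(k for k in small.keys() & large.keys() if small[k] != large[k])


HEADER = (">sp|A6NM66|CU054_HUMAN Uncharacterized protein encoded by LINC01548 "
          "OS=Homo sapiens OX=9606 GN=LINC01548 PE=1 SV=1")

small_db = read_fasta([
    HEADER,
    "MLAKGAEEGRSGGPRPAITLPGSLHFTCDLKTSPYCLTRAELMEHLPLRVAVHSMSPCHRSCFCGELKRGHPWNTPQVSSFPSSTTSLSHSCTTSHLDCSQQVESGSK",
])
large_db = read_fasta([
    HEADER,
    "MAKGAGRSGGRATGSHTCDKTSYCTRAMHRVAVHSMSCHRSCCGKRGHWNTVSSSSTTSSHSCTTSHDCSVSGSK",
])

print(mismatched_entries(small_db, large_db))
```

For real files, build each dictionary from an open file handle, for example `read_fasta(open(path))` with `path` standing in for your own database. A long list of mismatches would suggest that the sequences were altered when the large database was assembled.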
We are analyzing DDA-PASEF data using FragPipe version 22 (MSFragger 4.1). We are interested in identifying bacterial proteins, so we are searching the raw data against a database containing protein sequences from human, 23 bacterial species, and 1 fungal species, both with and without the groupFDR option. Surprisingly, the number of protein identifications drops to ~80 when searching against this combined database. However, when searching against a database containing only human and one bacterial species, the number of identifications is ~1800.
I am wondering whether this could be due to an FDR issue, and I have been unable to figure out what the next steps would be to fix it. I am attaching the log files from the searches against the human + 1 bacterial species database and the human + 23 bacterial species database.
I appreciate your help here.
log_file_search_against_human+single_bacteria.txt log_file_human+23_bacterial_species.txt