Nesvilab / philosopher

PeptideProphet, PTMProphet, ProteinProphet, iProphet, Abacus, and FDR filtering
https://philosopher.nesvilab.org
GNU General Public License v3.0

Questions about contaminants which are used in proteomic searches #510

Open RobAlbn opened 3 hours ago

RobAlbn commented 3 hours ago

I am running proteomic searches with FragPipe v22.0, and I have some questions about contaminants.

After adding decoys and contaminants to my protein database, I removed the decoys and the proteins of the starting database from the result. This left me with a FASTA file containing only the contaminants ("contaminants.fasta"). I then downloaded the cRAP database from this link: ftp://ftp.thegpm.org/fasta/cRAP/crap.fasta. However, my FASTA file ("contaminants.fasta") contains 118 proteins, while the cRAP FASTA file ("crap.fasta") contains 116 proteins. Both files are attached as text files. Why do these two FASTA files differ? More generally, could you provide the FASTA file of contaminants that FragPipe adds?

After adding decoys and contaminants, the contaminant headers in the resulting FASTA file do not start with "contam_" or a similar prefix. Can I manually add "contam_" or a similar prefix to the contaminant headers before running a proteomic search? Could this affect the search results?

Finally, when running FragPipe in headless mode, which database should I specify in the workflow file: the database with decoys and contaminants, or the database without them? In other words, does FragPipe in headless mode automatically add decoys and contaminants to the database specified in the workflow file?

Thank you for any help and support on this.

Best regards, Roberto Albanese

Attachments: contaminants.txt, crap.txt

fcyu commented 3 hours ago

> After adding decoys and contaminants to my protein database, I removed proteins of the starting database and decoys. In this way, I obtained a FASTA file of contaminants ("contaminants.fasta"). Then, I downloaded the cRAP database from this link: ftp://ftp.thegpm.org/fasta/cRAP/crap.fasta. However, while my FASTA file ("contaminants.fasta") contains 118 proteins, the cRAP FASTA file ("crap.fasta") contains 116 proteins. Both files are attached as text files. Why are there differences between these two FASTA files? In general, could you provide the FASTA file of contaminants that are added by FragPipe?

Maybe @AimeeD90 can take a look at this one
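In the meantime, one way to see which entries account for the 118 vs. 116 difference is to compare the two files by sequence rather than by header. The following is only a minimal sketch: the helper names `read_fasta` and `only_in_first` are made up for illustration, the file names are taken from the attachments above, and it assumes both files are plain FASTA text with unique header lines.

```python
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a dict mapping header line -> sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def only_in_first(first, second):
    """Return sorted headers from `first` whose sequence is absent from `second`."""
    second_seqs = set(second.values())
    return sorted(h for h, seq in first.items() if seq not in second_seqs)


if __name__ == "__main__":
    # Paths are assumptions based on the attachment names in this thread.
    a_path, b_path = Path("contaminants.txt"), Path("crap.txt")
    if a_path.exists() and b_path.exists():
        a = read_fasta(a_path.read_text())
        b = read_fasta(b_path.read_text())
        print(len(a), "entries vs", len(b), "entries")
        for h in only_in_first(a, b):
            print("only in", a_path.name + ":", h)
        for h in only_in_first(b, a):
            print("only in", b_path.name + ":", h)
```

Comparing by sequence avoids false differences when the same protein appears under differently formatted headers in the two files.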

> After adding decoys and contaminants, in the resulting FASTA file headers of contaminants do not start with "contam_" or a similar prefix.

As far as I know, adding the contam_ prefix only works when you download the database using Philosopher.

> Can I manually add "contam_" or a similar prefix to headers of contaminants before running a proteomic search? Could this affect the search results?

Yes, you can.
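If you do it by hand, something like the sketch below could work. It is not how Philosopher does it internally; `add_header_prefix` is a hypothetical helper, the example header and sequence are made up, and it assumes the input contains only contaminant entries (so every header should get the prefix).

```python
def add_header_prefix(fasta_text, prefix="contam_"):
    """Prepend `prefix` to every FASTA header that does not already carry it."""
    out = []
    for line in fasta_text.splitlines():
        # Only header lines start with '>'; sequence lines pass through unchanged.
        if line.startswith(">") and not line.startswith(">" + prefix):
            line = ">" + prefix + line[1:]
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Made-up entry for illustration only.
    example = ">sp|X00000|EXAMPLE_PROT Hypothetical contaminant\nACDEFGHIK\n"
    print(add_header_prefix(example), end="")
```

Running the prefixing step on a file that is already prefixed leaves it unchanged, so it is safe to apply more than once.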

> Finally, when running Fragpipe in headless mode, which database should I specify in the workflow file, i.e., the database with decoys and contaminants or the database without decoys and contaminants? In other words, when running Fragpipe in headless mode, does it automatically add decoys and contaminants to the database that is specified in the workflow file?

Specify the one with targets, decoys, and contaminants.
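Before pointing the workflow file at a database, it may help to confirm that the FASTA really contains all three kinds of entries. Here is a rough sketch of such a check. `classify_headers` is a hypothetical helper, and both prefixes are assumptions: `rev_` as the decoy tag and `contam_` as the contaminant tag, so change them to whatever your database actually uses.

```python
from collections import Counter


def classify_headers(fasta_text, decoy_prefix="rev_", contam_prefix="contam_"):
    """Count FASTA headers as 'decoy', 'contaminant', or 'target' by prefix."""
    counts = Counter()
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # skip sequence lines
        name = line[1:]
        if name.startswith(decoy_prefix):
            # Decoys of contaminants also land here, since the decoy tag comes first.
            counts["decoy"] += 1
        elif name.startswith(contam_prefix):
            counts["contaminant"] += 1
        else:
            counts["target"] += 1
    return counts


if __name__ == "__main__":
    # Made-up entries for illustration only.
    example = (
        ">sp|X00001|TARGET_PROT\nACDE\n"
        ">contam_sp|X00000|EXAMPLE_PROT\nFGHI\n"
        ">rev_sp|X00001|TARGET_PROT\nEDCA\n"
    )
    counts = classify_headers(example)
    if counts["decoy"] == 0:
        print("warning: no decoy entries found")
    print(dict(counts))
```

A decoy count of zero would suggest the file is the unprocessed database rather than the one with targets, decoys, and contaminants.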

Best,

Fengchao