Closed christophgil closed 2 years ago
Hi Christoph,
The path is written to a cache file and will be used the next time FragPipe is opened. You can check the cache folders to see whether you have write permission. The folders can be found by clicking 'clear cache and close'.
You are right, we don't mark contaminant proteins.
Best,
Fengchao
We used to add Contam_ to contaminants.
Felipe, why did we stop doing it?
Repeating here what we discussed internally. We dropped the automatic tagging and cleaning of the report tables because, at that time (2017 - 2018), we had collaborators studying some of the proteins that we normally call contaminants. Since the concept of "contaminant" can change from experiment to experiment, people normally remove them while doing their statistical and functional analysis.
Dear Felipe,
That is no problem - I wrote a sed script as a workaround:
...
s/^>sp|O76013|KRT36_HUMAN/>sp_cont|O76013|KRT36_HUMAN/1
s/^>sp|O76014|KRT37_HUMAN/>sp_cont|O76014|KRT37_HUMAN/1
s/^>sp|O76015|KRT38_HUMAN/>sp_cont|O76015|KRT38_HUMAN/1
s/^>sp|O77727|K1C15_SHEEP/>sp_cont|O77727|K1C15_SHEEP/1
...
Is there a good way to obtain the list of contaminants? I did it in a clumsy way, by processing a minimal artificial fasta and extracting the entries.
Further, I put the body of each entry of the UniProt fasta onto one single line to allow for "grep -A 1 _HUMAN", and I wonder whether these long lines are compatible with the software, or whether I should fold the long lines. From the logs it seems that it ran smoothly.
In the GUI Window log panel I cannot Ctrl-F search anything. Can be worked around by pasting everything in a text editor.
Best regards Christoph
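For readers who prefer not to maintain one sed rule per protein, the same header rewrite can be sketched in Python. This is only an illustration of the workaround above, not something FragPipe or Philosopher provides: the function name tag_contaminants, the "_cont" suffix argument, and the placeholder sequences are assumptions, and the accession set has to be supplied by the user (for example from the cRAP list).

```python
def tag_contaminants(fasta_lines, contaminant_accessions, suffix="_cont"):
    """Append `suffix` to the database field of listed UniProt-style headers.

    '>sp|O76013|KRT36_HUMAN' becomes '>sp_cont|O76013|KRT36_HUMAN',
    mirroring the sed rules in the comment above.
    """
    tagged = []
    for line in fasta_lines:
        if line.startswith(">"):
            # UniProt-style header: db|accession|entry_name [description]
            parts = line[1:].split("|", 2)
            if len(parts) == 3 and parts[1] in contaminant_accessions:
                line = ">" + parts[0] + suffix + "|" + parts[1] + "|" + parts[2]
        tagged.append(line)
    return tagged


# Placeholder sequences, for illustration only.
example = [
    ">sp|O76013|KRT36_HUMAN",
    "MATQTCTPTF",
    ">sp|P04637|P53_HUMAN",
    "MEEPQSDPSV",
]
result = tag_contaminants(example, {"O76013"})
print("\n".join(result))
```

Sequence lines and headers that are not in the accession set pass through unchanged, so the function can be applied to a whole database file read line by line.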
Your changes should be fine. You can check here for the whole list as well. https://www.thegpm.org/crap/
Great, thanks!
Added to v3.6.0
I'm a bit confused about the contaminant annotation in FP output files. According to this closed issue, it seems that, from Philosopher v3.6.0 onward, contaminant proteins should be tagged with Contam_ (just as @anesvi said on Feb 8).
Running the LFQ-MBR workflow in FP v16.0 with Philosopher v4.0.0, there is no clean way to filter out contaminant proteins from the combined_protein.tsv output file. Am I right? Is this still deliberate (as @prvst stated on Feb 8)?
Yes, I can reproduce your observation. There is no tag for the contaminant proteins. I will reopen this issue. Felipe @prvst, could you please help to resolve this puzzle?
Thanks,
Fengchao
@GianArauz you are saying that the contaminant sequences are there, and the program is not removing them, is that right?
@prvst, exactly. For example:
sp|P00921|CAH2_BOVIN P00921 CAH2_BOVIN CA2 260 18.80 Bos taurus
One would expect a boolean column called Contaminant or some kind of tag like:
con_sp|P00921|CAH2_BOVIN P00921 CAH2_BOVIN CA2 260 18.80 Bos taurus
The contaminant tag is optional, as is the addition of such sequences to the database. The reason is that the contaminants are added in batch, and there are people who actually study some of those proteins. To help deal with these different cases, the tagging can be done by adding the flag --contamprefix to the database command when annotating the file. This is also why we don't remove them automatically: people might actually want to see what type of contaminant they are hitting.
Also, @GianArauz if you already have your results, and you are working with human samples, something you can quickly do is the removal of hits to organisms that are not human plus keratin.
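A rough Python sketch of that quick filter is shown below. It assumes a tab-separated report with columns named Protein, Gene, and Organism (hypothetical names chosen to match the example row above; check the header of your own file), and it treats any gene symbol starting with "KRT" as keratin, which is a heuristic rather than an official rule.

```python
import csv
import io


def keep_human_non_keratin(rows, organism_col="Organism", gene_col="Gene"):
    """Keep rows whose organism is human and whose gene is not a keratin."""
    kept = []
    for row in rows:
        is_human = "Homo sapiens" in row[organism_col]
        # Heuristic: human keratin gene symbols start with "KRT".
        is_keratin = row[gene_col].upper().startswith("KRT")
        if is_human and not is_keratin:
            kept.append(row)
    return kept


# Tiny in-memory table standing in for a real report file.
table = io.StringIO(
    "Protein\tGene\tOrganism\n"
    "sp|P00921|CAH2_BOVIN\tCA2\tBos taurus\n"
    "sp|O76013|KRT36_HUMAN\tKRT36\tHomo sapiens\n"
    "sp|P04637|P53_HUMAN\tTP53\tHomo sapiens\n"
)
rows = list(csv.DictReader(table, delimiter="\t"))
filtered = keep_human_non_keratin(rows)
print([row["Protein"] for row in filtered])
```

Note that this approach cannot distinguish a human contaminant that is not a keratin from a genuine sample protein, which is the limitation raised later in the thread.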
I managed to get the contam_ tag in the fasta using the --contamprefix flag. Thanks!
I think it would be nice to have this flag enabled by default when getting the fasta through the FP GUI: Database --> Download --> Add common contaminants (TRUE) --> OK.
I agree with @prvst that having the contaminants explicitly listed in the output file is a must (either because one could be interested in some protein that is usually tagged as a contaminant, or just because one needs to track how the "wet" part of the workflow is going).
In any case, I think it would be useful to be able to drop them in a more elegant way (by using the contam_ tag) instead of cherry-picking HUMAN entries except keratins.
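Once the database carries the tag, dropping contaminants becomes a plain prefix test, as this minimal Python sketch suggests. The contam_ prefix is the one reported above; the rev_ decoy prefix and the function name drop_prefixed are assumptions, so adjust them to whatever your own database uses.

```python
def drop_prefixed(protein_ids, prefixes=("contam_", "rev_")):
    """Return the protein IDs that do not start with any of the given prefixes.

    Matching is case-insensitive, so 'Contam_' and 'contam_' are both removed.
    """
    lowered = tuple(prefix.lower() for prefix in prefixes)
    return [pid for pid in protein_ids if not pid.lower().startswith(lowered)]


ids = [
    "contam_sp|P00921|CAH2_BOVIN",
    "rev_sp|P04637|P53_HUMAN",
    "sp|P04637|P53_HUMAN",
]
clean = drop_prefixed(ids)
print(clean)
```

Because the contaminants stay in the output and are only filtered downstream, they remain available for anyone who wants to inspect them.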
Many thanks for your efficient response! And also for developing/sharing/maintaining FP!
Thanks. We will add an option to FragPipe.
Best,
Fengchao
-- Dr. Fengchao Yu Research Investigator University of Michigan
Thanks for the kind words. I'll discuss some possible changes with my colleagues. Cheers
Added in version 17.1: https://github.com/Nesvilab/FragPipe/releases/tag/17.1
Dear Dr. Fengchao,
A minor problem with the GUI: the file selector box for the fasta file does not remember the last directory.
Further, I have a question regarding contaminants. I generated the .fas file with contaminants in FragPipe. In the output files the reverse decoys are recognized by a prefix and can easily be filtered out in R. I was expecting a similar mechanism to filter out the contaminants, but looking at an output line with a keratin from sheep as an example, I do not see any indication that it is a contaminant other than that it is not human.
In MaxQuant the contaminants have a leading "CONT_".
Thanks
Christoph