Nesvilab / FragPipe

A cross-platform proteomics data analysis suite
http://fragpipe.nesvilab.org

Bad Fasta header broke FragPipe #1747

Closed cctortecka closed 2 weeks ago

cctortecka commented 2 weeks ago

Hi,

I'm having issues with converting Fragger results to a spectral library. I've successfully searched my data with the expected results, but at the last stage of library conversion I repeatedly encounter the attached error. I've tried a couple of different setups on our end, but unfortunately haven't been able to resolve this. We're running FragPipe on our Terra implementation (https://github.com/broadinstitute/PANOPLY/tree/dev/third-party-modules/panoply_fragpipe), if that helps.

Could you please have a look at this? Thanks, Claudia

log_2024-08-18_22-08-26.txt

fcyu commented 2 weeks ago

The protein headers in your FASTA file do not follow any standard format. If you want them to look like UniProt, the format should be sp|XXX|XXX XXX, while yours have only two fields:

>sp|neoAg_nuORF_p002_G7_nuORF_p002_nuORF__389
>sp|neoAg_nuORF_p002_G8_nuORF_p002_nuORF__277
>sp|neoAg_nuORF_p002_G9_nuORF_p002_nuORF__166
>sp|neoAg_nuORF_p002_G9_nuORF_p002_nuORF__955
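
A quick way to spot headers like these before running a workflow is to scan the FASTA for lines that lack the three-field `db|accession|entry-name` shape. The sketch below is only an illustration: the regular expression is an assumed approximation of a UniProt-style header, not the rule FragPipe or Philosopher actually applies, and the first example entry (`ACC001`, `ENTRY_ONE`) is made up.

```python
import re

# Assumed approximation of a UniProt-style header:
#   >db|ACCESSION|ENTRY_NAME optional description
# This is NOT taken from FragPipe or Philosopher source code.
UNIPROT_LIKE = re.compile(r"^>(sp|tr)\|([^|\s]+)\|(\S+)(\s+.*)?$")


def find_nonconforming_headers(lines):
    """Return header lines that lack the db|accession|entry-name structure."""
    bad = []
    for line in lines:
        header = line.rstrip()
        if header.startswith(">") and not UNIPROT_LIKE.match(header):
            bad.append(header)
    return bad


# Hypothetical input: one well-formed header and one from this issue.
example = [
    ">sp|ACC001|ENTRY_ONE Example protein OS=Homo sapiens GN=EX1",
    "MKT",
    ">sp|neoAg_nuORF_p002_G7_nuORF_p002_nuORF__389",
    "MAA",
]
bad_headers = find_nonconforming_headers(example)
print(bad_headers)
```

For a real file, the same function can be given an open file handle, for example `find_nonconforming_headers(open("db.fasta"))`, where `db.fasta` is a placeholder path.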

Also, please upgrade the tools to the latest version. You are using very old versions. Some of the bugs might have been fixed.

Best,

Fengchao

cctortecka commented 2 weeks ago

Thanks Fengchao - by just repeating the identifier I was able to successfully generate the spectral library. Are there any common requirements for how the FASTA has to be generated for the different modules?

Claudia
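
The thread does not show the exact headers that resulted from "repeating the identifier". One plausible reading, sketched below, is to copy the second field into a third so that `>sp|ID` becomes `>sp|ID|ID`. The function names are hypothetical, and the output format is an assumption rather than a confirmed FragPipe requirement.

```python
def repeat_identifier(header):
    """Turn '>sp|ID' into '>sp|ID|ID'.

    Expects a header without a trailing newline. Headers that do not have
    exactly two '|'-separated fields are returned unchanged.
    """
    body, sep, description = header[1:].partition(" ")
    fields = body.split("|")
    if len(fields) == 2:
        # Reuse the identifier as the entry name (assumed fix).
        fields.append(fields[1])
    return ">" + "|".join(fields) + sep + description


def rewrite_fasta(lines):
    """Yield FASTA lines (newlines stripped) with two-field headers repaired."""
    for line in lines:
        text = line.rstrip("\r\n")
        yield repeat_identifier(text) if text.startswith(">") else text


fixed = list(rewrite_fasta([
    ">sp|neoAg_nuORF_p002_G7_nuORF_p002_nuORF__389\n",
    "MAA\n",
]))
print(fixed[0])
```

Sequence lines pass through untouched, so the rewritten lines can be joined with newlines and written to a new file.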

fcyu commented 2 weeks ago

The ideal format is UniProt, sp|XXX|XXX XXX, but others such as NCBI, ENSEMBL, and generic ones should also work. There is a brief document about it here: https://github.com/Nesvilab/philosopher/wiki/How-to-Prepare-a-Protein-Database

FragPipe also tries to parse the gene and organism information from the protein header, but if there is no such info, the program won't crash (let me know if you see any error messages).
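
To see what gene and organism information a header carries, the sketch below pulls the `GN=` and `OS=` fields out of a UniProt-style description line and returns `None` when they are absent. This is not FragPipe's parser. It only assumes the usual UniProt `KEY=value` layout, and the example header is made up.

```python
import re

# UniProt-style KEY=value fields. A value runs until the next known key
# or the end of the line. The key list is an assumption based on the
# common OS/OX/GN/PE/SV fields.
FIELD = re.compile(r"\b(OS|OX|GN|PE|SV)=(.*?)(?=\s+(?:OS|OX|GN|PE|SV)=|$)")


def parse_annotations(header):
    """Return the gene (GN) and organism (OS) of a header, or None if missing."""
    found = dict(FIELD.findall(header))
    return {"gene": found.get("GN"), "organism": found.get("OS")}


annotated = parse_annotations(
    ">sp|ACC001|ENTRY_ONE Example protein OS=Homo sapiens OX=9606 GN=EX1 PE=1 SV=1"
)
bare = parse_annotations(">sp|neoAg_nuORF_p002_G7_nuORF_p002_nuORF__389")
print(annotated, bare)
```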

Best,

Fengchao

anesvi commented 2 weeks ago

We can send you our reformatted version of the Broad database




cctortecka commented 2 weeks ago

Ok - that's very helpful thanks. @anesvi that would be great, then we can adapt our headers to your requirements.

Thanks, Claudia

fcyu commented 2 weeks ago

Communicated by email.