Open Jokendo-collab opened 4 years ago
Is the geneID part of the protein identifier/description in the .fasta file? If it is, then the different in-file references in the results can direct you back to the protein identifier and description, but I don't know of a specific tool that will provide the full description in a simpler-to-read format.
I have the human database, which I downloaded from UniProt, and I am using it for the database search on my data. My software uses MS-GF+ as the search engine. As I mentioned earlier, this is possible with MaxQuant because it gives gene ID and protein ID columns in the protein groups file, and I was wondering whether the same can be achieved with MS-GF+. Running two search engines is tedious, and it would be easier for me to do it just once with MS-GF+.
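Since the database comes from UniProt, the FASTA headers normally carry the gene name in a `GN=` field, so one option is to build an accession-to-gene lookup directly from the .fasta file. Below is a minimal sketch of that idea, not something MS-GF+ provides; the function name `gene_map_from_fasta_lines` and the example header are illustrative assumptions, and entries without a `GN=` field are simply skipped.

```python
import re

# Matches the start of a UniProt-style header: >sp|ACCESSION|ENTRY_NAME ...
HEADER = re.compile(r"^>(?:sp|tr)\|([^|]+)\|(\S+)")
# Matches the gene name field, e.g. GN=ANXA5
GENE = re.compile(r"\bGN=(\S+)")


def gene_map_from_fasta_lines(lines):
    """Return a dict mapping UniProt accession -> gene name (hypothetical helper)."""
    mapping = {}
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = HEADER.match(line)
        gene = GENE.search(line)
        if header and gene:
            mapping[header.group(1)] = gene.group(1)
    return mapping


# Illustrative header in the UniProt FASTA layout, followed by a sequence line.
example = [
    ">sp|P08758|ANXA5_HUMAN Annexin A5 OS=Homo sapiens OX=9606 GN=ANXA5 PE=1 SV=2",
    "MAQVLRGTVT",
]
print(gene_map_from_fasta_lines(example))
```

To use it on a real database, pass an open file handle instead of the list, for example `gene_map_from_fasta_lines(open("human.fasta"))`, where `human.fasta` is a placeholder path.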
So, this isn't integrated into MS-GF+, but you can use the latest version of the MzidToTsvConverter with the command-line argument `-geneid` to add an additional column to the TSV file, where the gene ID is extracted from the protein description using a regular expression. The default regular expression supports the format `sp|P08758|ANXA5_HUMAN` and would put `ANXA5` in the Gene ID column. You can also supply a different regular expression using `-geneid "[regular expression]"`. You can look at the readme for an example.
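To make the extraction concrete, here is a rough Python sketch of the same kind of regular-expression capture. The pattern below is my own assumption for illustration and is not necessarily the converter's actual default; `extract_gene_id` is a hypothetical helper, not part of MzidToTsvConverter.

```python
import re

# Assumed pattern (NOT taken from the converter's source): capture the text
# between the second '|' and the '_' that precedes the species suffix.
ENTRY_NAME_PATTERN = r"^[a-z]{2}\|[^|]+\|([A-Za-z0-9]+)_"


def extract_gene_id(protein, pattern=ENTRY_NAME_PATTERN):
    """Return the first capture group of `pattern`, or "" when nothing matches."""
    match = re.search(pattern, protein)
    if match is None:
        return ""
    return match.group(1)


print(extract_gene_id("sp|P08758|ANXA5_HUMAN"))
```

Keep in mind that this captures the prefix of the UniProt entry name. For reviewed (sp) entries it usually resembles the gene symbol, but it is not guaranteed to match, and for unreviewed (tr) entries the prefix is typically the accession itself, so a custom expression may be needed there.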
Is there a way to get the gene ID from the MS-GF+ analysis results? I know this is possible with MaxQuant, but I prefer MS-GF+ for its speed in the sequence database search. Kindly advise.
The reason I want this gene information is that I want to use clusterProfiler for Gene Ontology analysis to determine which biological processes are significant in our data.