SATAY-LL / LaanLab-SATAY-DataAnalysis

This repository contains code and workflows for data analysis of SATAY experiments.
Apache License 2.0

Gene name aliases #37

Open Gregory94 opened 3 years ago

Gregory94 commented 3 years ago

Gene names in datasets generated with our workflow do not always match the gene names in files created by others, which has caused some confusion. This is most likely due to different naming conventions or gene aliases (i.e. the same gene can have multiple names).

One such difference is between the _pergene.txt files from the Kornmann lab workflow and those from our workflow. I checked the differences between the two files using this python script. It takes two _pergene.txt files as input and builds, for each file, a list of all gene names present in it. It then reports the genes that are in one list but not in the other, and vice versa.
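For anyone who wants to repeat that check, here is a minimal sketch of the idea, not the actual script linked above. It assumes a _pergene.txt file is tab-separated with one header line and the gene name in the first column. The function names (`read_gene_names`, `compare_gene_names`) and the counts in the toy data are made up for illustration.

```python
import io


def read_gene_names(handle, skip_header=True):
    """Collect the first tab-separated field of every non-empty line.

    Assumption: the file has one header line and the gene name is the
    first column. Adjust if your _pergene.txt is laid out differently.
    """
    names = set()
    for i, line in enumerate(handle):
        if skip_header and i == 0:
            continue
        line = line.strip()
        if line:
            names.add(line.split("\t")[0].strip())
    return names


def compare_gene_names(names_a, names_b):
    """Return (names only in a, names only in b), each sorted."""
    return sorted(names_a - names_b), sorted(names_b - names_a)


# Toy in-memory stand-ins for two _pergene.txt files (counts are invented).
header = "Gene name\tNumber of transposons per gene\tNumber of reads per gene\n"
ours = io.StringIO(header + "MRX3\t10\t100\nBOL3\t5\t40\nCDC42\t2\t8\n")
theirs = io.StringIO(header + "YBL095W\t10\t100\nAIM1\t5\t40\nCDC42\t2\t8\n")

only_ours, only_theirs = compare_gene_names(
    read_gene_names(ours), read_gene_names(theirs)
)
print("Only in our file:  ", only_ours)
print("Only in their file:", only_theirs)
```

With real files, replace the `io.StringIO` objects with `open(path)` handles. Sets are used so that duplicate gene names within one file do not affect the comparison.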

I found 80 gene names that differ between the Kornmann files and our files. I checked all of them and noticed that the other file either uses a different naming convention (e.g. we use MRX3 whilst they use the systematic name YBL095W, two names for the same gene) or uses an alias (e.g. we use BOL3 whilst they use AIM1, again both referring to the same gene).

So be aware when comparing data files from different sources that include gene names: the same gene might appear under different names.

This issue can be solved using the Yeast_Protein_Names.txt file, which stores all the different names for each gene. Alternatively, you can use the genomicfeatures_dataframe.py script, which creates a python dataframe that includes, for each gene, its aliases and its names under the different naming conventions (it also uses the Yeast_Protein_Names.txt file).
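As a rough illustration of how such a name table can be applied, the sketch below maps every name to one canonical name before comparing two gene lists. It does not parse Yeast_Protein_Names.txt and does not use genomicfeatures_dataframe.py. The `alias_groups` dictionary is a hand-written stand-in containing only the two examples from this issue, and the helper names are hypothetical.

```python
def build_alias_lookup(alias_groups):
    """Map every known name (upper-cased) to one canonical gene name.

    alias_groups: dict of canonical name -> list of other names for
    the same gene (systematic name and/or aliases).
    """
    lookup = {}
    for canonical, aliases in alias_groups.items():
        lookup[canonical.upper()] = canonical
        for alias in aliases:
            # setdefault: a name that is itself canonical is never overridden
            lookup.setdefault(alias.upper(), canonical)
    return lookup


def canonicalize(names, lookup):
    """Replace each name by its canonical form; unknown names are kept."""
    return {lookup.get(name.upper(), name) for name in names}


# Stand-in alias table (assumption: in practice this would be built from
# Yeast_Protein_Names.txt or from the dataframe mentioned above).
alias_groups = {"MRX3": ["YBL095W"], "BOL3": ["AIM1"]}
lookup = build_alias_lookup(alias_groups)

ours = {"MRX3", "BOL3", "CDC42"}
theirs = {"YBL095W", "AIM1", "CDC42"}

# Names that still differ after canonicalization (symmetric difference).
unresolved = canonicalize(ours, lookup) ^ canonicalize(theirs, lookup)
print("Still different after alias mapping:", sorted(unresolved))
```

One caveat with this approach: an alias can in principle be shared by more than one gene, in which case the first mapping wins here. For real data it is safer to check for such collisions when building the lookup.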

Gregory94 commented 3 years ago

Important note when using Yeast_Protein_Names.txt: there has been a major update to the gene names and aliases in this file. More gene names are present and some genes have updated aliases. The updated file is on the master branch.