Illumina / BeadArrayFiles

Python library to parse file formats related to Illumina bead arrays

Gentrain score in ClusterFile.py #26

Open Derrup opened 3 years ago

Derrup commented 3 years ago

Hi! It was my understanding that the GenTrain score is "project" dependent and does not come from the cluster file. Is this correct? If so, what does the score calculated in the GenTrain.py module refer to?

If not, am I correct in assuming that, when creating a GenomeStudio project with the same manifest and cluster file, I should always be getting the same GenTrain score for a certain SNP regardless of the samples?

jjzieve commented 3 years ago

Hi @Derrup, the "GenTrain score" shown in the GenomeStudio interface is the "Cluster Score" encoded in the cluster file (see https://github.com/Illumina/BeadArrayFiles/blob/develop/module/ClusterFile.py#L184). You should get the same value per SNP when using the same manifest and cluster file, regardless of the samples. I would not call that "project dependent", however, because there are other states tied to a project that are saved in GenomeStudio and that this library does not take into account (e.g. .bsc files).

I'm not sure what you mean by the "GenTrain.py module", but maybe you meant the GenotypeCalls.py module? That "score" is the genotype score encoded in a GTC file (see https://github.com/Illumina/BeadArrayFiles/blob/develop/module/GenotypeCalls.py#L411). In GenomeStudio, on the "Full Data Table" tab, this is the "Score" reported per SNP and per sample.
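To make the distinction concrete, here is a minimal sketch of the two kinds of score as plain data structures. It does not use the BeadArrayFiles API: `ClusterFileModel`, `GtcModel`, `gentrain_column`, and `full_data_table_scores` are hypothetical names made up for illustration, and the numbers are invented. The point is only that the cluster score is keyed by SNP alone, while the genotype score is keyed by sample and SNP.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ClusterFileModel:
    """Hypothetical stand-in for a cluster file: one cluster score per SNP."""
    cluster_scores: Dict[str, float] = field(default_factory=dict)


@dataclass
class GtcModel:
    """Hypothetical stand-in for one sample's GTC file: one genotype score per SNP."""
    sample_id: str
    genotype_scores: Dict[str, float] = field(default_factory=dict)


def gentrain_column(cluster_file: ClusterFileModel, snps: List[str]) -> Dict[str, float]:
    # Depends only on the cluster file, so no sample information is needed.
    return {snp: cluster_file.cluster_scores[snp] for snp in snps}


def full_data_table_scores(
    gtcs: List[GtcModel], snps: List[str]
) -> Dict[Tuple[str, str], float]:
    # One value per (sample, SNP) pair, taken from each sample's GTC data.
    return {
        (gtc.sample_id, snp): gtc.genotype_scores[snp]
        for gtc in gtcs
        for snp in snps
    }


# Invented example values, for illustration only.
snps = ["rs1", "rs2"]
cluster = ClusterFileModel(cluster_scores={"rs1": 0.85, "rs2": 0.42})
sample_a = GtcModel("sample_a", {"rs1": 0.91, "rs2": 0.30})
sample_b = GtcModel("sample_b", {"rs1": 0.77, "rs2": 0.55})

per_snp = gentrain_column(cluster, snps)
per_sample_and_snp = full_data_table_scores([sample_a, sample_b], snps)
```

In this model, swapping in a different set of samples changes `per_sample_and_snp` but leaves `per_snp` untouched, which mirrors the behaviour described above for a fixed manifest and cluster file.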