Illumina / BeadArrayFiles

Python library to parse file formats related to Illumina bead arrays

Need further clarifications on the return values of GenotypeCalls.py module #21

Open plusmid opened 4 years ago

plusmid commented 4 years ago

Hi team, I have some questions on the "get_control_x_intensities()" method.

  1. It returns a numpy array of length 23. What do these 23 numbers stand for? GenomeStudio's control dashboard always shows the control types and their values, such as DNP (High) | DNP (Bgnd) | Biotin (High) | Biotin (Bgnd) | Extension (A) | Extension (T) and so on.
  2. Does x_intensity stand for the red channel values and y_intensity for the green?
  3. The MethylationEPIC array has more control types than other genotyping arrays. Does that mean get_control_x_intensities() returns a numpy array longer than 23 when applied to methylation array GTCs? If yes, could you specify the corresponding control types?
  4. Are these return values the same as what we see in GenomeStudio's control dashboard?

Below is what I pulled from one MethylationEPIC GTC. Thanks in advance!

from IlluminaBeadArrayFiles import GenotypeCalls, BeadPoolManifest, code2genotype
import sys
gtc_file = r"C:\202309880087_R01C02.gtc"
GenotypeCalls(gtc_file).get_control_x_intensities()

array([22163, 1337, 1341, 1025, 45830, 48170, 3044, 4004, 1773, 1749, 1357, 2127, 1042, 1086, 1287, 972, 1075, 1241, 1435, 1055, 1209, 874, 1398], dtype=uint16)

GenotypeCalls(gtc_file).get_control_y_intensities()

array([ 1412, 1090, 17303, 775, 1341, 912, 32024, 31040, 741, 23807, 16977, 8654, 776, 721, 743, 684, 737, 758, 1904, 833, 1092, 562, 2295], dtype=uint16)

len(GenotypeCalls(gtc_file).get_control_y_intensities())

23
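For a side-by-side comparison with the GenomeStudio control dashboard, the two arrays above can be paired by index. This is only a minimal sketch: it uses the values pasted above as plain lists, `pair_controls` is a hypothetical helper name, and it assumes the x and y entries at the same index belong to the same control, which nothing in this thread confirms. It makes no claim about which channel is red or green.

```python
# Values copied from the output above (one MethylationEPIC GTC).
x = [22163, 1337, 1341, 1025, 45830, 48170, 3044, 4004, 1773, 1749, 1357, 2127,
     1042, 1086, 1287, 972, 1075, 1241, 1435, 1055, 1209, 874, 1398]
y = [1412, 1090, 17303, 775, 1341, 912, 32024, 31040, 741, 23807, 16977, 8654,
     776, 721, 743, 684, 737, 758, 1904, 833, 1092, 562, 2295]


def pair_controls(x_values, y_values):
    """Pair x and y control intensities by position (hypothetical helper).

    Assumption: entries at the same index describe the same control.
    """
    if len(x_values) != len(y_values):
        raise ValueError("x and y control arrays differ in length")
    return [(i, int(xv), int(yv))
            for i, (xv, yv) in enumerate(zip(x_values, y_values))]


pairs = pair_controls(x, y)
for index, xv, yv in pairs:
    # One row per control index, ready to compare against the dashboard.
    print(f"{index:2d}  x={xv:5d}  y={yv:5d}")
```

The same function should work on the numpy arrays returned by the two getters, since it only iterates over them and converts each element to `int`.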

jjzieve commented 4 years ago

Hi @plusmid, sorry for the late response. Can you contact techsupport@illumina.com about these questions if you haven't done so? I don't know enough about the methylation GTCs to give satisfactory answers to all your questions.

lefebvrf commented 4 years ago

Hi @jjzieve, in my case len(gtc.get_control_y_intensities()) prints 92, with each value repeated 4x:

array([ 598, 598, 598, 598, 120, 120, 120, 120, 32832, 32832, 32832, 32832, 176, 176, 176, 176, 4271, 4271, 4271, 4271, 2853, 2853, 2853, 2853, 36280, 36280, 36280, 36280, 38271, 38271, 38271, 38271, 188, 188, 188, 188, 142, 142, 142, 142, 121, 121, 121, 121, 476, 476, 476, 476, 129, 129, 129, 129, 137, 137, 137, 137, 213, 213, 213, 213, 124, 124, 124, 124, 154, 154, 154, 154, 114, 114, 114, 114, 200, 200, 200, 200, 100, 100, 100, 100, 119, 119, 119, 119, 489, 489, 489, 489, 198, 198, 198, 198], dtype=uint16)

I am parsing a GTC generated with the latest iaap-cli based on manifest GSA-24v3-0_A1.bpm.

I'm also not clear on how to relate the output of get_control_y_intensities() to the manifest bead probes (manifest.control_config). Could one assume they are in the same order?

Should I contact tech support about this?
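Note that 92 / 4 = 23, the same length reported in the first comment. The repetition can be checked and collapsed with a few lines of plain Python. This is a sketch only: `collapse_repeats` is a hypothetical helper, not part of BeadArrayFiles, and it assumes the repeats are consecutive and exact, as in the array quoted above. It says nothing about why the values are repeated or which control each one belongs to.

```python
def collapse_repeats(values, repeat=4):
    """Collapse runs of `repeat` identical consecutive values into one.

    Hypothetical helper. Raises ValueError if the input is not made of
    exact runs, so a differently laid out array is not silently mangled.
    """
    values = [int(v) for v in values]
    if len(values) % repeat != 0:
        raise ValueError("length is not a multiple of the repeat count")
    collapsed = []
    for start in range(0, len(values), repeat):
        group = values[start:start + repeat]
        if len(set(group)) != 1:
            raise ValueError(f"values at {start}..{start + repeat - 1} differ")
        collapsed.append(group[0])
    return collapsed


# First 12 values of the array quoted above.
sample = [598, 598, 598, 598, 120, 120, 120, 120, 32832, 32832, 32832, 32832]
print(collapse_repeats(sample))  # [598, 120, 32832]
```

Passing the full 92-element array should yield 23 values, which could then be compared by count against the entries in manifest.control_config.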

jjzieve commented 4 years ago

@lefebvrf The GTC description covers this a bit: https://github.com/Illumina/BeadArrayFiles/blob/develop/docs/GTC_File_Format_v5.pdf. I don't know off the top of my head how many controls GSA has; I would reach out to techsupport about that.