broadinstitute / lincs-cell-painting

Processed Cell Painting Data for the LINCS Drug Repurposing Project
BSD 3-Clause "New" or "Revised" License

Channel order #85

Closed: wongdanr closed this issue 2 years ago

wongdanr commented 2 years ago

Hello! Where can I find which markers/stains each of the 5 image channels correspond to?

gwaybio commented 2 years ago

Hi @wongdanr - I'll point you to a couple of resources:

To summarize:

Channel  Stain          Structure  ID
1        HOECHST 33342  DNA        ch1
2        Alexa 488      ER         ch2
3        488 long       RNA        ch3
4        Alexa 568      AGP        ch4
5        Alexa 647      Mito       ch5
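As an illustration only, the table above can be turned into a lookup from an image filename to its stain and structure. This sketch assumes a Harmony-style export name in which the channel appears as a `-ch<N>` token (the example filename is illustrative, not taken from this thread), and the function name `structure_for_image` is hypothetical. As discussed later in this thread, the mapping should be checked against each plate's XML before relying on it.

```python
import re

# Channel number -> (stain, structure), copied from the table above.
# Assumption: this mapping holds for the plate being read; verify per plate.
CHANNEL_TABLE = {
    1: ("HOECHST 33342", "DNA"),
    2: ("Alexa 488", "ER"),
    3: ("488 long", "RNA"),
    4: ("Alexa 568", "AGP"),
    5: ("Alexa 647", "Mito"),
}

# Assumed filename convention: the channel is encoded as "-ch<N>".
_CHANNEL_RE = re.compile(r"-ch(\d+)")


def structure_for_image(filename: str) -> tuple[str, str]:
    """Return (stain, structure) for a filename containing a "-ch<N>" token."""
    match = _CHANNEL_RE.search(filename)
    if match is None:
        raise ValueError(f"no channel token in {filename!r}")
    return CHANNEL_TABLE[int(match.group(1))]


print(structure_for_image("r01c01f01p01-ch3sk1fk1fl1.tiff"))  # ('488 long', 'RNA')
```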
bethac07 commented 2 years ago

@gwaygenomics Has someone (possibly past me) checked that the stain to numeric ID mapping you have there is true for every plate in every batch? Because those assignments can and do change.

gwaybio commented 2 years ago

I don't recall if anyone has checked for consistency! Thanks for noting here.

Would you say that the XML files are the source of truth?

bethac07 commented 2 years ago

Yes, the XML files are the source of truth.

wongdanr commented 2 years ago

Sorry @bethac07, so does this channel mapping not apply to all LINCS images?

ch01 - HOECHST 33342 - DNA
ch02 - Alexa 488 - ER
ch03 - 488 long - RNA
ch04 - Alexa 568 - AGP
ch05 - Alexa 647 - Mito

bethac07 commented 2 years ago

That mapping may or may not hold; I don't know. I have no evidence that it does not, but I do know we have occasionally had cases where the channel order changes between plates within a batch (and certainly across batches). As far as I know, nobody has ever documented confirming whether it is consistent.

wongdanr commented 2 years ago

Oh, I see. OK, thanks for letting me know. Will those XML files you referenced have the channel mappings per batch?

bethac07 commented 2 years ago

The xml files have the mappings, but as I stated, I would not trust them at a batch level, only at a plate level.
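Because the XML is only trustworthy at the plate level, one way to check a batch is to compare every plate's mapping individually instead of assuming one mapping per batch. A minimal sketch, assuming each plate's channel ID to channel name mapping has already been parsed from its own XML into a dict; the function names, the plate names, and the swapped example plate are hypothetical.

```python
from collections import defaultdict

# Hypothetical input shape: plate name -> {channel ID: channel name}.
PlateMapping = dict[str, dict[int, str]]


def group_plates_by_mapping(plates: PlateMapping) -> dict[tuple, list[str]]:
    """Group plates that share an identical channel ID -> name mapping."""
    groups = defaultdict(list)
    for plate, mapping in plates.items():
        key = tuple(sorted(mapping.items()))  # hashable, order-independent key
        groups[key].append(plate)
    return dict(groups)


def plates_differing_from(plates: PlateMapping, expected: dict[int, str]) -> list[str]:
    """Return the plates whose mapping is not exactly `expected`, sorted by name."""
    return sorted(plate for plate, mapping in plates.items() if mapping != expected)


expected = {1: "HOECHST 33342", 2: "Alexa 488", 3: "488 long", 4: "Alexa 568", 5: "Alexa 647"}
plates = {
    "PLATE_A": dict(expected),
    "PLATE_B": {**expected, 2: "Alexa 568", 4: "Alexa 488"},  # swapped, for illustration
}
print(plates_differing_from(plates, expected))  # ['PLATE_B']
print(len(group_plates_by_mapping(plates)))  # 2
```

Grouping is useful when no single reference mapping is known in advance: more than one group means the channel order is not consistent across the plates checked.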

wongdanr commented 2 years ago

Thanks @bethac07, I just parsed all of the XML files and confirmed that the mapping @gwaygenomics gave does hold for all plates (at least the ones in s3://cellpainting-gallery/lincs/broad/images/2016_04_01_a549_48hr_batch1/images/). Thank you!

gwaybio commented 2 years ago

Wonderful! Thanks for following up with this @wongdanr

If possible, can you paste the code snippet you used to confirm? (It is likely to help future users who stumble upon this issue)

wongdanr commented 2 years ago

sure thing @gwaybio:

import os


def testChannelOrderThroughLINCSXMLFiles(directory="/home/wongd26/workspace/profiler/lincs_xml_files/xml/"):
    """
    Make sure that our LINCS channel image assumption is correct:
    {1: "HOECHST 33342", 2: "Alexa 488", 3: "488 long", 4: "Alexa 568", 5: "Alexa 647"}
    Parses the Images/Index.idx.xml file of every plate subdirectory in `directory`, downloaded with:
    aws s3 cp --recursive --no-sign-request s3://cellpainting-gallery/lincs/broad/images/2016_04_01_a549_48hr_batch1/images/ xml/ --exclude "*" --include "*.xml"
    """
    channel_map = {1: "HOECHST 33342", 2: "Alexa 488", 3: "488 long", 4: "Alexa 568", 5: "Alexa 647"}
    for sub_dir in sorted(os.listdir(directory)):
        if not os.path.isdir(os.path.join(directory, sub_dir)):
            continue  # skip stray files such as .DS_Store
        print("subdir", sub_dir)
        index_file = os.path.join(directory, sub_dir, "Images", "Index.idx.xml")
        channel = None  # most recent <ChannelID>; the <ChannelName> after it belongs to it
        with open(index_file, "r") as xmlfile:
            for line in xmlfile:
                if "<ChannelID>" in line:
                    start = line.find("<ChannelID>") + len("<ChannelID>")
                    channel = int(line[start:line.rfind("<")])
                if "<ChannelName>" in line:
                    start = line.find("<ChannelName>") + len("<ChannelName>")
                    name = line[start:line.rfind("<")]
                    # Fail loudly, naming the plate file and the unexpected channel name.
                    assert channel_map.get(channel) == name, f"{index_file}: channel {channel} is {name!r}"
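The check above stops at the first mismatch. A variant that collects every ChannelID to ChannelName pairing in a file, so a whole batch can be surveyed before drawing conclusions, could look like this. It assumes, like the snippet above, that each `<ChannelName>` element follows the `<ChannelID>` element it belongs to; the function names and the sample XML fragment are hypothetical and do not reproduce the full Index.idx.xml schema. Names are collected into sets so that a channel ID carrying two different names within one file also shows up as a mismatch.

```python
import re

# Matches <ChannelID>N</ChannelID> (group 1) or <ChannelName>...</ChannelName> (group 2).
_TOKEN_RE = re.compile(r"<ChannelID>(\d+)</ChannelID>|<ChannelName>([^<]*)</ChannelName>")

EXPECTED = {1: "HOECHST 33342", 2: "Alexa 488", 3: "488 long", 4: "Alexa 568", 5: "Alexa 647"}


def channel_mapping(xml_text: str) -> dict[int, set[str]]:
    """Map each ChannelID to every ChannelName that follows it in document order."""
    mapping: dict[int, set[str]] = {}
    channel = None
    for match in _TOKEN_RE.finditer(xml_text):
        if match.group(1) is not None:
            channel = int(match.group(1))
        elif channel is not None:  # ignore names seen before any ChannelID
            mapping.setdefault(channel, set()).add(match.group(2))
    return mapping


def find_mismatches(mapping: dict[int, set[str]], expected: dict[int, str]) -> dict[int, set[str]]:
    """Return the channels whose names are not exactly {expected[channel]}."""
    return {ch: names for ch, names in mapping.items() if names != {expected.get(ch)}}


# Hypothetical fragment using the same element names the snippet above searches for.
sample = """
<Image><ChannelID>1</ChannelID><ChannelName>HOECHST 33342</ChannelName></Image>
<Image><ChannelID>2</ChannelID><ChannelName>Alexa 568</ChannelName></Image>
"""
print(find_mismatches(channel_mapping(sample), EXPECTED))  # {2: {'Alexa 568'}}
```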
gwaybio commented 2 years ago

Awesome, thanks @wongdanr!