broadinstitute / cellpainting-gallery

Cell Painting Gallery
https://broadinstitute.github.io/cellpainting-gallery/
MIT License

2023_04_14_Lacoste-Haghighi #44

Closed shntnu closed 10 months ago

shntnu commented 1 year ago

Segmentation/feature extraction is being performed by the Cimini lab
Profile creation is being performed by (Cimini lab / Carpenter-Singh lab)
Data can be made public in RODA immediately

Update as generated:
[Link to profile repo]
[Link to publication repo]
cpg0026-lacoste_haghighi-rare-diseases

Transfer to CellPainting Gallery:

If data is being published, prepare for publication:

Once published:

MarziehHaghighi commented 1 year ago

I'm suggesting changes to the folder naming to make the structure consistent across the batches in this project. They are reflected in the following transfer commands:

Transferring image folders:

aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/PILOT_1/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/PILOT_1/unprojected_images/

aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/PILOT_1_maxproj/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/PILOT_1/images/

aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/Cancer_Mutations_Screen/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/Cancer_Mutations_Screen/unprojected_images/

aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/Maxproj_Cancer_Mutations_Screen/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/Cancer_Mutations_Screen/images/

aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/Common_Variants/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/Common_Variants/unprojected_images/

aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/Maxproj_Common_Variants/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/Common_Variants/images/

aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/Kinase_Plates/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/Kinase_Plates/unprojected_images/

aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/Maxproj_Kinase_Plates/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/Kinase_Plates/images/

aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/Replicates_Original_Screen/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/Replicates_Original_Screen/unprojected_images/

aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/Maxproj_Replicates_Original_Screen/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/Replicates_Original_Screen/images/

aws s3 sync s3://imaging-platform/projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/2021_05_21_QualityControlPathwayArrayedScreen/

aws s3 sync s3://imaging-platform/projects/2017_10_19_Profiling_rare_ORFs/2022_01_12_Batch1/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/2022_01_12_Batch1/

aws s3 sync s3://imaging-platform/projects/2017_10_19_Profiling_rare_ORFs/2022_01_12_Batch2/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/2022_01_12_Batch2/
shntnu commented 1 year ago
import subprocess

source_prefix = "s3://imaging-platform/projects/"
destination_prefix = "s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/"

paths = {
    "2017_09_27_RareDiseases_Taipale/PILOT_1/": "PILOT_1/unprojected_images/",
    "2017_09_27_RareDiseases_Taipale/PILOT_1_maxproj/": "PILOT_1/images/",
    "2017_09_27_RareDiseases_Taipale/Cancer_Mutations_Screen/": "Cancer_Mutations_Screen/unprojected_images/",
    "2017_09_27_RareDiseases_Taipale/Maxproj_Cancer_Mutations_Screen/": "Cancer_Mutations_Screen/images/",
    "2017_09_27_RareDiseases_Taipale/Common_Variants/": "Common_Variants/unprojected_images/",
    "2017_09_27_RareDiseases_Taipale/Maxproj_Common_Variants/": "Common_Variants/images/",
    "2017_09_27_RareDiseases_Taipale/Kinase_Plates/": "Kinase_Plates/unprojected_images/",
    "2017_09_27_RareDiseases_Taipale/Maxproj_Kinase_Plates/": "Kinase_Plates/images/",
    "2017_09_27_RareDiseases_Taipale/Replicates_Original_Screen/": "Replicates_Original_Screen/unprojected_images/",
    "2017_09_27_RareDiseases_Taipale/Maxproj_Replicates_Original_Screen/": "Replicates_Original_Screen/images/",
    "2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/": "2021_05_21_QualityControlPathwayArrayedScreen/",
    "2017_10_19_Profiling_rare_ORFs/2022_01_12_Batch1/": "2022_01_12_Batch1/",
    "2017_10_19_Profiling_rare_ORFs/2022_01_12_Batch2/": "2022_01_12_Batch2/"
}

for source_suffix, destination_suffix in paths.items():
    source = source_prefix + source_suffix
    destination = destination_prefix + destination_suffix
    # `sync` is a subcommand of `aws s3`, so the `s3` token is required
    subprocess.run(["aws", "s3", "sync", source, destination])
shntnu commented 1 year ago

@MarziehHaghighi Would you be able to do a spot check to verify that the code above matches what you had?
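
One way to make such a spot check mechanical rather than visual is to parse the hand-written commands back into (source, destination) pairs and compare them with the pairs the script would generate. The sketch below is only an illustration: the helper names `pairs_from_commands` and `pairs_from_mapping` are hypothetical, and only two of the commands are reproduced for brevity. The parser looks only at the last two tokens of each line that contains `sync`.

```python
import shlex

SOURCE_PREFIX = "s3://imaging-platform/projects/"
DESTINATION_PREFIX = (
    "s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/"
)


def pairs_from_commands(text):
    """Return the set of (source, destination) pairs found in sync command lines."""
    pairs = set()
    for line in text.splitlines():
        tokens = shlex.split(line)
        # The last two tokens of a sync command are the source and the destination.
        if len(tokens) >= 3 and "sync" in tokens:
            pairs.add((tokens[-2], tokens[-1]))
    return pairs


def pairs_from_mapping(paths):
    """Return the set of (source, destination) pairs the script would sync."""
    return {
        (SOURCE_PREFIX + src, DESTINATION_PREFIX + dst) for src, dst in paths.items()
    }


# Two of the hand-written commands, copied from the comment above.
commands = """
aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/PILOT_1/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/PILOT_1/unprojected_images/
aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/PILOT_1_maxproj/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/PILOT_1/images/
"""

# The matching entries from the script's `paths` dictionary.
paths = {
    "2017_09_27_RareDiseases_Taipale/PILOT_1/": "PILOT_1/unprojected_images/",
    "2017_09_27_RareDiseases_Taipale/PILOT_1_maxproj/": "PILOT_1/images/",
}

by_hand = pairs_from_commands(commands)
by_script = pairs_from_mapping(paths)
print("only in commands:", sorted(by_hand - by_script))
print("only in script:", sorted(by_script - by_hand))
```

If both printed lists are empty, the two versions describe the same transfers.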

MarziehHaghighi commented 1 year ago

Looks correct to me! Thanks!

MarziehHaghighi commented 1 year ago

Hi @shntnu, if granting me access to do the transfer is not possible, could you please also transfer the analysis folders of the above batches to the gallery? Thanks.

shntnu commented 1 year ago

@MarziehHaghighi First, one q about the folder structure.

Do you mean to handle structuring the unprojected images differently?

For example, consider the Cancer_Mutations_Screen batch of images:

image

My guess is you'd want this to be transferred like this:

paths = {
    "2017_09_27_RareDiseases_Taipale/Cancer_Mutations_Screen/images/": "Cancer_Mutations_Screen/images_unprojected/",
    "2017_09_27_RareDiseases_Taipale/Maxproj_Cancer_Mutations_Screen/images/": "Cancer_Mutations_Screen/images/",
    "2017_09_27_RareDiseases_Taipale/Maxproj_Cancer_Mutations_Screen/illum/": "Cancer_Mutations_Screen/illum/",
}

This is different from how you have it.

If my version is correct, then this is the updated script:

import subprocess

source_prefix = "s3://imaging-platform/projects/"
destination_prefix = (
    "s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/"
)

paths = {
    # unprojected images: copy the `images` to `images_unprojected`
    "2017_09_27_RareDiseases_Taipale/PILOT_1/images/": "PILOT_1/images_unprojected/",
    "2017_09_27_RareDiseases_Taipale/Cancer_Mutations_Screen/images/": "Cancer_Mutations_Screen/images_unprojected/",
    "2017_09_27_RareDiseases_Taipale/Common_Variants/images/": "Common_Variants/images_unprojected/",
    "2017_09_27_RareDiseases_Taipale/Kinase_Plates/images/": "Kinase_Plates/images_unprojected/",
    "2017_09_27_RareDiseases_Taipale/Replicates_Original_Screen/images/": "Replicates_Original_Screen/images_unprojected/",
    # projected images: copy the `images` to `images` and `illum` to `illum`
    # (which is the same as just copying the whole batch folder)
    "2017_09_27_RareDiseases_Taipale/PILOT_1_maxproj/": "PILOT_1/",
    "2017_09_27_RareDiseases_Taipale/Maxproj_Cancer_Mutations_Screen/": "Cancer_Mutations_Screen/",
    "2017_09_27_RareDiseases_Taipale/Maxproj_Common_Variants/": "Common_Variants/",
    "2017_09_27_RareDiseases_Taipale/Maxproj_Kinase_Plates/": "Kinase_Plates/",
    "2017_09_27_RareDiseases_Taipale/Maxproj_Replicates_Original_Screen/": "Replicates_Original_Screen/",
    # these batches already have folders organized as `images`, `images_unprojected`, and `illum`, so we can just sync the whole batch folder
    "2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/": "2021_05_21_QualityControlPathwayArrayedScreen/",
    "2017_10_19_Profiling_rare_ORFs/2022_01_12_Batch1/": "2022_01_12_Batch1/",
    "2017_10_19_Profiling_rare_ORFs/2022_01_12_Batch2/": "2022_01_12_Batch2/",
}

for source_suffix, destination_suffix in paths.items():
    source = source_prefix + source_suffix
    destination = destination_prefix + destination_suffix
    subprocess.run(["aws", "s3", "sync", source, destination])
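
Because the destination layout differs from the source layout, it may be worth previewing the operations before copying anything. `aws s3 sync` accepts a `--dryrun` flag that lists the operations without performing them. The sketch below builds the command for one pair and prints it; `build_sync_command` is a hypothetical helper, and only one entry of `paths` is shown.

```python
import shlex

SOURCE_PREFIX = "s3://imaging-platform/projects/"
DESTINATION_PREFIX = (
    "s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/"
)


def build_sync_command(source_suffix, destination_suffix, dryrun=True):
    """Return the argument list for one `aws s3 sync`, optionally as a dry run."""
    command = ["aws", "s3", "sync"]
    if dryrun:
        # --dryrun shows what would be copied without copying it
        command.append("--dryrun")
    command.append(SOURCE_PREFIX + source_suffix)
    command.append(DESTINATION_PREFIX + destination_suffix)
    return command


command = build_sync_command(
    "2017_09_27_RareDiseases_Taipale/Cancer_Mutations_Screen/images/",
    "Cancer_Mutations_Screen/images_unprojected/",
)
# Print the command for review; nothing is executed here.
print(shlex.join(command))
```

Once the preview looks right, the same list with `dryrun=False` can be passed to `subprocess.run`.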
MarziehHaghighi commented 1 year ago

@shntnu Sorry, I'm a bit confused. Basically, I want a consistent structure across batches in the gallery, even though the structure is not consistent across batches in the imaging bucket. I wrote these commands so that every batch ends up with the final structure below. My visual inspection of your commands may not have been thorough enough to confirm that they follow this structure, but my initial version should be accurate.

cellpainting-gallery
└── cpg0026-lacoste_haghighi-rare-diseases
    └── broad
        ├── images
        │   ├── PILOT_1
        │   │   ├── illum
        │   │   ├── unprojected_images
        │   │   └── images
        │   ├── Cancer_Mutations_Screen 
        │   ├── Common_Variants
        │   ├── Kinase_Plates
        │   ├── Replicates_Original_Screen
        │   ├── 2021_05_21_QualityControlPathwayArrayedScreen 
        │   ├── 2022_01_12_Batch1     
        │   └── 2022_01_12_Batch2
        └── workspace
            ├── analysis
            ├── backend
            ├── load_data_csv
            ├── metadata
            └── profiles
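
The target layout can also be written down as data, which makes it easier to check later that every batch ended up with the same subfolders. This is a minimal sketch, assuming the three subfolder names shown for PILOT_1 in the tree apply to every batch; the thread also uses `images_unprojected` for one of them, so the names are kept as a parameter. `expected_prefixes` is a hypothetical helper.

```python
ROOT = "s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/"

# Batch names as they appear under `images` in the tree
BATCHES = [
    "PILOT_1",
    "Cancer_Mutations_Screen",
    "Common_Variants",
    "Kinase_Plates",
    "Replicates_Original_Screen",
    "2021_05_21_QualityControlPathwayArrayedScreen",
    "2022_01_12_Batch1",
    "2022_01_12_Batch2",
]


def expected_prefixes(batches, subfolders=("illum", "unprojected_images", "images")):
    """Return every image prefix the consistent layout should contain."""
    return [
        f"{ROOT}images/{batch}/{subfolder}/"
        for batch in batches
        for subfolder in subfolders
    ]


# Show the prefixes expected for the first batch only
for prefix in expected_prefixes(BATCHES[:1]):
    print(prefix)
```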
shntnu commented 1 year ago

Ah ok -- so we are on the same page but your commands would result in a discrepancy, which I've addressed. We can discuss this when we chat. Most of the image transfers were completed successfully (only the files below will need to be unarchived).

s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/PILOT_1_maxproj/images/RC4_IF_23/r06c09f03p01-ch2sk1fk1fl1.tiff
s3://imaging-platform/projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/images/Plate1D/r05c05f04p03-ch3sk1fk1fl1.tiff
s3://imaging-platform/projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/images/Plate1D/r05c06f01p03-ch2sk1fk1fl1.tiff
s3://imaging-platform/projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/images/Plate3E/r05c08f16p03-ch1sk1fk1fl1.tiff
s3://imaging-platform/projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/unprojected_images/Plate2D2_0305_Plate2D2__2021-03-05T11_42_57-Measurement1/Images/r04c11f08p02-ch1sk1fk1fl1.tiff
s3://imaging-platform/projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/unprojected_images/Plate2D2_0305_Plate2D2__2021-03-05T11_42_57-Measurement1/Images/r04c11f09p02-ch2sk1fk1fl1.tiff
shntnu commented 1 year ago

@MarziehHaghighi I confirm that all the images, except the 6 that need to be restored, were copied over successfully. The restore for the remaining files is underway.

parallel \
  aws s3api \
  restore-object \
  --bucket imaging-platform \
  --key {} \
  --restore-request GlacierJobParameters={"Tier"="Standard"} ::: \
  projects/2017_09_27_RareDiseases_Taipale/PILOT_1_maxproj/images/RC4_IF_23/r06c09f03p01-ch2sk1fk1fl1.tiff \
  projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/images/Plate1D/r05c05f04p03-ch3sk1fk1fl1.tiff \
  projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/images/Plate1D/r05c06f01p03-ch2sk1fk1fl1.tiff \
  projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/images/Plate3E/r05c08f16p03-ch1sk1fk1fl1.tiff \
  projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/unprojected_images/Plate2D2_0305_Plate2D2__2021-03-05T11_42_57-Measurement1/Images/r04c11f08p02-ch1sk1fk1fl1.tiff \
  projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/unprojected_images/Plate2D2_0305_Plate2D2__2021-03-05T11_42_57-Measurement1/Images/r04c11f09p02-ch2sk1fk1fl1.tiff
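
`restore-object` takes a bucket and a key rather than an `s3://` URI, so the URIs listed earlier have to be split before they can be passed to it. A small sketch of that step using only the standard library; `split_s3_uri` is a hypothetical helper, and it assumes keys without `?` or `#` characters, which holds for the six files here.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The bucket is the network location; the key is the path without its leading slash
    return parsed.netloc, parsed.path.lstrip("/")


uri = (
    "s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/"
    "PILOT_1_maxproj/images/RC4_IF_23/r06c09f03p01-ch2sk1fk1fl1.tiff"
)
bucket, key = split_s3_uri(uri)
print(bucket)
print(key)
```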

You can proceed with the analysis folders (and whatever else you plan to copy over from workspace) whenever you are ready

shntnu commented 1 year ago

The restore for the remaining is underway

@MarziehHaghighi This is all set 🎉

Script

source_prefix = "s3://imaging-platform/projects/"
destination_prefix = (
    "s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/"
)

paths = {
    # unprojected images: copy the `images` to `images_unprojected`
    "2017_09_27_RareDiseases_Taipale/PILOT_1/images/": "PILOT_1/images_unprojected/",
    "2017_09_27_RareDiseases_Taipale/Cancer_Mutations_Screen/images/": "Cancer_Mutations_Screen/images_unprojected/",
    "2017_09_27_RareDiseases_Taipale/Common_Variants/images/": "Common_Variants/images_unprojected/",
    "2017_09_27_RareDiseases_Taipale/Kinase_Plates/images/": "Kinase_Plates/images_unprojected/",
    "2017_09_27_RareDiseases_Taipale/Replicates_Original_Screen/images/": "Replicates_Original_Screen/images_unprojected/",
    # projected images: copy the `images` to `images` and `illum` to `illum`
    # (which is the same as just copying the whole batch folder)
    "2017_09_27_RareDiseases_Taipale/PILOT_1_maxproj/": "PILOT_1/",
    "2017_09_27_RareDiseases_Taipale/Maxproj_Cancer_Mutations_Screen/": "Cancer_Mutations_Screen/",
    "2017_09_27_RareDiseases_Taipale/Maxproj_Common_Variants/": "Common_Variants/",
    "2017_09_27_RareDiseases_Taipale/Maxproj_Kinase_Plates/": "Kinase_Plates/",
    "2017_09_27_RareDiseases_Taipale/Maxproj_Replicates_Original_Screen/": "Replicates_Original_Screen/",
    # these batches already have folders organized as `images`, `images_unprojected`, and `illum`, so we can just sync the whole batch folder
    "2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/": "2021_05_21_QualityControlPathwayArrayedScreen/",
    "2017_10_19_Profiling_rare_ORFs/2022_01_12_Batch1/": "2022_01_12_Batch1/",
    "2017_10_19_Profiling_rare_ORFs/2022_01_12_Batch2/": "2022_01_12_Batch2/",
}

import subprocess
from concurrent.futures import ThreadPoolExecutor

def sync(source_suffix, destination_suffix):
    source = source_prefix + source_suffix
    destination = destination_prefix + destination_suffix
    # subprocess.run(["aws", "s3", "sync", "--quiet", source, destination])
    subprocess.run(["aws", "s3", "sync", source, destination])

with ThreadPoolExecutor(max_workers=8) as executor:
    futures = {executor.submit(sync, source_suffix, destination_suffix) for source_suffix, destination_suffix in paths.items()}
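
As written, the script does not inspect the exit status of each `aws s3 sync`, so a failed batch could go unnoticed. A hedged variant that collects the return codes is sketched below. The `sync_all` name and its `runner` parameter are assumptions added here so that the logic can be exercised without AWS access; the demonstration uses a stand-in runner and a single path pair.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

SOURCE_PREFIX = "s3://imaging-platform/projects/"
DESTINATION_PREFIX = (
    "s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/"
)


def run_sync(source, destination):
    """Run one `aws s3 sync` and return its exit code."""
    return subprocess.run(["aws", "s3", "sync", source, destination]).returncode


def sync_all(paths, runner=run_sync, max_workers=8):
    """Sync every pair in parallel and return {source_suffix: exit_code}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the source suffix it was submitted for
        futures = {
            executor.submit(runner, SOURCE_PREFIX + src, DESTINATION_PREFIX + dst): src
            for src, dst in paths.items()
        }
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


def fake_runner(source, destination):
    # Stand-in for the real command: pretend every sync succeeds
    return 0


paths = {"2017_09_27_RareDiseases_Taipale/PILOT_1_maxproj/": "PILOT_1/"}
results = sync_all(paths, runner=fake_runner)
failed = [src for src, code in results.items() if code != 0]
print("failed:", failed)
```

Calling `sync_all(paths)` without the `runner` argument would invoke the real AWS CLI, and any suffix left in `failed` can then be retried on its own.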
MarziehHaghighi commented 1 year ago

The analysis folders transfer is done. I will update this comment when I'm done with the profiles and results transfers. Checklist of the folders to transfer:

source_prefix = "s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/workspace/analysis/"
destination_prefix = (
    "s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/workspace/analysis/"
)

paths = {
    "PILOT_1_maxproj/": "PILOT_1/",
    "Maxproj_Replicates_Original_Screen/": "Replicates_Original_Screen/",
    "Maxproj_Kinase_Plates/": "Kinase_Plates/",
    "Maxproj_Common_Variants/": "Common_Variants/",
    "Maxproj_Cancer_Mutations_Screen/": "Cancer_Mutations_Screen/",
}

import subprocess
from concurrent.futures import ThreadPoolExecutor

def sync(source_suffix, destination_suffix):
    source = source_prefix + source_suffix
    destination = destination_prefix + destination_suffix
    # subprocess.run(["aws", "s3", "sync", "--quiet", source, destination])
    subprocess.run(["aws", "s3", "sync", source, destination])

with ThreadPoolExecutor(max_workers=8) as executor:
    futures = {executor.submit(sync, source_suffix, destination_suffix) for source_suffix, destination_suffix in paths.items()}