Closed shntnu closed 10 months ago
I'm suggesting changes to the folder naming so that the structure is consistent across the batches in this project. The changes are reflected in the following transfer commands:
aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/PILOT_1/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/PILOT_1/unprojected_images/
aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/PILOT_1_maxproj/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/PILOT_1/images/
aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/Cancer_Mutations_Screen/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/Cancer_Mutations_Screen/unprojected_images/
aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/Maxproj_Cancer_Mutations_Screen/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/Cancer_Mutations_Screen/images/
aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/Common_Variants/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/Common_Variants/unprojected_images/
aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/Maxproj_Common_Variants/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/Common_Variants/images/
aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/Kinase_Plates/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/Kinase_Plates/unprojected_images/
aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/Maxproj_Kinase_Plates/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/Kinase_Plates/images/
aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/Replicates_Original_Screen/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/Replicates_Original_Screen/unprojected_images/
aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/Maxproj_Replicates_Original_Screen/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/Replicates_Original_Screen/images/
aws s3 sync s3://imaging-platform/projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/2021_05_21_QualityControlPathwayArrayedScreen/
aws s3 sync s3://imaging-platform/projects/2017_10_19_Profiling_rare_ORFs/2022_01_12_Batch1/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/2022_01_12_Batch1/
aws s3 sync s3://imaging-platform/projects/2017_10_19_Profiling_rare_ORFs/2022_01_12_Batch2/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/2022_01_12_Batch2/
import subprocess
source_prefix = "s3://imaging-platform/projects/"
destination_prefix = "s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/"
paths = {
"2017_09_27_RareDiseases_Taipale/PILOT_1/": "PILOT_1/unprojected_images/",
"2017_09_27_RareDiseases_Taipale/PILOT_1_maxproj/": "PILOT_1/images/",
"2017_09_27_RareDiseases_Taipale/Cancer_Mutations_Screen/": "Cancer_Mutations_Screen/unprojected_images/",
"2017_09_27_RareDiseases_Taipale/Maxproj_Cancer_Mutations_Screen/": "Cancer_Mutations_Screen/images/",
"2017_09_27_RareDiseases_Taipale/Common_Variants/": "Common_Variants/unprojected_images/",
"2017_09_27_RareDiseases_Taipale/Maxproj_Common_Variants/": "Common_Variants/images/",
"2017_09_27_RareDiseases_Taipale/Kinase_Plates/": "Kinase_Plates/unprojected_images/",
"2017_09_27_RareDiseases_Taipale/Maxproj_Kinase_Plates/": "Kinase_Plates/images/",
"2017_09_27_RareDiseases_Taipale/Replicates_Original_Screen/": "Replicates_Original_Screen/unprojected_images/",
"2017_09_27_RareDiseases_Taipale/Maxproj_Replicates_Original_Screen/": "Replicates_Original_Screen/images/",
"2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/": "2021_05_21_QualityControlPathwayArrayedScreen/",
"2017_10_19_Profiling_rare_ORFs/2022_01_12_Batch1/": "2022_01_12_Batch1/",
"2017_10_19_Profiling_rare_ORFs/2022_01_12_Batch2/": "2022_01_12_Batch2/"
}
for source_suffix, destination_suffix in paths.items():
    source = source_prefix + source_suffix
    destination = destination_prefix + destination_suffix
    # `sync` is a subcommand of `aws s3`, so the `s3` argument is required
    subprocess.run(["aws", "s3", "sync", source, destination])
@MarziehHaghighi Would you be able to do a spot check to verify that the code above matches what you had?
Looks correct to me! Thanks!
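The spot check can also be done mechanically. The following is a minimal sketch, not part of the original exchange: it parses the sync command lines, expands the `paths` dictionary with the two prefixes, and lists any source whose destination differs. The helper names `parse_sync_commands`, `expand_paths`, and `diff_mappings` are hypothetical, and only two of the entries are shown.

```python
import shlex

SOURCE_PREFIX = "s3://imaging-platform/projects/"
DESTINATION_PREFIX = (
    "s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/"
)


def parse_sync_commands(text):
    """Return {source: destination} for every sync command line in `text`."""
    pairs = {}
    for line in text.splitlines():
        tokens = shlex.split(line)
        if tokens[:2] == ["aws", "s3"]:
            tokens = ["aws"] + tokens[2:]  # tolerate `aws s3 sync` and `aws sync`
        if len(tokens) == 4 and tokens[:2] == ["aws", "sync"]:
            pairs[tokens[2]] = tokens[3]
    return pairs


def expand_paths(paths):
    """Turn the suffix dictionary used by the script into full S3 URIs."""
    return {SOURCE_PREFIX + s: DESTINATION_PREFIX + d for s, d in paths.items()}


def diff_mappings(from_commands, from_script):
    """List sources that map to different destinations or appear on one side only."""
    all_sources = sorted(set(from_commands) | set(from_script))
    return [s for s in all_sources if from_commands.get(s) != from_script.get(s)]


# Two entries copied from the thread, for illustration
commands = """
aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/PILOT_1/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/PILOT_1/unprojected_images/
aws s3 sync s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/PILOT_1_maxproj/ s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/PILOT_1/images/
"""
paths = {
    "2017_09_27_RareDiseases_Taipale/PILOT_1/": "PILOT_1/unprojected_images/",
    "2017_09_27_RareDiseases_Taipale/PILOT_1_maxproj/": "PILOT_1/images/",
}
mismatches = diff_mappings(parse_sync_commands(commands), expand_paths(paths))
print(mismatches or "mappings agree")
```

An empty list means the dictionary and the command list describe the same transfers.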
Hi @shntnu, if granting access to do the transfer is not possible, could you please also transfer the analysis folders of the above batches to the gallery? Thanks.
@MarziehHaghighi First, one question about the folder structure.
Do you mean to structure the unprojected images differently?
For example, consider the Cancer_Mutations_Screen batch of images:
My guess is you'd want this to be transferred like this:
paths = {
    "2017_09_27_RareDiseases_Taipale/Cancer_Mutations_Screen/images/": "Cancer_Mutations_Screen/images_unprojected/",
    "2017_09_27_RareDiseases_Taipale/Maxproj_Cancer_Mutations_Screen/images/": "Cancer_Mutations_Screen/images/",
    "2017_09_27_RareDiseases_Taipale/Maxproj_Cancer_Mutations_Screen/illum/": "Cancer_Mutations_Screen/illum/",
}
This is different from how you have it.
If my version is correct, then this is the updated script:
import subprocess
source_prefix = "s3://imaging-platform/projects/"
destination_prefix = (
"s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/"
)
paths = {
# unprojected images: copy the `images` to `images_unprojected`
"2017_09_27_RareDiseases_Taipale/PILOT_1/images/": "PILOT_1/images_unprojected/",
"2017_09_27_RareDiseases_Taipale/Cancer_Mutations_Screen/images/": "Cancer_Mutations_Screen/images_unprojected/",
"2017_09_27_RareDiseases_Taipale/Common_Variants/images/": "Common_Variants/images_unprojected/",
"2017_09_27_RareDiseases_Taipale/Kinase_Plates/images/": "Kinase_Plates/images_unprojected/",
"2017_09_27_RareDiseases_Taipale/Replicates_Original_Screen/images/": "Replicates_Original_Screen/images_unprojected/",
# projected images: copy the `images` to `images` and `illum` to `illum`
# (which is the same as just copying the whole batch folder)
"2017_09_27_RareDiseases_Taipale/PILOT_1_maxproj/": "PILOT_1/",
"2017_09_27_RareDiseases_Taipale/Maxproj_Cancer_Mutations_Screen/": "Cancer_Mutations_Screen/",
"2017_09_27_RareDiseases_Taipale/Maxproj_Common_Variants/": "Common_Variants/",
"2017_09_27_RareDiseases_Taipale/Maxproj_Kinase_Plates/": "Kinase_Plates/",
"2017_09_27_RareDiseases_Taipale/Maxproj_Replicates_Original_Screen/": "Replicates_Original_Screen/",
# these batches already have folders organized as `images`, `images_unprojected`, and `illum`, so we can just sync the whole batch folder
"2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/": "2021_05_21_QualityControlPathwayArrayedScreen/",
"2017_10_19_Profiling_rare_ORFs/2022_01_12_Batch1/": "2022_01_12_Batch1/",
"2017_10_19_Profiling_rare_ORFs/2022_01_12_Batch2/": "2022_01_12_Batch2/",
}
for source_suffix, destination_suffix in paths.items():
    source = source_prefix + source_suffix
    destination = destination_prefix + destination_suffix
    subprocess.run(["aws", "s3", "sync", source, destination])
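To make the difference between the two mappings concrete, here is a small sketch of how a sync is expected to place objects: each object keeps its path relative to the source prefix. The function `synced_key` and the object names are hypothetical, and the sketch assumes, as the example above suggests, that a Maxproj batch folder contains `images/` and `illum/` subfolders.

```python
def synced_key(source_prefix, destination_prefix, source_key):
    """Destination key for one object, keeping its path relative to the source prefix."""
    if not source_key.startswith(source_prefix):
        raise ValueError(f"{source_key} is not under {source_prefix}")
    return destination_prefix + source_key[len(source_prefix):]


# Hypothetical object names inside one projected batch
batch = "2017_09_27_RareDiseases_Taipale/Maxproj_Cancer_Mutations_Screen/"
objects = [
    batch + "images/PlateA/example.tiff",
    batch + "illum/PlateA/example.npy",
]

# First script: whole Maxproj batch folder -> <batch>/images/
first = [synced_key(batch, "Cancer_Mutations_Screen/images/", k) for k in objects]
# Updated script: whole Maxproj batch folder -> <batch>/
updated = [synced_key(batch, "Cancer_Mutations_Screen/", k) for k in objects]

for key in first + updated:
    print(key)
```

Under that assumption the first mapping nests the folders one level too deep (`Cancer_Mutations_Screen/images/images/...` and `Cancer_Mutations_Screen/images/illum/...`), while the updated mapping yields `Cancer_Mutations_Screen/images/...` and `Cancer_Mutations_Screen/illum/...`.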
@shntnu Sorry, I'm a bit confused. Basically, I want a consistent structure for all batches in the gallery, even though the structure of the batches in the imaging bucket is not consistent. I wrote these commands so that every batch ends up with the final structure below. My visual inspection of your commands may not have been thorough enough to confirm that they follow my structure, but my initial version should be accurate.
cellpainting-gallery
└── cpg0026-lacoste_haghighi-rare-diseases
    └── broad
        ├── images
        │   ├── PILOT_1
        │   │   ├── illum
        │   │   ├── unprojected_images
        │   │   └── images
        │   ├── Cancer_Mutations_Screen
        │   ├── Common_Variants
        │   ├── Kinase_Plates
        │   ├── Replicates_Original_Screen
        │   ├── 2021_05_21_QualityControlPathwayArrayedScreen
        │   ├── 2022_01_12_Batch1
        │   └── 2022_01_12_Batch2
        └── workspace
            ├── analysis
            ├── backend
            ├── load_data_csv
            ├── metadata
            └── profiles
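A target structure like this can be checked against a listing of object keys. The sketch below is illustrative only: `missing_subfolders` is a hypothetical helper, the keys are made up, and the expected subfolder names are passed in as a parameter because the thread uses both `unprojected_images` and `images_unprojected`.

```python
from collections import defaultdict

IMAGES_PREFIX = "cpg0026-lacoste_haghighi-rare-diseases/broad/images/"


def missing_subfolders(keys, images_prefix, expected):
    """Map each batch found in `keys` to the expected subfolders it lacks."""
    seen = defaultdict(set)
    for key in keys:
        if not key.startswith(images_prefix):
            continue
        parts = key[len(images_prefix):].split("/")
        if len(parts) >= 3:  # batch / subfolder / rest of the path
            seen[parts[0]].add(parts[1])
    report = {}
    for batch, found in seen.items():
        missing = sorted(set(expected) - found)
        if missing:
            report[batch] = missing
    return report


# Made-up keys for illustration
keys = [
    IMAGES_PREFIX + "PILOT_1/illum/a.npy",
    IMAGES_PREFIX + "PILOT_1/images/a.tiff",
    IMAGES_PREFIX + "PILOT_1/unprojected_images/a.tiff",
    IMAGES_PREFIX + "Kinase_Plates/illum/a.npy",
    IMAGES_PREFIX + "Kinase_Plates/images/a.tiff",
]
report = missing_subfolders(keys, IMAGES_PREFIX, ("illum", "images", "unprojected_images"))
print(report)
```

With these made-up keys, only `Kinase_Plates` is reported, because it has no `unprojected_images` folder.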
Ah ok -- so we are on the same page, but your commands would result in a discrepancy, which I've addressed. We can discuss this when we chat. Most of the image transfers were completed successfully (only the files below will need to be unarchived).
s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/PILOT_1_maxproj/images/RC4_IF_23/r06c09f03p01-ch2sk1fk1fl1.tiff
s3://imaging-platform/projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/images/Plate1D/r05c05f04p03-ch3sk1fk1fl1.tiff
s3://imaging-platform/projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/images/Plate1D/r05c06f01p03-ch2sk1fk1fl1.tiff
s3://imaging-platform/projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/images/Plate3E/r05c08f16p03-ch1sk1fk1fl1.tiff
s3://imaging-platform/projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/unprojected_images/Plate2D2_0305_Plate2D2__2021-03-05T11_42_57-Measurement1/Images/r04c11f08p02-ch1sk1fk1fl1.tiff
s3://imaging-platform/projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/unprojected_images/Plate2D2_0305_Plate2D2__2021-03-05T11_42_57-Measurement1/Images/r04c11f09p02-ch2sk1fk1fl1.tiff
@MarziehHaghighi I confirm that all the images, except the 6 that need to be restored, were copied over successfully. The restore for the remaining files is underway.
parallel \
aws s3api \
restore-object \
--bucket imaging-platform \
--key {} \
--restore-request GlacierJobParameters={"Tier"="Standard"} ::: \
projects/2017_09_27_RareDiseases_Taipale/PILOT_1_maxproj/images/RC4_IF_23/r06c09f03p01-ch2sk1fk1fl1.tiff projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/images/Plate1D/r05c05f04p03-ch3sk1fk1fl1.tiff projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/images/Plate1D/r05c06f01p03-ch2sk1fk1fl1.tiff projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/images/Plate3E/r05c08f16p03-ch1sk1fk1fl1.tiff projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/unprojected_images/Plate2D2_0305_Plate2D2__2021-03-05T11_42_57-Measurement1/Images/r04c11f08p02-ch1sk1fk1fl1.tiff projects/2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/unprojected_images/Plate2D2_0305_Plate2D2__2021-03-05T11_42_57-Measurement1/Images/r04c11f09p02-ch2sk1fk1fl1.tiff
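Once a restore has been requested, its progress can be read from the `Restore` field that `aws s3api head-object` reports for the object. The following is a minimal sketch of interpreting that field; the function name and the status labels are hypothetical, and the sample values only illustrate the field's general shape.

```python
import re


def restore_status(restore_field):
    """Classify the `Restore` value returned by head-object for one object."""
    if not restore_field:
        # No Restore field: no restore was requested, or the object is not archived
        return "no-restore"
    match = re.search(r'ongoing-request="(true|false)"', restore_field)
    if match is None:
        return "unknown"
    return "in-progress" if match.group(1) == "true" else "restored"


# Sample values showing the two states
print(restore_status('ongoing-request="true"'))
print(restore_status('ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"'))
```

An object is ready to be copied once the status is `restored`; rerunning the sync after that point should pick up the remaining files.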
You can proceed with the analysis folders (and whatever else you plan to copy over from workspace) whenever you are ready.
@MarziehHaghighi This is all set 🎉
Script
source_prefix = "s3://imaging-platform/projects/"
destination_prefix = (
"s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/images/"
)
paths = {
# unprojected images: copy the `images` to `images_unprojected`
"2017_09_27_RareDiseases_Taipale/PILOT_1/images/": "PILOT_1/images_unprojected/",
"2017_09_27_RareDiseases_Taipale/Cancer_Mutations_Screen/images/": "Cancer_Mutations_Screen/images_unprojected/",
"2017_09_27_RareDiseases_Taipale/Common_Variants/images/": "Common_Variants/images_unprojected/",
"2017_09_27_RareDiseases_Taipale/Kinase_Plates/images/": "Kinase_Plates/images_unprojected/",
"2017_09_27_RareDiseases_Taipale/Replicates_Original_Screen/images/": "Replicates_Original_Screen/images_unprojected/",
# projected images: copy the `images` to `images` and `illum` to `illum`
# (which is the same as just copying the whole batch folder)
"2017_09_27_RareDiseases_Taipale/PILOT_1_maxproj/": "PILOT_1/",
"2017_09_27_RareDiseases_Taipale/Maxproj_Cancer_Mutations_Screen/": "Cancer_Mutations_Screen/",
"2017_09_27_RareDiseases_Taipale/Maxproj_Common_Variants/": "Common_Variants/",
"2017_09_27_RareDiseases_Taipale/Maxproj_Kinase_Plates/": "Kinase_Plates/",
"2017_09_27_RareDiseases_Taipale/Maxproj_Replicates_Original_Screen/": "Replicates_Original_Screen/",
# these batches already have folders organized as `images`, `images_unprojected`, and `illum`, so we can just sync the whole batch folder
"2017_10_19_Profiling_rare_ORFs/2021_05_21_QualityControlPathwayArrayedScreen/": "2021_05_21_QualityControlPathwayArrayedScreen/",
"2017_10_19_Profiling_rare_ORFs/2022_01_12_Batch1/": "2022_01_12_Batch1/",
"2017_10_19_Profiling_rare_ORFs/2022_01_12_Batch2/": "2022_01_12_Batch2/",
}
import subprocess
from concurrent.futures import ThreadPoolExecutor
def sync(source_suffix, destination_suffix):
    source = source_prefix + source_suffix
    destination = destination_prefix + destination_suffix
    # subprocess.run(["aws", "s3", "sync", "--quiet", source, destination])
    subprocess.run(["aws", "s3", "sync", source, destination])
with ThreadPoolExecutor(max_workers=8) as executor:
    futures = {
        executor.submit(sync, source_suffix, destination_suffix)
        for source_suffix, destination_suffix in paths.items()
    }
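The script above starts the syncs in parallel but does not look at their results. As a hedged variation, not what was actually run, the sketch below collects each command's return code and reports the pairs that failed. `sync_all` and `fake_runner` are hypothetical names, and the stand-in runner exists only so the example does not call AWS.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed


def sync_all(pairs, runner=subprocess.run, max_workers=8):
    """Run one `aws s3 sync` per (source, destination) pair; return the failed pairs."""

    def sync(source, destination):
        result = runner(["aws", "s3", "sync", source, destination])
        return result.returncode

    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(sync, s, d): (s, d) for s, d in pairs}
        for future in as_completed(futures):
            if future.result() != 0:  # non-zero return code: treat as a failure
                failed.append(futures[future])
    return sorted(failed)


# Stand-in runner for demonstration only; it pretends the "bad" source fails
def fake_runner(command):
    returncode = 1 if "bad" in command[3] else 0
    return subprocess.CompletedProcess(command, returncode)


failed = sync_all(
    [("s3://src/ok/", "s3://dst/ok/"), ("s3://src/bad/", "s3://dst/bad/")],
    runner=fake_runner,
)
print(failed)
```

For a real run, the pairs would be built from the dictionary, for example `[(source_prefix + s, destination_prefix + d) for s, d in paths.items()]`, and the default runner would be used.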
Analysis folders transfer is done. I will update this same comment when I'm done with the profiles and results transfers. Checklist of the folders to transfer:
source_prefix = "s3://imaging-platform/projects/2017_09_27_RareDiseases_Taipale/workspace/analysis/"
destination_prefix = (
"s3://cellpainting-gallery/cpg0026-lacoste_haghighi-rare-diseases/broad/workspace/analysis/"
)
paths = {
"PILOT_1_maxproj/": "PILOT_1/",
"Maxproj_Replicates_Original_Screen/": "Replicates_Original_Screen/",
"Maxproj_Kinase_Plates/": "Kinase_Plates/",
"Maxproj_Common_Variants/": "Common_Variants/",
"Maxproj_Cancer_Mutations_Screen/": "Cancer_Mutations_Screen/",
}
import subprocess
from concurrent.futures import ThreadPoolExecutor
def sync(source_suffix, destination_suffix):
    source = source_prefix + source_suffix
    destination = destination_prefix + destination_suffix
    # subprocess.run(["aws", "s3", "sync", "--quiet", source, destination])
    subprocess.run(["aws", "s3", "sync", source, destination])
with ThreadPoolExecutor(max_workers=8) as executor:
    futures = {
        executor.submit(sync, source_suffix, destination_suffix)
        for source_suffix, destination_suffix in paths.items()
    }
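One way to confirm that a transfer like this is complete is to compare key and size listings of the source and destination folders, for example as gathered from `aws s3api list-objects-v2` output. The sketch below assumes the listings are already available as `{key: size}` dictionaries; `compare_listings` and the file names are hypothetical.

```python
def compare_listings(source, destination, source_prefix, destination_prefix):
    """Compare {key: size} listings after stripping each side's prefix.

    Returns (missing, size_mismatch): relative paths absent from the
    destination, and relative paths present on both sides with different sizes.
    """
    src = {
        key[len(source_prefix):]: size
        for key, size in source.items()
        if key.startswith(source_prefix)
    }
    dst = {
        key[len(destination_prefix):]: size
        for key, size in destination.items()
        if key.startswith(destination_prefix)
    }
    missing = sorted(set(src) - set(dst))
    size_mismatch = sorted(k for k in set(src) & set(dst) if src[k] != dst[k])
    return missing, size_mismatch


# Made-up listings for illustration
src_prefix = "projects/2017_09_27_RareDiseases_Taipale/workspace/analysis/PILOT_1_maxproj/"
dst_prefix = "cpg0026-lacoste_haghighi-rare-diseases/broad/workspace/analysis/PILOT_1/"
source_listing = {
    src_prefix + "PlateA/Cells.csv": 100,
    src_prefix + "PlateA/Nuclei.csv": 200,
    src_prefix + "PlateB/Cells.csv": 300,
}
destination_listing = {
    dst_prefix + "PlateA/Cells.csv": 100,
    dst_prefix + "PlateA/Nuclei.csv": 150,
}
missing, size_mismatch = compare_listings(
    source_listing, destination_listing, src_prefix, dst_prefix
)
print(missing, size_mismatch)
```

Two empty lists would indicate that every source object has a same-sized counterpart under the renamed destination folder.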
Segmentation/ Feature extraction is being performed by Cimini lab
Profile creation is being performed by (Cimini lab / Carpenter-Singh lab)
Data can be public in RODA Immediately
Update as generated:
[Link to profile repo]
[Link to publication repo]
cpg0026-lacoste_haghighi-rare-diseases
Transfer to CellPainting Gallery:
If data is being published, prepare for publication:
Once published: