cytomining / pycytominer

Python package for processing image-based profiling data
https://pycytominer.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Collate() Function Directory Structure from CellProfiler - Experiment.csv not saved to subfolders #229

Open jenna-tomkinson opened 2 years ago

jenna-tomkinson commented 2 years ago

Hi @bethac07!

I have been testing the collate function for a project I am working on in the @gwaybio lab and I have come to a bit of a roadblock.

I have done some digging, and based on the appendix of the "Image-based Profiling Handbook", each Well-Site subfolder (e.g., C6-2) is meant to have five .csv files, one of which is the Experiment.csv file.

After running my CellProfiler pipeline, I noticed that only one Experiment.csv file was created, and it was saved alongside all of the subfolders.

After looking into the CellProfiler documentation to change this structure, I found that the "ExportToSpreadsheet" module states:

Do note that regardless of your choice, the Experiment.csv is saved to the Default Input/Output Folder and not to individual subfolders.

I also noticed that this issue was encountered before (see CellProfiler/CellProfiler#1110), and was already solved (see CellProfiler/CellProfiler#3914). However, I am still having trouble.

The only solution I see is to manually copy the Experiment.csv file into every Well-Site folder. This might be fine for my small pilot data set of 8 wells and 4 sites per well, but when I have more wells and sites, I am wondering if there is a sustainable way to reach the specific directory style for the collate function based on how CellProfiler exports files?
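For reference, the manual copy described above could be scripted. The following is only a minimal sketch: the function name `copy_experiment_csv` is hypothetical, and it assumes every immediate subfolder of the output directory is a Well-Site folder and that Experiment.csv sits next to those subfolders, as described in this comment.

```python
import shutil
from pathlib import Path


def copy_experiment_csv(output_dir, filename="Experiment.csv"):
    """Copy the top-level Experiment.csv into each immediate subfolder.

    Assumption: every subfolder of output_dir is a Well-Site folder.
    Returns the list of files that were created.
    """
    output_dir = Path(output_dir)
    source = output_dir / filename
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found")

    copied = []
    for subfolder in sorted(p for p in output_dir.iterdir() if p.is_dir()):
        target = subfolder / filename
        if not target.exists():  # never overwrite an existing file
            shutil.copy2(source, target)
            copied.append(target)
    return copied
```

Calling `copy_experiment_csv("path/to/output")` on the folder that holds the Well-Site subfolders would leave existing copies untouched, so it can be re-run safely as more wells and sites are added.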

Any insight that you have to point me in the right direction would be greatly appreciated. Thank you in advance for your help with this!

Jenna

bethac07 commented 2 years ago

Hi Jenna,

Can you take a step back and talk to me a bit more about your use case? How are you hoping to cache/use this information?

Briefly, in the handbook, we're assuming you run each plate-well-site independently, which is why you always have an Experiment.csv; if that isn't the case, you're right that you'll have only one Experiment file per "triggering of CellProfiler".

jenna-tomkinson commented 2 years ago

Hi Beth,

Attached to this comment is a .txt file of the CP analysis pipeline I used to extract features to help with any confusion: CellProfiler_Pipeline.txt

Firstly, the goal in mind for this project is to use pycytominer.collate to create a SQLite file.

Secondly, I think it would help to describe the data that I am working with, since that may be where a lot of my confusion comes from. I am working with 96 images, with no batch ID (since there is only 1 batch), no plate ID (since there is only 1 plate), and 8 wells with 4 sites each. These are Cell Painting images, but I am only working with 3 channels (DAPI, GFP, RFP).

I am mainly trying to follow the steps given in the handbook from chapter 5 (see https://cytomining.github.io/profiling-handbook/05-create-profiles.html), since this is where the collate function is mentioned.

Can you elaborate more on what you mean by "we're assuming you run each plate-well-site independently"? From my experience with CellProfiler, all of the images from every well are inputted (into "Images" module) and run all at the same time.

bethac07 commented 2 years ago

Thanks Jenna, that's helpful!

Firstly, the goal in mind for this project is to use pycytominer.collate to create a SQLite file.

Ok, well then all of this is probably unnecessary, because pycytominer utilizes cytominer-database to do the actual collation, and as far as I understand, it ignores Experiment.csv files. Is it actually giving you an error about not being able to find these files? If so, that would be helpful for troubleshooting. I'm pretty sure the easiest way to solve this would then be to just delete the experiment line from the config, since it doesn't do anything.
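As a rough illustration of removing that line programmatically, here is a sketch using Python's standard `configparser`. The section name `filenames`, the option name `experiment`, and the sample config text are assumptions made for the example, not confirmed details of the cytominer-database config; check your own config file for the actual names.

```python
import configparser


def drop_experiment_entry(config_text, section="filenames", option="experiment"):
    """Remove one option from INI-style config text.

    The default section/option names are assumptions for illustration.
    Returns the parser and a flag saying whether anything was removed.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(config_text)
    removed = parser.has_section(section) and parser.remove_option(section, option)
    return parser, removed


# Hypothetical config contents, for demonstration only.
example_config = """
[filenames]
image = Image.csv
experiment = Experiment.csv
"""

parser, removed = drop_experiment_entry(example_config)
```

The edited settings can then be written back with `parser.write(file_handle)`. Note that `configparser` drops comments when it rewrites a file, so for a single line, editing the config by hand is just as reasonable.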

Can you elaborate more on what you mean by "we're assuming you run each plate-well-site independently"? From my experience with CellProfiler, all of the images from every well are inputted (into "Images" module) and run all at the same time.

So typically, there are two major use cases on how folks would be using CellProfiler; you're kind of trying to do a hybrid between them, which is why I think you're having a bit of trouble.

Does all of that make more sense? If not, I think this video might help.

jenna-tomkinson commented 2 years ago

Hi Beth,

This makes much more sense! For the first part about the Experiment.csv files, I hadn't tried to run the function so I did not have an error. I was only attempting to have the same structure as the appendix showed at that time.

Currently, I am attempting to run the collate() function with everything set up in a workspace directory, and I am getting an error saying it cannot read my config file. This could be another issue entirely.

I am going to go with the first option you proposed and run the ExportToDatabase module instead, since my data set is so small and can be run locally.
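Once a SQLite file exists, a quick sanity check is to list its tables and row counts. This is a minimal sketch using only the standard library; the function name `table_row_counts` is hypothetical, and no particular table names are assumed.

```python
import sqlite3
from contextlib import closing


def table_row_counts(sqlite_path):
    """Return {table_name: row_count} for every table in a SQLite file.

    Note: sqlite3.connect creates an empty file if the path does not exist,
    so an empty result may simply mean the path was wrong.
    """
    with closing(sqlite3.connect(sqlite_path)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from the database itself, so quoting them is enough here.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
```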

Thank you so much for those clarifications!