WayScience / IDR_stream

Software for feature extraction from IDR image data
BSD 3-Clause "New" or "Revised" License

Fix cp dp differences #21

Closed roshankern closed 1 year ago

roshankern commented 1 year ago

This PR is ready for review! Two changes are introduced:

1) The CP environment file now specifies the same version of PyTorch as the DP environment file.

2) idrstream_cp.py now places the metadata file directly in the images directory instead of first saving the metadata .csv to a different folder and copying it from there. My second comment explains this change in more detail.

roshankern commented 1 year ago

This comment gives extended documentation for the second change in this PR: idrstream_cp.py now places the metadata file directly in the images directory instead of first saving the metadata .csv to a different folder and copying it from there.

CellProfiler (CP) requires a metadata .csv file in the images directory (which sits inside the CP project directory, which in turn sits inside the tmp directory of an IDR_stream run). To follow the idrstream_dp.py convention, the user should load a metadata file (with locations, perturbation data, etc.) as a pandas dataframe and pass that dataframe into the run_cp_stream() method, so that IDR_stream knows where the cells we want features from are located.

The current version of idrstream_cp.py requires a user to (see example_cp.ipynb):

1) Use the stream.convert_tsv_to_csv() method to convert the .tsv metadata file into a .csv file, which is saved in a directory not accessed by CellProfiler.

2) Use stream.copy_CP_files() to copy the saved .csv from step 1 into the images directory accessed by CellProfiler.

3) Load the metadata .tsv file (the data to process) as a pandas dataframe and pass this dataframe into stream.run_cp_stream() (by idrstream_dp.py convention).
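For illustration, the three steps of the current workflow amount to roughly the following. This is a minimal sketch using plain pandas and shutil, not the IDR_stream code: the `*_sketch` function names, the directory layout, the file names, and the metadata columns are all hypothetical stand-ins, and the real stream.convert_tsv_to_csv() and stream.copy_CP_files() methods may take different arguments.

```python
import shutil
import tempfile
from pathlib import Path

import pandas as pd


def convert_tsv_to_csv_sketch(tsv_path: Path, staging_dir: Path) -> Path:
    """Hypothetical stand-in for step 1: save a .csv version of the .tsv metadata."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    csv_path = staging_dir / (tsv_path.stem + ".csv")
    pd.read_csv(tsv_path, sep="\t").to_csv(csv_path, index=False)
    return csv_path


def copy_into_images_dir_sketch(csv_path: Path, images_dir: Path) -> Path:
    """Hypothetical stand-in for step 2: copy the staged .csv to where CellProfiler reads it."""
    images_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(csv_path, images_dir / csv_path.name))


# Demo with made-up paths and columns (assumptions, not the real IDR metadata).
workdir = Path(tempfile.mkdtemp())
tsv_path = workdir / "metadata.tsv"
tsv_path.write_text("Plate\tWell\nplate_1\tA1\nplate_1\tA2\n")

# Step 1: save the .csv somewhere CellProfiler does not look.
staged_csv = convert_tsv_to_csv_sketch(tsv_path, workdir / "staging")
# Step 2: copy that .csv into the images directory.
final_csv = copy_into_images_dir_sketch(
    staged_csv, workdir / "tmp" / "CP_project" / "images"
)
# Step 3: separately load the same metadata as a dataframe for the stream.
metadata_df = pd.read_csv(tsv_path, sep="\t")
```

The same metadata ends up on disk twice and is read a third time into memory, which is the redundancy this PR removes.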

The newer version of idrstream_cp.py requires a user to do only one thing (see the new example_cp.ipynb): load the metadata .tsv file (the data to process) as a pandas dataframe and pass this dataframe into stream.run_cp_stream() (by idrstream_dp.py convention).

In the background, idrstream_cp.py now saves the metadata pandas dataframe as a .csv in the images directory itself: a helper function, init_cp_run(metadata_dataframe), is called at the beginning of stream.run_cp_stream() to do this.
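A minimal sketch of what such a helper could look like, assuming the images directory path is known when the stream runs. The `*_sketch` functions, the `images_dir` and `filename` parameters, and the example columns are hypothetical; the actual init_cp_run() in idrstream_cp.py takes only the metadata dataframe and may differ in other details.

```python
import tempfile
from pathlib import Path

import pandas as pd


def init_cp_run_sketch(
    metadata_df: pd.DataFrame,
    images_dir: Path,
    filename: str = "metadata.csv",  # assumed name, not taken from IDR_stream
) -> Path:
    """Write the metadata dataframe as a .csv directly into the images directory."""
    images_dir.mkdir(parents=True, exist_ok=True)
    csv_path = images_dir / filename
    metadata_df.to_csv(csv_path, index=False)  # index=False avoids an extra unnamed column
    return csv_path


def run_cp_stream_sketch(metadata_df: pd.DataFrame, images_dir: Path) -> Path:
    """Hypothetical outline: the helper runs first, before any processing starts."""
    csv_path = init_cp_run_sketch(metadata_df, images_dir)
    # ... image download and the CellProfiler run would follow here ...
    return csv_path


# Demo with made-up metadata; the user only ever handles the dataframe.
metadata_df = pd.DataFrame({"Plate": ["plate_1", "plate_1"], "Well": ["A1", "A2"]})
images_dir = Path(tempfile.mkdtemp()) / "tmp" / "CP_project" / "images"
written_csv = run_cp_stream_sketch(metadata_df, images_dir)
```

With this arrangement the user never calls a conversion or copy step, which keeps the CP interface in line with idrstream_dp.py.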