bunnech / cellot

Learning Single-Cell Perturbation Responses using Neural Optimal Transport
BSD 3-Clause "New" or "Revised" License

4i dataset #21

Closed Ralmohsen closed 4 months ago

Ralmohsen commented 4 months ago

After reading the preprocessed version of the 4i data that you provide, I have not been able to figure out the source/target distributions. I notice that the data are indexed by drug and by original cell, but nothing is labeled as source or target.

Screenshot1 Screenshot2 Screenshot3 Screenshot4

Please see the attached screenshots. Screenshot1 is my code snippet. Screenshot2 and Screenshot3 show the data's obs and var; as you can see, they are indexed by drug (rows) and original cell (columns). Screenshot4 is a UMAP of the data filtered by Trametinib, which I could not filter further by source vs. target.

I also found that in lines 71 to 93 of https://github.com/bunnech/cellot/blob/main/cellot/data/cell.py you were labeling the data as source and target. I am not sure how you do that; I thought the data were already labeled.

I really appreciate any explanation. Thank you

stefangstark commented 4 months ago

The source/target labels get added during the model's data loading. Typically, the "source" corresponds to the control condition. You need to run the model in order to induce the pairing across conditions. We have more details on how to run the model in the repo's README. Hope this helps!
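As a rough illustration of what such a labeling step amounts to, here is a minimal sketch in plain Python. It is not the repository's actual code: the function name `label_transport`, the condition values, and the choice of `"control"` as the source are all assumptions made for the example. The idea is that each cell's condition is mapped to `"source"` or `"target"`, and cells from any other condition are dropped for that task.

```python
from collections import Counter


def label_transport(conditions, source="control", target="trametinib"):
    """Map each cell's condition to 'source', 'target', or None (not used).

    `source` and `target` are hypothetical condition names; substitute the
    values that actually appear in your data.
    """
    mapper = {source: "source", target: "target"}
    return [mapper.get(condition) for condition in conditions]


# Toy stand-in for a per-cell condition column (hypothetical values).
conditions = ["control", "trametinib", "control", "other_drug", "trametinib"]
labels = label_transport(conditions)

# Keep only the cells that take part in this source -> target task.
kept = [(c, lab) for c, lab in zip(conditions, labels) if lab is not None]
counts = Counter(lab for _, lab in kept)
print(counts)
```

With real data the same mapping would be applied to whichever per-cell column holds the condition, so the labels only exist once a specific control/treatment pair has been chosen.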

TedSIWEILIU commented 2 months ago

Where do you find the 4i data?

Ralmohsen commented 2 months ago

You can find it on the paper's repository page, https://github.com/bunnech/cellot. The README file has a Dataset section that says "You can download the preprocessed data ...", with a link that takes you to the polybox website, where you can download it.

TedSIWEILIU commented 2 months ago

Very weird. From this link (https://polybox.ethz.ch/index.php/s/RAykIMfDl0qCJaM), I have obtained many other datasets, but none of them is named 4i. Maybe they modified this file at some point.

Here is a screenshot of the dataset I have obtained by unzipping the preprocessed data: WechatIMG394

Ralmohsen commented 2 months ago

You can download everything from: https://www.research-collection.ethz.ch/handle/20.500.11850/609681

img

From the CELLOT paper: see the screenshot, where it says (The processed datasets of all tasks can be accessed at)