OmicsML / CellPLM

Official repo for CellPLM: Pre-training of Cell Language Model Beyond Single Cells.
BSD 2-Clause "Simplified" License

Request for Update on Dataset Details #5

Closed by WhenMelancholy 7 months ago

WhenMelancholy commented 9 months ago

Hello, I noticed that the paper says details of the training dataset would be made available in the GitHub repository. Are there plans to add this information? For instance, which parts of the HTCA, HCA, and GEO datasets were used? Thank you!

wehos commented 9 months ago

Thank you for your reminder. I will work on this issue and will keep you posted.

wehos commented 7 months ago

Hi,

I double-checked, and the detailed dataset list is provided in Table 6 of the supplementary file of our ICLR submission on OpenReview. It was added during the rebuttal. We have not yet updated the bioRxiv version with this information, and we will replace it with our camera-ready version soon. Sorry for the confusion.

Note that in September (right after the submission) we trained a new version of the model on CellxGene Census LTS 2023-05-15. Therefore, any checkpoints from after September are based on this data source.

WhenMelancholy commented 7 months ago

Thank you for your help! I will check the ICLR version.