biolab / orange3-single-cell

🍊🔬 Orange add-on for gene expression of single cell data
https://singlecell.biolab.si/

Single Cell Datasets: reduce the size of the data sets and speed up loading #208

Open BlazZupan opened 6 years ago

BlazZupan commented 6 years ago

This issue is related to PR https://github.com/biolab/orange3/pull/3047, which enables saving and loading of compressed pickle files. Once this is merged into Orange and released, I propose to:

This should substantially reduce the transfer and loading time of data sets. For instance, the largest data set currently included (bone marrow with AML) is 64 MB, while its xz-compressed pickle is only 2.4 MB.
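A minimal sketch of where the size reduction comes from, using only Python's standard `pickle` and `lzma` modules. It is not Orange's own reader or writer: the table below is a made-up stand-in for a sparse single-cell count matrix (mostly zeros), and the variable names are hypothetical. Real savings depend on the data; the 64 MB to 2.4 MB figure above is the author's measurement, not something this snippet reproduces.

```python
import lzma
import pickle
import random

# Hypothetical stand-in for a single-cell expression table: cells (rows)
# by genes (columns), with mostly zero counts, as is typical for such data.
random.seed(0)
n_cells, n_genes = 500, 200
rows = [[random.choice([0, 0, 0, 0, 1, 2, 5]) for _ in range(n_genes)]
        for _ in range(n_cells)]

# Size of the data written as plain tab-separated text.
tab_text = "\n".join("\t".join(str(v) for v in row) for row in rows)
tab_bytes = tab_text.encode("utf-8")

# Size of the same data pickled and then xz-compressed (a .pickle.xz payload).
xz_bytes = lzma.compress(pickle.dumps(rows, protocol=pickle.HIGHEST_PROTOCOL))

print(f"tab: {len(tab_bytes)} bytes, pickle.xz: {len(xz_bytes)} bytes")

# Loading reverses the steps: decompress, then unpickle.
restored = pickle.loads(lzma.decompress(xz_bytes))
assert restored == rows
```

For files on disk, `lzma.open(path, "wb")` and `lzma.open(path, "rb")` can wrap `pickle.dump` and `pickle.load` in the same way.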

This update will break backward compatibility.

astaric commented 6 years ago

If we migrate to .pickle.xz, I suggest we create a new "repository on serverfiles" and migrate to that. Otherwise, Datasets will start crashing on old(er) versions of the software.

anupparikh commented 6 years ago

We're creating another file format to support? Why not standardize on tab and loom?

tab, mtx, loom, pickle...

BlazZupan commented 6 years ago

@anupparikh, it's not a new format; it is just a way of storing tab (csv) files. A quick trick to substantially reduce the size of our demo datasets and speed up loading.
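To illustrate that the data itself does not change, here is a hedged sketch using the standard `csv`, `pickle` and `lzma` modules: a small tab-separated table is parsed once, stored as an xz-compressed pickle, and reloaded unchanged. The table contents and the helper names `parse_tab`, `to_pickle_xz` and `from_pickle_xz` are hypothetical; Orange pickles its own table objects, which this sketch does not model. The presumed speed-up on loading comes from skipping text parsing, since the pickle already holds the parsed values.

```python
import csv
import io
import lzma
import pickle

# A tiny, made-up tab-separated table: a header row of gene names, then counts.
TAB_TEXT = "gene_A\tgene_B\tgene_C\n0\t3\t1\n2\t0\t0\n"


def parse_tab(text):
    """Parse tab-separated text into (header, rows of ints)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = [[int(v) for v in row] for row in reader]
    return header, rows


def to_pickle_xz(table):
    """Serialize an already-parsed table to xz-compressed pickle bytes."""
    return lzma.compress(pickle.dumps(table))


def from_pickle_xz(data):
    """Inverse of to_pickle_xz: no text parsing is needed on load."""
    return pickle.loads(lzma.decompress(data))


table = parse_tab(TAB_TEXT)
stored = to_pickle_xz(table)
assert from_pickle_xz(stored) == table  # same content, different storage
```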

anupparikh commented 6 years ago

So this is just for demo datasets; customers won't be creating and using pickled data.

On Mon, Jun 4, 2018 at 12:34 PM Blaž <notifications@github.com> wrote:

> @anupparikh, it's not a new format, it is just the way of storing tab (csv) files. A quick trick to substantially reduce the size of our demo datasets and speed-up the loading.
