Taylor-CCB-Group / MDV

GNU General Public License v3.0

Outputting text columns in binary that MDV understands #57

Closed: xinaesthete closed this issue 10 months ago

xinaesthete commented 1 year ago
"I've not managed to convert text columns into binary in a way that the MDV app understands. The numbers are all OK using np.array(values, np.float32).tobytes(), but when values is a list of strings I've just done np.array(values).tobytes(), and then they don't show up in MDV (last column of the table above)."
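For context on why the string attempt doesn't work: NumPy stores a list of Python strings as a fixed-width Unicode dtype sized to the longest string, so tobytes() emits 4 bytes (UTF-32) per character, padded, with no separate list of distinct values for MDV to look up. A minimal sketch contrasting that with the float32 case, using made-up sample data:

```python
import numpy as np

numbers = [1.5, 2.0, 3.25]
labels = ["Low", "High", "Medium"]

# Numeric column: 4 bytes per value as float32
num_bytes = np.array(numbers, np.float32).tobytes()
print(len(num_bytes))  # 3 values * 4 bytes = 12

# String column: NumPy picks a fixed-width Unicode dtype sized to the
# longest string ("Medium" has 6 characters), stored as UTF-32
str_arr = np.array(labels)
print(str_arr.dtype)  # <U6 on a little-endian machine
str_bytes = str_arr.tobytes()
print(len(str_bytes))  # 3 values * 6 chars * 4 bytes = 72
```

The string bytes are just padded character data, which is why the thread below moves to storing a list of values plus per-row indices instead.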

Not a comprehensive reply, but here are some relevant snippets from csv_to_static.py that might point you in the right direction, hopefully somewhat parseable:

import pandas as pd
df = pd.read_csv(filename)

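def get_text_indices(col):
    # distinct values in the column (order is arbitrary, but fixed within one run)
    values = list(set(col))
    # map each value to its position in `values`
    val_dict = {value: i for i, value in enumerate(values)}
    # per-row indices into `values`, plus the values as strings for datasources.json
    return [val_dict[v] for v in col], [str(s) for s in values]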

def get_datasource():
    # ..... snip .... #
    for name in df.columns:
        col = df[name]
        # ..... snip .... #
        elif datatype == 'text' or datatype == 'multitext':
            indices, values = get_text_indices(col)
            col_desc['values'] = values  # distinct strings, stored in datasources.json
            df[name] = indices  # the binary data then holds indices, not strings

So datasources.json has an array of values for the corresponding column, and the binary data should contain numeric indices into that array...

Regards, Peter
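A minimal sketch of that scheme, using np.unique in place of set() so the order of values is deterministic (sorted) between runs. Writing the indices as one unsigned byte per row is an assumption based on MDV's column-data docs linked later in this thread, which would cap a text column at 256 distinct values; encode_text_column is a hypothetical helper, not part of csv_to_static.py.

```python
import numpy as np

def encode_text_column(col):
    """Hypothetical helper: return (values, index_bytes) for a text column.

    values      -> list of distinct strings for the column's 'values'
                   entry in datasources.json
    index_bytes -> one uint8 per row, each an index into values
                   (assumed layout; check MDV's datasource docs)
    """
    arr = np.asarray([str(v) for v in col])
    # np.unique returns the sorted distinct values and, with
    # return_inverse=True, each row's position in that sorted array
    uniques, inverse = np.unique(arr, return_inverse=True)
    if len(uniques) > 256:
        raise ValueError("too many distinct values for a uint8 index")
    return uniques.tolist(), inverse.astype(np.uint8).tobytes()

values, data = encode_text_column(["Low", "High", "Medium", "Low"])
print(values)      # ['High', 'Low', 'Medium']
print(list(data))  # [1, 0, 2, 1]
```

Sorting the values is a design choice, not a requirement: any order works as long as the indices and the values array are written from the same mapping.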

will-moore commented 1 year ago

Thanks for that.

I used that code in https://github.com/will-moore/omero-mdv/commit/4ca4748b9d63fe5ef32536dbdfa07433c864cb8a

That uses the indices as the values for text columns, and it renders the indices if I set the column type to "integer":

[Screenshot 2023-06-12 at 22:45:17]

If I remove that (so the column should render the values) https://github.com/will-moore/omero-mdv/commit/4d910aaf1abc8a355d946f7aa8abfb332f0c8353 I get unexpected values:

[Screenshot 2023-06-12 at 22:24:25]

Printing the indices and values gives me:

strig indices, vals [1, 0, 2, 2, 0, 0, 1, 2, 1, 1, 1, 2, 1, 2, 0, 2, 1, 1, 1, 0, 1, 0, 1, 1, 0, 2, 1, 2, 2, 1, 1, 1, 0, 2, 2, 0, 1, 0, 1, 0, 0, 1, 2, 2, 0, 0, 1, 0, 2, 1, 1, 0, 1, 0, 0, 2, 1, 2, 2, 1, 1, 0, 2, 0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 0, 0, 1, 1, 1, 2, 1, 0, 1, 0, 2, 2, 1, 2, 1, 0, 0, 2, 2, 1, 2, 2, 0] ['Low', 'High', 'Medium']

So I think I need to dig into the JavaScript to work out what's going on. But I don't currently have an easy way to run the Vite dev server and embed it in omero-mdv, so that might be what I try next. For now I'm relying on the production build with base: ./ to give me relative links...
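One possible explanation for the unexpected values, offered only as a guess since the code that writes the bytes isn't shown here: if the indices are written with a wider dtype than the reader expects (say int32 or float32 instead of one byte per row), each index spills over several bytes and the rows no longer line up. A minimal sketch using the first four indices printed above, assuming the reader expects uint8:

```python
import numpy as np

values = ["Low", "High", "Medium"]
indices = [1, 0, 2, 2]  # first four indices from the printout above

# Written as little-endian int32: each index takes 4 bytes
wide = np.array(indices, dtype="<i4").tobytes()
# Read back as if each row were a single byte
misread = np.frombuffer(wide, dtype=np.uint8)
print([values[i] for i in misread[:4]])  # ['High', 'Low', 'Low', 'Low']

# Written as uint8: one byte per row, so the lookup matches the original rows
narrow = np.array(indices, dtype=np.uint8).tobytes()
decoded = np.frombuffer(narrow, dtype=np.uint8)
print([values[i] for i in decoded])  # ['High', 'Low', 'Medium', 'Medium']
```

Only the first row survives the misread: index 1 becomes the bytes 1, 0, 0, 0, which shift every following row.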

xinaesthete commented 1 year ago

You may want to try running the pjt-dev branch to get the Vite dev server working, if you haven't already.

xinaesthete commented 1 year ago

It is odd that you're getting those values; hopefully we'll make some sense of it soon...

xinaesthete commented 1 year ago

It's not inconceivable that my code in that script is wrong, I should double-check myself...

martinSergeant commented 1 year ago

In the main branch, convert_to_static_page of MDVProject will create a folder that can hopefully be added to any server (I've tried it on github.io and on a basic server running at the WIMM). You can also use this folder for development; see the Python readme.

The method used to convert the data is convert_data_to_binary, which creates very simple compressed binary files and a .json index for each datasource. (I was going to use the zarr format but ran into some problems; I think I have solved them now.) This data is lazily loaded by the CompressedBinaryDataLoader (https://github.com/Taylor-CCB-Group/MDV/blob/main/src/dataloaders/DataLoaders.js).

The structure of each column type is described here: https://github.com/Taylor-CCB-Group/MDV/blob/main/docs/extradocs/datasource.md#column-data
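To illustrate the general idea of "simple compressed binary files and a .json index" with lazy loading, here is a sketch of one such layout: each column's bytes are zlib-compressed and appended to a single file, and a JSON index records each column's byte range so a reader can fetch and decompress one column at a time. This layout, the file names and the write_columns/read_column helpers are all hypothetical; it is not the actual format MDV's converter writes or CompressedBinaryDataLoader reads, so check DataLoaders.js for the real details.

```python
import json
import os
import tempfile
import zlib

import numpy as np

def write_columns(columns, bin_path, index_path):
    """Hypothetical writer: compress each column and record its byte range."""
    index = {}
    offset = 0
    with open(bin_path, "wb") as f:
        for name, raw in columns.items():
            chunk = zlib.compress(raw)
            f.write(chunk)
            # [start, end) of this column's compressed bytes in the file
            index[name] = [offset, offset + len(chunk)]
            offset += len(chunk)
    with open(index_path, "w") as f:
        json.dump(index, f)

def read_column(name, bin_path, index_path):
    """Hypothetical lazy reader: seek to one column's range and decompress it."""
    with open(index_path) as f:
        start, end = json.load(f)[name]
    with open(bin_path, "rb") as f:
        f.seek(start)
        return zlib.decompress(f.read(end - start))

columns = {
    "score": np.array([1.5, 2.0, 3.25], np.float32).tobytes(),
    "level": np.array([1, 0, 2], np.uint8).tobytes(),  # text column indices
}
with tempfile.TemporaryDirectory() as d:
    bin_path = os.path.join(d, "demo.b")
    index_path = os.path.join(d, "demo.json")
    write_columns(columns, bin_path, index_path)
    level = np.frombuffer(read_column("level", bin_path, index_path), np.uint8)
    score = np.frombuffer(read_column("score", bin_path, index_path), np.float32)
print(level.tolist())  # [1, 0, 2]
print(score.tolist())  # [1.5, 2.0, 3.25]
```

Compressing columns separately costs a little compression ratio but lets a client download only the columns a chart actually needs.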

will-moore commented 1 year ago

Great, thanks for that. With that info I've got my example working with "text" column and will add support for the other types next... Please feel free to close this issue - cheers!