aplbrain / npyjs

Read numpy .npy files in JavaScript
https://aplbrain.github.io/npyjs/
Apache License 2.0

How to load a local npy file using npyjs in React? #29

Closed armsp closed 2 years ago

armsp commented 2 years ago

Loading a 2D array does not seem to work despite the code compiling. In the console I get the error -

Uncaught (in promise) SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data

The code that I have is -

import ndarray from "ndarray";
import npyjs from "npyjs";

const NP = function(){

let n = new npyjs();

n.load("./embeddings.npy").then(res => {
    // res has { data, shape, dtype } members.
    const npyArray = ndarray(res.data, res.shape);
    console.log(npyArray);
});
}
export default NP;

If I replace the file path with https://rawcdn.githack.com/aplbrain/npyjs/ba60a3a529f3210dd07d2ed05ab628939e18b6a7/test/data/4x4x4x4x4-float32.npy then it seems to work. Is there a way to load from local file paths?

armsp commented 2 years ago

I think I got it working by importing the file and passing the imported value to load() instead of the path string -

import embeddings from "./embeddings.npy";

j6k4m8 commented 2 years ago

Awesome, glad you got it working, and thank you for sharing your solution! Please let me know if you run into any other trouble :)
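
For anyone hitting the same JSON.parse error: one plausible cause (an assumption, not confirmed in this thread) is that the dev server answered the relative path with an HTML page instead of the .npy file, so parsing the header fails. Below is a small sketch of a sanity check, using a hypothetical helper named looksLikeNpy that tests for the .npy magic bytes (0x93 followed by "NUMPY"):

```javascript
// Hypothetical helper (not part of npyjs): returns true when the buffer
// starts with the .npy magic string "\x93NUMPY".
function looksLikeNpy(arrayBuffer) {
    const magic = [0x93, 0x4e, 0x55, 0x4d, 0x50, 0x59]; // \x93 N U M P Y
    const bytes = new Uint8Array(arrayBuffer);
    if (bytes.length < magic.length) {
        return false;
    }
    return magic.every((byte, i) => bytes[i] === byte);
}
```

You could fetch the path yourself, for example with fetch(url).then(r => r.arrayBuffer()), and run this check on the result before handing the same URL to npyjs.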

armsp commented 2 years ago

@j6k4m8 I am actually trying to use your library to read npy files and use the extracted data to render scatter plots with regl-scatterplot, but I think I am stuck at the last step. I have been at it for two days. I think it's just a conversion/type issue, but being new to web dev, it is taking longer than usual. My progress so far is described here - https://github.com/flekschas/regl-scatterplot/issues/83

armsp commented 2 years ago

@j6k4m8 I was wondering whether it is possible to preserve the "topology" of the numpy array when it is read by npyjs/ndarray. What I mean is, right now when I load a 2D matrix, npyjs reads it as a 1D array. For example, a 5x2 matrix is read as a flat array of 10 elements. What I want is to keep it as a 5x2 array. Do you think that's possible?

Currently the way I am reading a npy file in react is as follows -

import npyjs from "npyjs";
import embeddings from "./2DEmbeddings.npy";

const NP = function () {
    let e = new npyjs();
    const loadData = () => {
        // Return the promise so the caller can wait for the loaded data.
        return e.load(embeddings).then(res => {
            console.log(`You loaded an array with ${res.data.length} elements and ${res.shape.length} dimensions.`);
            //console.log(res.shape);
            //console.log(res.data);
            return res.data;
        });
    };
    // load() is asynchronous, so log the data once the promise resolves.
    loadData().then(data => console.log("Data is - ", data));
};
export default NP;

Here you can clearly see that even though the npy file holds a 2D matrix, res.data is 1D, and that's the biggest issue.

j6k4m8 commented 2 years ago

It is definitely possible!

I made a deliberate design decision on this project to NOT make any assumptions about how the user wants the data stored; there is no widely accepted way of storing tensors in JS, unlike Python, where NumPy is the de facto standard.

People frequently use this npyjs library in conjunction with ndarray, so you could do the following:

const ndarray = require("ndarray");

...
let myTensor = ndarray(new Float64Array(res.data), res.shape)

But some people also use this library with TensorFlow.js, in which case you'd do something like this:

const myTensor = tf.tensor(res.data, res.shape);

By staying agnostic to the framework you are using, we are trying to stay as flexible as possible to your use-case. But I agree that it adds a little extra work :)

If you absolutely must not import another library, you could do something like this:

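function indexIntoArray(array, shape, ndShapedIndex) {
    // Assumes row-major (C-order) data, NumPy's default layout: the stride
    // of each axis is the product of the sizes of all later axes.
    let linearIndex = 0;
    let stride = 1;

    for (let i = shape.length - 1; i >= 0; i--) {
        linearIndex += ndShapedIndex[i] * stride;
        stride *= shape[i];
    }

    return array[linearIndex];
}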

And then access element $M_{i,j}$ of your tensor like this:

indexIntoArray(res.data, res.shape, [i, j])

For example, to get the element at row index 4 and column index 6 (both zero-based), you could do:

indexIntoArray(res.data, res.shape, [4, 6])
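
As a worked check of the indexing arithmetic, assuming the array was saved in NumPy's default C (row-major) order: for a shape $(s_0, \dots, s_{n-1})$ and an index $(i_0, \dots, i_{n-1})$, the flat offset into res.data is

$$\text{offset} = \sum_{k=0}^{n-1} i_k \prod_{m=k+1}^{n-1} s_m$$

For the 5x2 matrix mentioned earlier, the element at $[3, 1]$ sits at offset $3 \cdot 2 + 1 = 7$. For a hypothetical 10x8 matrix, the element at $[4, 6]$ sits at offset $4 \cdot 8 + 6 = 38$.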

Alternatively, you could convert the res.data into a nested list of lists... But I do not recommend this, if you can avoid it!
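
If nested lists really are needed, the conversion could be sketched roughly as follows. This is a minimal sketch, not part of npyjs: toNested is a hypothetical helper name, and it assumes the data is in row-major (C) order, which is NumPy's default when saving.

```javascript
// Hypothetical helper: rebuild nested arrays from a flat, row-major buffer.
function toNested(data, shape) {
    if (shape.length === 0) {
        return data[0]; // zero-dimensional array: a single scalar
    }
    const build = (offset, axis) => {
        const size = shape[axis];
        if (axis === shape.length - 1) {
            // Innermost axis: copy `size` consecutive values into a plain array.
            return Array.from(data.slice(offset, offset + size));
        }
        // Number of flat elements covered by one step along this axis.
        const step = shape.slice(axis + 1).reduce((a, b) => a * b, 1);
        const out = [];
        for (let i = 0; i < size; i++) {
            out.push(build(offset + i * step, axis + 1));
        }
        return out;
    };
    return build(0, 0);
}

// Example with made-up data: a 5x2 matrix stored as 10 flat values.
const flat = new Float32Array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]);
const matrix = toNested(flat, [5, 2]);
console.log(matrix.length); // 5
console.log(matrix[3]);     // [ 6, 7 ]
```

With npyjs this would be called as toNested(res.data, res.shape). Note that it copies every element into plain JS arrays, which is one reason to avoid it for large arrays.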

armsp commented 2 years ago

@j6k4m8 Talking about -

const ndarray = require("ndarray");

...
let myTensor = ndarray(new Float64Array(res.data), res.shape)

I think it's still the same as what I described earlier, right? myTensor here would have data and shape properties, where data would again be 1D. This is exactly why I am not able to pass the array to the scatterplot.

I think I am missing something very silly. Forgive me for taking so much of your time, but let me show this again. Following your documentation, I do -

import ndarray from "ndarray";
import npyjs from "npyjs";
import embeddings from "./2DEmbeddings.npy";

let e = new npyjs();
e.load(embeddings).then(res => {
    console.log(`You loaded an array with ${res.data.length} elements and ${res.shape.length} dimensions.`);
    console.log(res.data); // this is 1D, when I wish it was the same as the numpy file, i.e. 2D or 3D
    console.log(ndarray(new Float64Array(res.data), res.shape)); // this is the same as above, i.e. 1D
    // return res.data;
});

I think the load function itself transforms the matrix into 1D...that is the problem.

I don't mind using Tensorflow.js if it helps me...let me check that now.

j6k4m8 commented 2 years ago

I don't know the ndarray package particularly well, but I believe that it has n-dimensional indexing and slicing operators; what do you need it to do?

From your issue on flekschas/regl-scatterplot#83, it looks like you need a nested list-of-lists where each point is a list of [x, y, color]. The easiest way to do this is:

function convertToXYC(data1d) {
    let points = [];

    for (let i = 0; i < data1d.length; i += 3) {
        points.push([data1d[i], data1d[i + 1], data1d[i + 2]]);
    }
    return points;
}

...if each point's x, y and color values are stored next to each other in the npy file (row-major order, i.e. an N x 3 array), or

function convertToXYCRows(data1d) {
    // The transpose of the above function.
    let points = [];
    let lengthOfRow = data1d.length / 3;
    for (let i = 0; i < lengthOfRow; i++) {
        points.push([data1d[i], data1d[i + lengthOfRow], data1d[i + 2 * lengthOfRow]]);
    }
    return points;
}

...if all x values come first, then all y values, then all color values (column-first order, i.e. a 3 x N array).

(If your dataset is a set of x and y without color values, then replace the 3's with 2's and remove the last element of each push line.)
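
Both conversions can be folded into one function that takes the number of values per point from the shape instead of hard-coding 3. This is only a sketch under assumptions: toPoints and its layout parameter are hypothetical names, the array must be 2D, and the flat data is assumed to be in row-major order.

```javascript
// Hypothetical helper: turn flat 2D data into a list of points.
// layout "rows":    shape is [numPoints, valuesPerPoint]
// layout "columns": shape is [valuesPerPoint, numPoints]
function toPoints(data, shape, layout = "rows") {
    if (shape.length !== 2) {
        throw new Error("toPoints expects a 2D shape");
    }
    const [rows, cols] = shape;
    const points = [];
    if (layout === "rows") {
        // Each row of the matrix is one point.
        for (let r = 0; r < rows; r++) {
            points.push(Array.from(data.slice(r * cols, (r + 1) * cols)));
        }
    } else {
        // Each column of the matrix is one point.
        for (let c = 0; c < cols; c++) {
            const point = [];
            for (let r = 0; r < rows; r++) {
                point.push(data[r * cols + c]);
            }
            points.push(point);
        }
    }
    return points;
}
```

With npyjs this would be called as toPoints(res.data, res.shape) for an N x 2 or N x 3 embedding, or toPoints(res.data, res.shape, "columns") when the file holds the transposed layout.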