SciNim / Datamancer

A dataframe library with a dplyr like API
https://scinim.github.io/Datamancer/datamancer.html
MIT License

js target support? #61

Closed quimt closed 7 months ago

quimt commented 7 months ago

Currently, attempts to import datamancer fail when compiling to JavaScript. This is due to the way the package relies on arraymancer:


.nimble/pkgs2/arraymancer-0.7.27-7af6e290b723aead93067e1a52b0b369dd49cfbf/arraymancer/tensor/init_cpu.nim(235, 18) template/generic instantiation of `randomTensorCpu` from here
.nimble/pkgs2/arraymancer-0.7.27-7af6e290b723aead93067e1a52b0b369dd49cfbf/arraymancer/tensor/init_cpu.nim(208, 18) template/generic instantiation of `allocCpuStorage` from here
.nimble/pkgs2/arraymancer-0.7.27-7af6e290b723aead93067e1a52b0b369dd49cfbf/arraymancer/laser/tensor/datatypes.nim(93, 29) template/generic instantiation of `finalizer` from here
.nimble/pkgs2/arraymancer-0.7.27-7af6e290b723aead93067e1a52b0b369dd49cfbf/arraymancer/laser/tensor/datatypes.nim(71, 23) Error: attempting to call undeclared routine: 'deallocShared'

It's a shame that the DataFrame is not available for js. The API is nice and should be available independent of platform-specific concepts like memory management.

Vindaar commented 7 months ago

Hey,

thanks for the nice words and interest!

From my point of view that is a surprising request, but I can see the appeal!

I just spent about an hour looking into it, and at least basic support seems achievable. I don't know how much worse the performance is (it will depend), but I've implemented a different backend that simply uses a seq[T] as storage:

https://github.com/SciNim/Datamancer/pull/62

Note that I'm not super familiar with the JS backend, so I had to take out a few features (reading CSVs from a file on disk or from a URL) because I'm not sure how to handle these correctly there.

Feel free to check out the related branch and play around with it. That would be much appreciated. And if you have insights into how to improve it / better support some of the missing features, I'm all ears.

quimt commented 7 months ago

Will do. File I/O is probably not as important as toDf from seqs and subsequent mutate, select, and filter operations.
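
For concreteness, here is a minimal sketch of that workflow: build a DataFrame from seqs, then mutate, filter, and select. The column names and values are made up for illustration, and this has only been written against the regular Datamancer API, so whether every step behaves identically on the JS target is an assumption.

```nim
import datamancer

# Hypothetical data: column names and values are illustrative only.
let df = toDf({ "x": @[1, 2, 3, 4],
                "y": @[10.0, 20.0, 30.0, 40.0] })

# Add a derived column computed from two existing columns.
let withZ = df.mutate(f{"z" ~ `x` * `y`})

# Keep only the rows where x is greater than 2.
let kept = withZ.filter(f{`x` > 2})

# Keep only the columns of interest.
let res = kept.select("x", "z")

echo res
```

None of these steps touch the file system, which is why they seem like the most useful subset to have working first.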

I agree though that a mature web app would want to interact with user-supplied datasets. I'll try to have a look into how Datamancer does this and the various ways user apps might like to interact with CSV.

In my typical use case, it's probably good enough to embed a dataset using static: and nim-csv to get it into the Nim app as a seq of seqs. Extremely good performance is unlikely to matter, since the datasets involved are usually small and don't require much extra space.
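
A rough sketch of the embedding idea, using only the standard library rather than nim-csv: the CSV text is held in a const and split into a seq of seqs at compile time, so nothing is read from disk at run time. The inline string stands in for something like `staticRead("data.csv")` (a hypothetical file name), and the naive comma split is an assumption that only holds for data without quoted fields.

```nim
import std/strutils

# Stand-in for `staticRead("data.csv")` (hypothetical file name), kept inline
# so the sketch compiles on its own.
const rawCsv = "x,y\n1,10.5\n2,20.5\n"

proc parseRows(text: string): seq[seq[string]] =
  ## Naive splitter: no quoting or escaping, unlike a real CSV parser.
  for line in text.splitLines():
    if line.len > 0:
      result.add line.split(',')

# A const initializer is evaluated by the compiler, so the parsed rows are
# baked into the generated program.
const rows = parseRows(rawCsv)

let header = rows[0]        # column names
let records = rows[1 .. ^1] # data rows, still as strings

echo header
echo records
```

The string fields would still need converting (for example with `parseInt` or `parseFloat`) before being passed to toDf as typed columns.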

Vindaar commented 7 months ago

Cool, thanks for your help!

Vindaar commented 7 months ago

Merging #62 with basic support working for now. I'm going to close the issue, but feel free to open more issues to discuss missing features or whatever else comes to mind!