Status: Closed (breck7 closed 1 year ago)
I actually think the way to do it might just be one-offs. Thinking about how I do it in data science, it depends on the analysis I'm doing, but usually I would one-hot encode columns like this. The problem is that we'd then have a giant CSV with 10,000 columns. So perhaps we provide some simple scripts or an NPM/R/PyPI package with methods for quick access to ready-to-go data, depending on the analysis to be done.
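To make the one-off idea concrete, here is a rough sketch of what such a helper could look like in Python. Everything in it is an assumption for illustration: the function name `one_hot_list_column`, the sample rows, and the guess that list values are space-separated within a cell. It is not an existing PLDB script or package.

```python
import csv
import io


def one_hot_list_column(rows, column, sep=" "):
    """Expand a delimiter-separated list column into 0/1 indicator columns.

    Hypothetical helper: `sep` is assumed to be a space, which may not
    match how the real export delimits list values.
    """
    # Split each cell into its items, ignoring empty cells.
    split = [[v for v in (row.get(column) or "").split(sep) if v] for row in rows]
    # Collect every distinct value so each one gets its own column.
    values = sorted({v for items in split for v in items})

    encoded = []
    for row, items in zip(rows, split):
        # Keep all other columns, drop the original list column.
        out = {k: v for k, v in row.items() if k != column}
        present = set(items)
        for v in values:
            out[f"{column}_{v}"] = 1 if v in present else 0
        encoded.append(out)
    return encoded


# Made-up sample data, not taken from the real dataset.
SAMPLE_CSV = """id,compilesTo
lang-a,c js
lang-b,js
lang-c,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
encoded = one_hot_list_column(rows, "compilesTo")
for row in encoded:
    print(row)
```

Doing this per analysis, on only the list column that matters, keeps the main CSV narrow and avoids shipping thousands of mostly-zero columns to everyone.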
As @tif-calin and @SRS-WRKS have pointed out, there are a number of places where list columns (e.g. Origin Community, CompilesTo, etc.) are handled incorrectly at read time:
https://github.com/breck7/pldb/issues/348
Let's fix this site-wide.