Open empz opened 4 years ago
It's always been the plan to support this and I even tried to implement it once. The problem is that it might require a very different interface and so I might have to save it for data-forge version 2.
I will come back to this again at some point and rethink it.
In the meantime, if you have any proposal on how this should work I'd love to discuss it with you!
@ashleydavis I have an idea about this and I can work on it, because I really need it at the moment.
Hey @olawalejuwonm, I'd love to see if you could implement this. If it fits well I'd definitely like to include it in the library.
@olawalejuwonm did you have any success with enabling streaming in papaparse, or with looking into some other CSV library? I want to use data-forge but I'm having problems with memory consumption, even for smaller files.
@ashleydavis I saw you split out the file system access, do you have any thoughts about trying to utilize temp files to help "batch data" and reduce memory usage?
@rhesus I've decided to not attempt to implement streaming in Data-Forge. It's something I always wanted, but actually not something I ever turned out to need.
I'm more than happy for anyone to present a plan for adding streaming data to reduce memory usage.
A first step would be to create a project in GitHub that runs out of memory while processing a data file. That would give us something to centre our discussions on.
That's fair. I've been wanting to use it inside of lambdas and I've experienced several OOM issues. It's probably just a case of trying to use the wrong tool for the job.
Have you tried just breaking your data into smaller bundles that can be processed separately?
That's probably easier than trying to figure out how to upgrade Data-Forge.
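For anyone trying the batching route, here is a minimal sketch of what that could look like. `RowBatcher`, `SaleRow` and `totalAmount` are hypothetical names made up for illustration; none of this is part of Data-Forge. The assumption is that rows arrive one at a time (for example from a line reader) and that the result you want can be combined from per-batch results.

```typescript
// Hypothetical helper, not part of Data-Forge: collects rows and hands them
// to a callback one batch at a time, so only one batch is held in memory.
class RowBatcher<T> {
    private batch: T[] = [];

    constructor(
        private readonly batchSize: number,
        private readonly onBatch: (batch: T[]) => void
    ) {
        if (batchSize < 1) {
            throw new Error("batchSize must be at least 1");
        }
    }

    // Add one row; emits the batch as soon as it is full.
    push(row: T): void {
        this.batch.push(row);
        if (this.batch.length >= this.batchSize) {
            this.flush();
        }
    }

    // Emit whatever is left (call once after the last row).
    flush(): void {
        if (this.batch.length === 0) {
            return;
        }
        const full = this.batch;
        this.batch = [];
        this.onBatch(full);
    }
}

// Hypothetical row shape, for illustration only.
interface SaleRow {
    amount: number;
}

// Sums a column batch by batch. An array is used here to keep the example
// self-contained; in practice the rows would be pushed as they are read.
function totalAmount(rows: SaleRow[], batchSize: number): number {
    let total = 0;
    const batcher = new RowBatcher<SaleRow>(batchSize, (batch) => {
        // Each batch could be loaded into its own DataFrame at this point.
        for (let i = 0; i < batch.length; i++) {
            total += batch[i].amount;
        }
    });
    for (let i = 0; i < rows.length; i++) {
        batcher.push(rows[i]);
    }
    batcher.flush();
    return total;
}
```

This only helps for work that can be split up, such as sums, counts or per-row transforms. Anything that needs the whole data set at once (a global sort, for instance) would need a different approach.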
Yes, can I open a PR for it?
@olawalejuwonm of course!
A good way to start would be to log an issue describing how you would integrate the feature. Then we can discuss it there.
Sorry, I'm very familiar with JavaScript but quite new to TypeScript. Can you guide me on how to go about my first contribution on this, @ashleydavis?
Alright. Thank you
On Sun, Aug 28, 2022 at 7:30 AM Ashley Davis wrote:
If you are new to TypeScript, I'd suggest you learn some before trying to contribute.
Then you can proceed in one of two ways:
- Log an issue and describe what you want to achieve, how you think you might achieve it and we can discuss from there.
- Or feel free to fork and hack something in, then we can discuss how to get to a pull request.
I see data-forge uses papaparse under the hood to parse CSV files.
Papaparse allows reading from a stream when used in a Node environment (https://github.com/mholt/PapaParse/blob/master/README.md#papa-parse-for-node).
Can we allow such an option in the library?
An idea would be to make `dataForge.fromCSV()` accept either a string or a stream.
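To make the proposal a bit more concrete, here is a rough sketch of the "string or stream" idea. It is not how data-forge or Papa Parse work internally. `readCsvRows`, `TextChunkSource` and `NaiveCsvChunkParser` are hypothetical names, and the naive parser (split on commas and newlines, no quoted fields) is only a stand-in for a real CSV parser so the example stays dependency-free. The assumption is that a Node readable stream with a text encoding set would fit the `TextChunkSource` shape.

```typescript
type Row = string[];

// Assumed minimal shape of a text stream (hypothetical, for illustration).
interface TextChunkSource {
    on(event: "data", listener: (chunk: string) => void): unknown;
    on(event: "end", listener: () => void): unknown;
}

// Naive stand-in for a real CSV parser: no quoted fields, no escaping.
// It keeps the unfinished last line between chunks so rows that straddle
// a chunk boundary are still emitted whole.
class NaiveCsvChunkParser {
    private remainder = "";

    constructor(private readonly onRow: (row: Row) => void) {}

    write(chunk: string): void {
        const lines = (this.remainder + chunk).split("\n");
        this.remainder = lines.pop() || "";
        for (let i = 0; i < lines.length; i++) {
            this.emit(lines[i]);
        }
    }

    end(): void {
        this.emit(this.remainder);
        this.remainder = "";
    }

    private emit(line: string): void {
        const trimmed = line.replace(/\r$/, "");
        if (trimmed.length > 0) {
            this.onRow(trimmed.split(","));
        }
    }
}

// Hypothetical entry point: accepts either a whole CSV string or a stream,
// and reports rows one at a time instead of building one big array.
function readCsvRows(
    input: string | TextChunkSource,
    onRow: (row: Row) => void,
    onDone: () => void
): void {
    const parser = new NaiveCsvChunkParser(onRow);
    if (typeof input === "string") {
        parser.write(input);
        parser.end();
        onDone();
        return;
    }
    input.on("data", (chunk: string) => parser.write(chunk));
    input.on("end", () => {
        parser.end();
        onDone();
    });
}
```

The open question for the library is what the stream case should return, since a fully materialised DataFrame would bring the memory problem straight back. That is probably where the "very different interface" would come in.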