ash-project / ash_csv

The CSV data layer for Ash Framework
https://hexdocs.pm/ash_csv
MIT License

Read action takes too long on a large csv #6

Open briankariuki opened 6 months ago

briankariuki commented 6 months ago

I have a CSV of about 25,000 records and a resource that uses it. Api.read takes a very long time, even when reading only the first 5 records.

zachdaniel commented 6 months ago

What is a very long time? I'm on a pretty overpowered machine, but a file with 20k records just returned for me in 0.25s. That said, I'll improve our reading logic so that it leverages the streaming nature of the file read.
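To illustrate what leveraging the streaming nature of the file read can mean, here is a minimal sketch using only the Elixir standard library. It is not ash_csv's implementation: the module name `CsvPeek` and the function `first_rows/2` are hypothetical, and the comma splitting is deliberately naive (no quoted fields, no embedded newlines).

```elixir
defmodule CsvPeek do
  # Hypothetical helper for illustration; not part of ash_csv.
  # Returns the first `count` data rows of a CSV file as lists of strings.
  # Assumes a single header line and plain comma-separated fields.
  def first_rows(path, count) do
    path
    |> File.stream!()                      # lazy: lines are read on demand
    |> Stream.drop(1)                      # skip the header row
    |> Stream.map(&String.trim_trailing/1) # drop the trailing newline
    |> Stream.map(&String.split(&1, ","))  # naive field split
    |> Enum.take(count)                    # stops reading after `count` rows
  end
end
```

Because every step before `Enum.take/2` is lazy, a call such as `CsvPeek.first_rows("data.csv", 5)` reads only the first few lines rather than the whole file, however many records it contains.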

zachdaniel commented 6 months ago

After I wrote that I realized that 0.25s is a pretty crazy long time for this to take :) When you said "very long time", I just assumed it would be much longer. I'm pushing something to main shortly that improves things a fair amount, but there are likely lots of small optimizations to be made in the way we load the data from the CSV file into the actual resource structs.

zachdaniel commented 6 months ago

Will be looking more into this, as it highlights a few places worth optimizing. I also want to see how much of the slowdown comes from the CSV parser vs. Ash, but I'd venture a guess that most of the time goes to loading the stored values into memory and validating their types.
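One rough way to separate the two costs is to time each stage on its own. The sketch below wraps Erlang's `:timer.tc/1`; the `Timing` module and `measure/1` are hypothetical names used only for illustration, not anything from Ash or ash_csv.

```elixir
defmodule Timing do
  # Hypothetical helper for illustration.
  # Runs a zero-arity function and returns {elapsed_milliseconds, result}.
  def measure(fun) when is_function(fun, 0) do
    # :timer.tc/1 returns {microseconds, value}
    {micros, result} = :timer.tc(fun)
    {micros / 1000, result}
  end
end
```

Timing a bare parse of the file with `Timing.measure/1`, then timing the full resource read the same way, gives a first estimate of how much of the total is spent outside the parser.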

zachdaniel commented 6 months ago

I've pushed some non-trivial performance improvements up. Give main a try and let me know how it goes :)

briankariuki commented 6 months ago

> What is a very long time? I'm on a pretty overpowered machine, but a file with 20k records just returned for me in 0.25s. That said, I'll improve our reading logic so that it leverages the streaming nature of the file read.

More than a minute, I'm afraid. I can share the CSV and resource for you to try out.

briankariuki commented 6 months ago

> I've pushed some non-trivial performance improvements up. Give main a try and let me know how it goes :)

Let me try the main branch and report back.

Also, does ash_csv work with ash_json_api? I got an error while trying to read the resource via an API endpoint.

zachdaniel commented 6 months ago

No reason it shouldn't. What's the error?