DataStax Bulk Loader (DSBulk) is an open-source, unified tool for loading data into and unloading data from Apache Cassandra(R), DataStax Astra and DataStax Enterprise (DSE). It is licensed under the Apache License 2.0.
Support URL files with up to millions of lines #457
This came up while reviewing #399: some users are using giant urlfiles with millions of URLs inside.
Urlfiles were not designed for inputs this large: currently, when a urlfile is parsed, every parsed URL instance is held in memory at once. See AbstractFileBasedConnector#loadURLs.
We should modify that method to return a Flux instead, and merge it with other fluxes.
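
To illustrate the idea, here is a minimal sketch of parsing a urlfile lazily, one line at a time, instead of collecting every URL up front. It is not the actual `loadURLs` code: the class name `UrlFileStreaming`, the method `streamUrls`, and the rule that blank lines and lines starting with `#` are skipped are all assumptions made for this example. It uses only `java.util.stream` so that it stays self-contained; the proposal above would return a Reactor `Flux` instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of DSBulk.
final class UrlFileStreaming {

  /**
   * Returns a lazy stream of the URLs listed in the given file.
   * Lines are read on demand, so memory use does not grow with file size.
   * The caller must close the returned stream to release the file handle.
   */
  static Stream<URL> streamUrls(Path urlfile) throws IOException {
    return Files.lines(urlfile, StandardCharsets.UTF_8)
        .map(String::trim)
        // Assumption: blank lines and '#' comment lines are ignored.
        .filter(line -> !line.isEmpty() && !line.startsWith("#"))
        .map(UrlFileStreaming::parse);
  }

  // Converts one line to a URL, rethrowing the checked exception unchecked
  // so that the method can be used inside a stream pipeline.
  private static URL parse(String line) {
    try {
      return URI.create(line).toURL();
    } catch (MalformedURLException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With Reactor on the classpath, a stream like this could presumably be adapted with something like `Flux.using(() -> UrlFileStreaming.streamUrls(path), Flux::fromStream, Stream::close)`, which ties the file handle's lifetime to the subscription and yields a flux that can be merged with the others.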