Closed ItsMeBrianD closed 10 months ago
Where/why is it being read?
By default all files in a sources directory that are not connection.yaml or connection.options.yaml are sent to the connector as potential source files; this includes reading the file contents to a string, along with the filepath
Because the duckdb file is not being skipped, it is being treated the same way
Are file contents read anywhere except in db adapters? Could just pass adapters an object along the lines of
{
filepath: filepath,
get contents() {
return fs.readFileSync(filepath)
}
}
could be fs.readFile
but it's weird to require an await obj.contents
They shouldn't be, that may be a better approach. I'm not sure if doing that sync is a good move though if we want to parallelize data source execution
Agree that requiring the await isn't ideal
Could you ignore certain binary extensions? (eg .duckdb .db .sqlite)
Or is this just not flexible in case someone drops in an big .exe or something else random
I've implemented Cormick's solution temporarily in the latest prerelease; we'll see if it has large implications on source building performance
I think a good way to approach this is to simply not load the file contents of large files; if the adapter wants to still handle those files, then the option will be there, the adapter will just have to load them itself
What are we going to consider a large file? >1MB?
I think this is currently implemented to an extent; using 100MB as the "large file" delimiter
Fixed by #1064
Not sure how we want to handle this; but I am unable to open a duckdb file that is 1.8GB in size because the plugin-connector is attempting to read the file as a string