evidence-dev / evidence

Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown
https://evidence.dev
MIT License

Universal SQL: Prevent attempting to read the contents of files over a maximum size #1034

Closed: ItsMeBrianD closed this issue 10 months ago

ItsMeBrianD commented 10 months ago

Not sure how we want to handle this, but I am unable to open a duckdb file that is 1.8GB in size because the plugin-connector is attempting to read the file as a string.

csjh commented 10 months ago

Where/why is it being read?

ItsMeBrianD commented 10 months ago

By default, all files in a sources directory other than connection.yaml or connection.options.yaml are sent to the connector as potential source files; this includes reading the file contents into a string, along with the filepath.

Because the duckdb file is not being skipped, it is treated the same way.

csjh commented 10 months ago

Are file contents read anywhere except in db adapters? We could just pass adapters an object along the lines of:

{
    filepath: filepath,
    get contents() {
        return fs.readFileSync(filepath)
    }
}

It could be fs.readFile, but it's weird to require an await on obj.contents.

ItsMeBrianD commented 10 months ago

They shouldn't be; that may be a better approach. I'm not sure doing that read synchronously is a good move, though, if we want to parallelize data source execution.

ItsMeBrianD commented 10 months ago

Agree that requiring the await isn't ideal.

archiewood commented 10 months ago

Could you ignore certain binary extensions (e.g. .duckdb, .db, .sqlite)?

Or is this just not flexible enough in case someone drops in a big .exe or something else random?

ItsMeBrianD commented 10 months ago

I've implemented Cormick's solution temporarily in the latest prerelease; we'll see whether it has large implications for source-building performance.

ItsMeBrianD commented 10 months ago

I think a good way to approach this is to simply not load the contents of large files; if an adapter still wants to handle those files, the option will be there, the adapter will just have to load them itself.

csjh commented 10 months ago

What are we going to consider a large file? >1MB?

ItsMeBrianD commented 10 months ago

I think this is currently implemented to an extent, using 100MB as the "large file" threshold.

csjh commented 10 months ago

Fixed by #1064