CenterForOpenScience / modular-file-renderer

A Python package for rendering files to HTML via an embeddable iframe
http://modular-file-renderer.readthedocs.io/en/latest/
Apache License 2.0

[ENG-1809] Improve .csv dialect sniffing #362

Open cslzchen opened 4 years ago

cslzchen commented 4 years ago

Ticket

https://openscience.atlassian.net/browse/ENG-1809

Purpose

Improve .csv dialect sniffing.

Changes

Side effects

Memory usage for sniffing depends on the size of the first row of the file. In the worst case, the sniffer reads at most 10MB. Please note that we don't have enough statistical data on how many bytes the first row of a .csv file takes on average. However, the renderer's size check already rejects files larger than 10MB before sniffing starts.

Given that we need to read the full CSV file into memory for rendering anyway, and the partial sniff data is always deleted after use, I don't think this will be a problem.
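
For illustration only, here is a minimal sketch of first-row sniffing with a bounded sample, using the standard library's `csv.Sniffer`. It is not the code in this PR: the function name `sniff_first_row` and the constant `MAX_SNIFF_SIZE` are hypothetical, and the cap is assumed to mirror the 10MB size limit mentioned above.

```python
import csv
import io

# Hypothetical cap, assumed to match the renderer's 10MB size limit.
MAX_SNIFF_SIZE = 10 * 1024 * 1024


def sniff_first_row(fp, max_size=MAX_SNIFF_SIZE):
    """Guess the CSV dialect from the first row of a seekable text stream.

    Only the first row (capped at ``max_size`` characters) is held in
    memory for sniffing, and the stream position is restored afterwards.
    """
    start = fp.tell()
    # readline(size) stops at the first newline or after `size` characters,
    # whichever comes first. For text streams the limit counts characters,
    # not bytes.
    sample = fp.readline(max_size)
    fp.seek(start)
    try:
        # Raises csv.Error if no delimiter can be determined.
        return csv.Sniffer().sniff(sample)
    finally:
        # Drop the partial sniff data as soon as it is no longer needed.
        del sample


if __name__ == "__main__":
    data = io.StringIO("name;age;city\nAda;36;London\n")
    dialect = sniff_first_row(data)
    # The stream is back at the start, so the full file can now be parsed.
    for row in csv.reader(data, dialect):
        print(row)
```

Because the sample never exceeds one row or the cap, the extra memory used by sniffing is bounded by the same limit the renderer already enforces on the whole file.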

QA Notes

TBD

Deployment Notes

N / A