Closed by simonw 4 months ago
Dragging files on is a great way to blow through the token allowance. Do I care? As long as the user gets a useful error message I think that's OK for the moment.
In the future it might be nice to split their input and submit in multiple batches for them, but that sounds difficult to get right.
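To give a feel for what batching might involve, here is a deliberately naive sketch. It assumes roughly 4 characters per token, which is only a rule of thumb and not what a real tokenizer would report, and `splitIntoBatches` is a hypothetical helper name, not something that exists in the project. It breaks on blank lines where it can and hard-splits any paragraph that is too long on its own.

```javascript
// Hypothetical helper: split text into batches that each stay under a
// character budget, preferring to break on blank lines.
// ASSUMPTION: ~4 characters per token is a rough estimate, not a real tokenizer.
function splitIntoBatches(text, maxTokens, charsPerToken = 4) {
  const maxChars = Math.floor(maxTokens * charsPerToken);
  if (maxChars < 1) {
    throw new RangeError("Budget must allow at least one character");
  }
  const batches = [];
  let current = "";
  for (const paragraph of text.split(/\n{2,}/)) {
    // Hard-split any single paragraph that exceeds the budget by itself.
    for (let i = 0; i < paragraph.length; i += maxChars) {
      const piece = paragraph.slice(i, i + maxChars);
      const candidate = current ? current + "\n\n" + piece : piece;
      if (current && candidate.length > maxChars) {
        // Adding this piece would overflow, so start a new batch.
        batches.push(current);
        current = piece;
      } else {
        current = candidate;
      }
    }
  }
  if (current) batches.push(current);
  return batches;
}
```

Even this version shows why it is hard to get right: a hard split can land mid-sentence, and the model sees each batch without the context of the others.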
Not just PDFs: dragging and dropping in plain text files should work too.
Turns out it's tricky to detect if a file is binary or text, but this hack works I think:
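function isValidUtf8(str) {
  // Round-trip the string through UTF-8 and compare with the original.
  // A mismatch means some characters did not survive encoding.
  const encoder = new TextEncoder();
  const decoder = new TextDecoder();
  const encoded = encoder.encode(str);
  const decoded = decoder.decode(encoded);
  return decoded === str;
}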
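One caveat with the round-trip check: if the string came from reading the file as text, invalid bytes have usually already been replaced with U+FFFD, so the comparison may pass for binary input too. An alternative sketch is to check the raw bytes with a `TextDecoder` created with `fatal: true`, which throws on invalid UTF-8. The name `looksLikeUtf8Text` is hypothetical, and the code assumes the file's bytes are available as a `Uint8Array` or `ArrayBuffer` (for example from `await file.arrayBuffer()`).

```javascript
// Hypothetical helper: returns true if the bytes decode cleanly as UTF-8.
// ASSUMPTION: `bytes` is a Uint8Array or ArrayBuffer with the file's raw contents.
function looksLikeUtf8Text(bytes) {
  // With fatal: true, decode() throws a TypeError on any invalid sequence
  // instead of silently substituting U+FFFD.
  const decoder = new TextDecoder("utf-8", { fatal: true });
  try {
    decoder.decode(bytes);
    return true;
  } catch (error) {
    return false;
  }
}
```

This is still a heuristic: a binary file whose bytes happen to be valid UTF-8 will pass, and text in other encodings will be rejected.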
I can use PDF.js to support dropping PDFs and extracting their text, which turns out to work pretty well. Demo here: https://observablehq.com/@simonw/extract-text-content-from-a-pdf
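For reference, a minimal sketch of that extraction might look like the following. It is not the code from the linked notebook. It assumes `pdfjsLib` is the already-loaded PDF.js module (in a browser its worker source also needs to be configured) and that `data` holds the dropped file's bytes. `joinTextItems` and `extractPdfText` are hypothetical names.

```javascript
// Join the text items from one page into a single string.
// Items that carry text expose it as `str`; marked-content items do not.
function joinTextItems(items) {
  return items
    .filter((item) => typeof item.str === "string")
    .map((item) => item.str)
    .join(" ");
}

// ASSUMPTION: `pdfjsLib` is the loaded PDF.js module and `data` is an
// ArrayBuffer or Uint8Array containing the PDF file.
async function extractPdfText(pdfjsLib, data) {
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  const pages = [];
  // PDF.js page numbers start at 1.
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const content = await page.getTextContent();
    pages.push(joinTextItems(content.items));
  }
  return pages.join("\n\n");
}
```

Joining items with a single space is a simplification: PDF.js returns positioned text fragments, so line breaks and column order in the original layout are not preserved.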