We probably need a script (or some other piece of code) that runs the document processing toolkit over the documents in an S3 bucket that haven't been processed yet. The bucket is where the documents will end up once we've collected them.
Processing in this case mostly means extracting the text.
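
A rough sketch of what such a script could look like, assuming a few things that are not decided yet: raw documents live under a `raw/` prefix, extracted text is written next to them under a `text/` prefix, and a document counts as "processed" when its text object already exists. The prefixes, the function names and the `extract_text` callable (a stand-in for whatever the document processing toolkit exposes) are all hypothetical. The S3 client is passed in, so the only S3 calls used are `list_objects_v2` (through a paginator), `get_object` and `put_object`.

```python
from typing import Callable, Iterator, List


def iter_keys(s3_client, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every object key under `prefix`, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty prefix carry no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def output_key(raw_key: str, raw_prefix: str = "raw/", text_prefix: str = "text/") -> str:
    """Map a raw document key to the key of its extracted text (assumed layout)."""
    return text_prefix + raw_key[len(raw_prefix):] + ".txt"


def process_new_documents(
    s3_client,
    bucket: str,
    extract_text: Callable[[bytes], str],  # hypothetical hook into the toolkit
    raw_prefix: str = "raw/",
    text_prefix: str = "text/",
) -> List[str]:
    """Extract text for every raw document that has no text object yet.

    Returns the raw keys that were processed in this run.
    """
    # Collect existing outputs once, so each raw key is a cheap set lookup.
    done = set(iter_keys(s3_client, bucket, text_prefix))

    processed = []
    for key in iter_keys(s3_client, bucket, raw_prefix):
        if key.endswith("/"):
            continue  # skip zero-byte "folder" placeholder objects
        out = output_key(key, raw_prefix, text_prefix)
        if out in done:
            continue  # already processed in an earlier run
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        text = extract_text(body)
        s3_client.put_object(Bucket=bucket, Key=out, Body=text.encode("utf-8"))
        processed.append(key)
    return processed
```

With boto3 this would be called as something like `process_new_documents(boto3.client("s3"), "our-documents-bucket", extract_text=toolkit_extract)`, where the bucket name and `toolkit_extract` are placeholders. Using the presence of the output object as the "processed" marker keeps the script stateless and safe to re-run; a separate manifest or database table would be an alternative if we later need to record failures or toolkit versions.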