Closed lauratolosi closed 5 years ago
Hi @lauratolosi, can you describe your use scenario a little more?
It sounds like you ran the Fonduer pipeline with document set `A`. Then you wanted to test something with document set `B` (which are new documents, not overlapping with `A`)?
I'm not sure I understand the scenario in which you want to evaluate new documents, but will not need to go through the process of parsing and everything again.
We do not currently have a way to delete a document directly. But I'd assume that what you would want to do is clean up your datasets (e.g., ensure no duplicates or overlaps in those sets) rather than trying to delete documents from the database just to reparse/add them. You might try checking out `fdupes` to help.
Hi @lukehsiao, here are the details:
I need to use Fonduer in a real-world application, where a Fonduer trained model is saved on some server and users can upload new documents to get information from them, based on model predictions.
When a new document is uploaded, my understanding is that it needs to be parsed, in order for Mentions and Candidates to be extracted and evaluated by the model.
But a user can run the test many times with the same documents, which results in errors with Fonduer. While developing the system, I myself was testing many times on the same new documents. I needed a way to remove those documents, handle the "Document already exists" error, and start fresh.
I found a way around this by creating a copy of the original database every time a test on new documents starts. Maybe it is the easiest solution. Or maybe you have a better suggestion?
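For anyone wanting to try the same workaround, here is a minimal sketch of the copy-per-run idea, assuming the Fonduer database lives in PostgreSQL and that a template copy (`CREATE DATABASE ... TEMPLATE ...`) is acceptable. The helper name `scratch_db_statements` and the `_run_` naming scheme are hypothetical, not part of Fonduer; the function only builds the SQL strings, so running them is left to whatever client you already use.

```python
import re
import uuid

# Conservative identifier check so names can be quoted safely in SQL.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def scratch_db_statements(template_db, run_id=None):
    """Build SQL for a throwaway copy of `template_db` (hypothetical helper).

    Returns (scratch_name, create_sql, drop_sql). Nothing is executed here.
    """
    if not _IDENT.match(template_db):
        raise ValueError(f"unsafe database name: {template_db!r}")
    # A short random suffix keeps concurrent test runs from colliding.
    run_id = run_id or uuid.uuid4().hex[:8]
    if not re.fullmatch(r"[a-z0-9_]+", run_id):
        raise ValueError(f"unsafe run id: {run_id!r}")

    scratch = f"{template_db}_run_{run_id}"
    create_sql = f'CREATE DATABASE "{scratch}" TEMPLATE "{template_db}"'
    drop_sql = f'DROP DATABASE IF EXISTS "{scratch}"'
    return scratch, create_sql, drop_sql


if __name__ == "__main__":
    name, create_sql, drop_sql = scratch_db_statements("fonduer_prod")
    print(name)
    print(create_sql)
    print(drop_sql)
```

Note that PostgreSQL refuses to run `CREATE DATABASE` inside a transaction block, so the statements need an autocommit connection, and the copy fails while other sessions are connected to the template database.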
Thanks for looking into this!
Hi @lauratolosi,
There is an easy solution. We use document name as the primary key in Fonduer. There are two options you can do:
Hope this can help you!
Sen
Closing for now; please reopen if it's still a problem.
I want to test the Fonduer model on new files (pdfs), in a separate pipeline. I need to ensure that the file is not already in the database (which results in an error and failure anyway). The solution that I see is to delete the test documents and all related elements from the database before the testing - if they exist.
Can anyone please help with a procedure that deletes documents by name?
I have tried to delete from table `documents`:

The following is the detailed information on table `document`. It does not allow `CASCADE DELETE` on foreign keys on tables `table`, `caption`, `paragraph`, `sentence`, etc.

I have tried to delete from table `context`, but I am getting the same conflict.
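To illustrate the conflict described above, here is a small self-contained sketch using SQLite as a stand-in. The two-table schema (`document` and `sentence`) is a simplified assumption for illustration only and is not Fonduer's actual schema; `delete_document_by_name` is a hypothetical helper. It shows that deleting a parent row fails while child rows still reference it, and that removing the children first, inside one transaction, succeeds.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a toy parent/child schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(
        """
        CREATE TABLE document (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        -- No ON DELETE CASCADE here, mirroring the situation in the post.
        CREATE TABLE sentence (
            id          INTEGER PRIMARY KEY,
            document_id INTEGER NOT NULL REFERENCES document(id),
            text        TEXT
        );
        INSERT INTO document (id, name) VALUES (1, 'doc_a'), (2, 'doc_b');
        INSERT INTO sentence (document_id, text)
        VALUES (1, 'first'), (1, 'second'), (2, 'third');
        """
    )
    return conn


def delete_document_by_name(conn, name):
    """Delete child rows first, then the document. Returns False if not found."""
    with conn:  # commits on success, rolls back if any statement raises
        row = conn.execute(
            "SELECT id FROM document WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            return False
        conn.execute("DELETE FROM sentence WHERE document_id = ?", (row[0],))
        conn.execute("DELETE FROM document WHERE id = ?", (row[0],))
    return True


if __name__ == "__main__":
    conn = make_db()
    try:
        conn.execute("DELETE FROM document WHERE name = 'doc_a'")
    except sqlite3.IntegrityError as exc:
        print("direct delete failed:", exc)
        conn.rollback()
    print("ordered delete succeeded:", delete_document_by_name(conn, "doc_a"))
```

With a real schema there would be one child-table delete per dependent table, ordered from the deepest children upward. The other route in PostgreSQL is to redefine the foreign keys with `ON DELETE CASCADE`, which means altering the constraints on the existing tables.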