The PDF is uploaded or downloaded as a Vizier file (created by the cell itself), and the cell enters a selection UI (e.g., by embedding the PDF).
The user navigates to a page of the PDF in the cell (optional).
The user selects an area of the PDF (optional).
The user names the table (optional).
The user repeats steps 3-6 for additional tables.
The user clicks run.
The workflow uses something like tabula to extract tables.
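As a rough illustration of the last step, the sketch below turns one user selection into arguments for the tabula-java command line. `TableSelection`, `table_name`, and `tabula_args` are hypothetical names made up for this example, not part of Vizier or tabula. Treating a missing page as "all pages" and generating a fallback table name are assumptions about how the optional steps might behave.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (top, left, bottom, right) in PDF points: the order tabula's --area flag expects.
Area = Tuple[float, float, float, float]


@dataclass(frozen=True)
class TableSelection:
    """One table picked in the selection UI (hypothetical type; every field is optional)."""
    page: Optional[int] = None   # None: assume tabula should scan all pages
    area: Optional[Area] = None  # None: let tabula detect the table on its own
    name: Optional[str] = None   # None: fall back to a generated name


def table_name(selection: TableSelection, index: int) -> str:
    """Return the user-supplied name, or an assumed default such as 'table_1'."""
    return selection.name or f"table_{index}"


def tabula_args(pdf_path: str, selection: TableSelection) -> List[str]:
    """Build tabula-java CLI arguments (everything after `java -jar <tabula jar>`)."""
    pages = "all" if selection.page is None else str(selection.page)
    args = ["--format", "CSV", "--pages", pages]
    if selection.area is not None:
        args += ["--area", ",".join(str(v) for v in selection.area)]
    args.append(pdf_path)
    return args


if __name__ == "__main__":
    picked = TableSelection(page=2, area=(269.875, 12.75, 790.5, 561.0), name="budget")
    print(table_name(picked, 1), tabula_args("report.pdf", picked))
```

Keeping each selection as a plain record alongside the Vizier file is one way the cell could retain the link between the source PDF and each extracted table, since the same record that drives extraction also documents where the table came from.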
Describe alternatives you've considered
Several commercial tools provide this sort of extraction, and tabula can be used as a command-line tool, but in both cases the link between the data source and the data obtained from it is not provenance-tracked.
What pain point is this feature intended to address? Please describe.
Data often, irritatingly, lives in PDF files.
Describe the solution you'd like
Proposed workflow: