Open jqnatividad opened 1 year ago
This will be done once the qsv describegpt command is finished.
Though qsv is primarily focused on tabular data, a later version of describegpt will have a mode to summarize PDFs and get the description and tags for CKAN, which we can use in DP+.
https://github.com/jqnatividad/qsv/pull/1036
cc @rzmk @samibaig
Thinking about it more, PDF summarization is outside the scope of qsv, so we should not add that functionality there.
It is still in scope for DP+, though.
The legacy Datapusher used to support PDFs, as messytables supported extracting tables from PDFs using pdftables.
That functionality has since been removed, along with Excel support.
We re-enabled Excel support in DP+ using qsv.
We should re-enable PDF support as well, not to extract tables for now (though there is tabula-rs), but to summarize the content for the Description field and suggest tags.
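
To make the idea concrete, here is a rough sketch of how the summarization step could look on the DP+ side. It is not an existing DP+ or qsv API: `build_prompt`, `parse_reply`, and `MAX_CHARS` are hypothetical names, the PDF text is assumed to have been extracted already, and the LLM call itself is left out. The only CKAN-specific assumption is that tags are passed as a list of `{"name": ...}` dicts.

```python
import json

MAX_CHARS = 8000  # hypothetical cap on how much extracted text goes into the prompt


def build_prompt(pdf_text: str, max_tags: int = 5) -> str:
    """Build a summarization prompt from text already extracted from a PDF."""
    excerpt = pdf_text[:MAX_CHARS]
    return (
        "Summarize the document below in two or three sentences and suggest "
        f"up to {max_tags} short tags. Reply with JSON only, using the keys "
        '"description" (string) and "tags" (list of strings).\n\n'
        f"{excerpt}"
    )


def parse_reply(reply: str, max_tags: int = 5) -> dict:
    """Turn the model's JSON reply into a description plus CKAN-style tag dicts."""
    data = json.loads(reply)
    description = str(data.get("description", "")).strip()
    seen = set()
    tags = []
    for raw in data.get("tags", []):
        # Normalizing to lowercase is a choice made for this sketch, not a CKAN rule.
        name = str(raw).strip().lower()
        if name and name not in seen:
            seen.add(name)
            tags.append({"name": name})
    return {"description": description, "tags": tags[:max_tags]}
```

The parsed result could then be used to fill the Description field and tag suggestions, ideally as suggestions a publisher can review rather than values written automatically.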