tanmaykm closed this 9 months ago.
A more flexible approach would be to allow documents to hold arbitrary (possibly complex) metadata and to provide a mechanism for converting that custom metadata into the "standardized" `DocumentMetadata`.
https://zgornel.github.io/StringAnalysis.jl/dev/doc_extensions/
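A minimal Julia sketch of that idea, under stated assumptions: every name here (`AbstractMeta`, `StandardMeta`, `ArticleMeta`, `standardize`, `doc_language`) is hypothetical and not part of TextAnalysis.jl or StringAnalysis.jl. Documents could hold any subtype of an abstract metadata type, and a `standardize` method would map each subtype onto a fixed record that library code relies on.

```julia
# Hypothetical sketch: documents may carry any metadata subtype, and a
# conversion function maps it onto a fixed "standard" record.
abstract type AbstractMeta end

# Fixed, standardized record (a stand-in, NOT TextAnalysis.DocumentMetadata).
struct StandardMeta <: AbstractMeta
    language::Symbol
    title::String
    author::String
end

# Arbitrarily rich, user-defined metadata.
struct ArticleMeta <: AbstractMeta
    lang::Symbol
    headline::String
    authors::Vector{String}
    tags::Vector{String}
end

# Conversion hook: the standard type converts to itself; each custom
# type supplies its own method.
standardize(m::StandardMeta) = m
standardize(m::ArticleMeta) = StandardMeta(m.lang, m.headline, join(m.authors, ", "))

# Library code (e.g. a stemmer choosing a language) only needs the
# standardized view, so it works with any type that has a method.
doc_language(m::AbstractMeta) = standardize(m).language

meta = ArticleMeta(:german, "Ein Titel", ["A. Autor", "B. Autorin"], ["news"])
println(doc_language(meta))          # german
println(standardize(meta).author)    # A. Autor, B. Autorin
```

Code that needs a specific field, such as the language used for stemming, would then go through the conversion instead of assuming a concrete metadata type.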
For the record, a pull request extending `DocumentMetadata` with a few fields was opened a while ago, but it went stale for months.
Yes, an abstract metadata type seems like a better idea. The API changes may be more intrusive, though? Stemming depends on the language stored in the metadata, so that would need to be abstracted out, and there are a number of APIs in `metadata.jl`. Is there anything else?
Rebased to resolve conflicts.
Perhaps this change is sufficient for now, while we continue discussing a more appropriate metadata representation for the future?
Hi @tanmaykm and @zgornel, I've been trying to refresh TextAnalysis over the past month. I find this PR useful, but I made some changes to keep the API compatible. If there are no objections, I'd like to merge it.
This introduces a new `custom` field in `DocumentMetadata` that is set to `nothing` by default but can be used by user code to store arbitrary metadata against the document for later use. Having a predetermined place to store such data would simplify processing in many cases.