JuliaText / TextAnalysis.jl

Julia package for text analysis

allow DocumentMetadata to hold arbitrary data #158

Closed tanmaykm closed 9 months ago

tanmaykm commented 5 years ago

This introduces a new custom field in DocumentMetadata that is set to nothing by default, but can be used by user code to store arbitrary metadata against the document for later use. Having a predetermined place to store such data would simplify processing in many cases.
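
A minimal sketch of the idea, not the code in this PR: SketchMetadata is a hypothetical stand-in type, and every field name other than custom, along with the sample values, is an assumption made for illustration.

```julia
# Hypothetical stand-in for DocumentMetadata. Field names other than `custom`
# are assumptions for illustration, not the package's actual definition.
mutable struct SketchMetadata
    language::String
    title::String
    author::String
    timestamp::String
    custom::Any      # free-form slot; stays `nothing` unless user code sets it
end

# Convenience constructor: `custom` defaults to `nothing`.
SketchMetadata(language, title, author, timestamp) =
    SketchMetadata(language, title, author, timestamp, nothing)

meta = SketchMetadata("English", "Untitled Document", "Unknown Author", "Unknown Time")
@assert meta.custom === nothing

# User code attaches whatever it needs for later processing steps.
meta.custom = Dict("source" => "crawler-7", "labels" => ["news", "sports"])
@assert meta.custom["labels"] == ["news", "sports"]
```

Typing the slot as Any keeps existing constructors and accessors untouched, at the cost of giving the compiler no type information about what is stored there.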

zgornel commented 5 years ago

A more flexible approach would be to allow documents to hold any metadata (arbitrarily complex) and provide a mechanism for converting custom metadata to the 'standardized' DocumentMetadata; see https://zgornel.github.io/StringAnalysis.jl/dev/doc_extensions/

Just for the record, there was a pull request extending DocumentMetadata with a few fields a while ago that went stale for months on end.
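
As a rough illustration of the conversion idea above, here is one possible shape. It is a sketch under assumptions, not the StringAnalysis.jl implementation: AbstractSketchMetadata, StandardMetadata, CrawlMetadata, and the language accessor are all hypothetical names.

```julia
# Hypothetical type hierarchy; none of these names come from either package.
abstract type AbstractSketchMetadata end

# The 'standardized' form that the rest of the pipeline understands.
struct StandardMetadata <: AbstractSketchMetadata
    language::String
    title::String
end

# An arbitrarily shaped, user-defined metadata type.
struct CrawlMetadata <: AbstractSketchMetadata
    url::String
    lang_code::String
    headers::Dict{String,String}
end

# The user supplies the mapping from the custom type to the standardized one.
function Base.convert(::Type{StandardMetadata}, m::CrawlMetadata)
    lang = m.lang_code == "en" ? "English" : "Unknown"
    title = get(m.headers, "title", "Untitled Document")
    return StandardMetadata(lang, title)
end

# Generic code (stemming, for example) asks for the language through an
# accessor instead of reading a field directly.
language(m::StandardMetadata) = m.language
language(m::AbstractSketchMetadata) = language(convert(StandardMetadata, m))

crawl = CrawlMetadata("https://example.org/a", "en", Dict("title" => "Hello"))
@assert language(crawl) == "English"
@assert convert(StandardMetadata, crawl).title == "Hello"
```

With this layout, library code only needs the accessor and the convert method, so a document can carry any metadata type for which the user has defined the conversion.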

tanmaykm commented 5 years ago

Yes, having an abstract metadata type seems like a better idea. The API changes may be more intrusive though? Stemming depends on the language stored in the metadata, so that would need to be abstracted out. And there are a bunch of APIs in metadata.jl. Is there anything else?

tanmaykm commented 5 years ago

Rebased to resolve conflicts.

This change is probably sufficient for now, and we can continue discussing a more appropriate metadata representation for the future.

rssdev10 commented 9 months ago

Hi @tanmaykm and @zgornel, I've been working on refreshing TextAnalysis over the last month. I find this PR useful, but I made some changes to keep it API-compatible. If there are no objections, I'd like to merge it.