digipres / digipres-practice-index

An experiment in gathering together sources of information about digital preservation practices
GNU Affero General Public License v3.0

DigiPres Publications Index v2.0 #5

Open anjackson opened 7 months ago

anjackson commented 7 months ago

Leading on from #2

Proposed features

Ideas

From Micky:

Are there parts where we need community editing workflows to manage some of the aggregation data, such as the iPRES conference metadata? Are there tools for supporting analysis and visualisation? See digipres/registries-of-practice-project#16

anjackson commented 1 week ago

That paper at iPRES, applying https://maartengr.github.io/BERTopic/ to a different corpus of digital preservation papers, seemed to mirror what I'd found with spaCy. You don't get much that makes sense when you've only got metadata to work with. I suspect this is generally true: domains with terms of art are difficult to integrate with generic language tools, at least without a decently large corpus. Perhaps this needs the full text to be in place?