Open anjackson opened 8 months ago
That paper at iPRES, applying https://maartengr.github.io/BERTopic/ to a different corpus of digital preservation papers, seemed to mirror what I'd found with spaCy. You don't get much that makes sense when you've only got metadata to work with. I suspect this is generally true: domains with terms of art are difficult to integrate with generic language tools, at least without a decently large corpus. Perhaps this needs the full text to be in place?
Leading on from #2
Proposed features
Ideas
6
7
3
9
From Micky:
Are there some parts where we do need community editing workflows to manage some of the aggregated data, like the iPRES conference metadata? Are there tools for supporting analysis and visualisation? See digipres/registries-of-practice-project#16