NovoNordisk-OpenSource / decentralized-tech-radar

Decentralized Tech Radar - ITU ISE 2024 Collaboration
GNU Affero General Public License v3.0

feat: Merge CSV files and ensure that duplicates are removed #42

Open August-Brandt opened 4 months ago

Slug-Boi commented 4 months ago

As a simple start: August and I (Theis) had a pretty interesting talk. We could keep a set of seen blip names and use a map lookup to normalize terms (e.g. go and golang are the same thing, so both get simplified to go via the map). You then do a contains check on the set with the normalized value, and if the name is already there, you do some conflict correction (to start with, this could just be ignoring all but the first instance). Later we could maybe do some LLM stuff.
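A rough sketch of the set + map idea, just to make it concrete. This is not the project's implementation: the `name` column, the `ALIASES` table and the function names are all assumptions made up for illustration, and it is written in Python only to keep it short.

```python
import csv
import io

# Hypothetical alias table: maps variant spellings to one canonical term.
ALIASES = {"golang": "go"}


def canonical_name(name):
    """Normalize a blip name, then generalize it through the alias map."""
    key = name.strip().lower()
    return ALIASES.get(key, key)


def merge_blips(csv_texts):
    """Merge several CSV documents, keeping the first row per canonical name.

    Assumes every CSV has a header row with a 'name' column (an assumption,
    not the radar's confirmed schema).
    """
    seen = set()
    merged = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            key = canonical_name(row["name"])
            if key in seen:
                # Conflict correction: ignore all but the first instance.
                continue
            seen.add(key)
            merged.append(row)
    return merged
```

With one file containing `Go` and another containing `golang`, only the first `Go` row survives. A smarter conflict strategy would replace the `continue`.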

Please see issue #67 for this^

Another interesting solution for further precision could be to use the Knuth-Morris-Pratt (KMP) string-matching algorithm
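For reference, a minimal KMP sketch. How it would plug into the merge is not decided in this thread. One possible use (an assumption) is checking whether one blip name occurs inside another, e.g. `go` inside `golang`. The function names are hypothetical.

```python
def build_failure_table(pattern):
    """For each prefix of pattern, length of its longest proper prefix
    that is also a suffix."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    table = build_failure_table(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]  # fall back without re-reading text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1
```

Note that KMP only finds exact substrings, so it would also match `go` inside `django`. Any match would still need a sanity check before two blips are treated as duplicates.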