Rothamsted / knetminer

KnetMiner - webapp to search and visualize genome-scale knowledge graphs
https://knetminer.com
MIT License

ETL Pipeline/Java Parser Issue with Special Characters #760

Closed Arnedeklerk closed 1 year ago

Arnedeklerk commented 1 year ago

I've noticed that our ETL pipeline or the Java parser is having a bit of a hard time with special characters. Take a look at this example:

"value": "[Luk�s Sp�chal, Natalia Yu Rakova, Michael Riefler, Takeshi Mizuno, Georgy A Romanov, Miroslav Strnad, Thomas Schm�lling]"

Special characters in names such as "Lukáš Spíchal" and "Thomas Schmülling" are not coming through correctly and are displayed as "�" instead.
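For illustration, here is a minimal Python sketch of one common way this symptom arises: bytes written in a single-byte encoding such as Latin-1 are later decoded as UTF-8, and each invalid byte is replaced with U+FFFD. Whether this is what actually happens in the pipeline is an assumption, not something confirmed here; the sample name and the Latin-1 source encoding are chosen only for the example.

```python
# Hypothetical illustration: the Latin-1 source encoding is an assumption,
# not a confirmed detail of the ETL pipeline or the Java parser.
name = "Thomas Schmülling"

# Encoded as Latin-1, "ü" becomes the single byte 0xFC,
# which is not a valid byte sequence in UTF-8.
raw = name.encode("latin-1")

# Decoding as UTF-8 with errors="replace" substitutes U+FFFD for the bad byte.
garbled = raw.decode("utf-8", errors="replace")
print(garbled)

# Decoding with the encoding the bytes were actually written in restores the name.
restored = raw.decode("latin-1")
print(restored)
```

Once the replacement character has been written out, the original byte is gone, so a fix of this kind would have to happen where the data is first read.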

The issue can be seen throughout the table view in the new KnetMaps-Plus.

I've run into a similar issue in the past when fetching data from PubMed, and the solution that worked for me was encoding the data using utf-8-sig. This helped preserve special characters correctly.
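For reference, a small sketch of what `utf-8-sig` does in Python: it prepends a byte order mark (BOM) when encoding and strips one, if present, when decoding. The sample string is made up for the example, and the codec only helps when the underlying bytes really are UTF-8.

```python
import codecs

text = "Thomas Schmülling"  # sample value, chosen only for this example

# Encoding with utf-8-sig prepends the 3-byte UTF-8 BOM (EF BB BF).
with_bom = text.encode("utf-8-sig")
has_bom = with_bom.startswith(codecs.BOM_UTF8)

# Decoding with utf-8-sig strips the BOM if it is present...
decoded_sig = with_bom.decode("utf-8-sig")

# ...whereas plain utf-8 keeps it as an invisible U+FEFF at the start of the string.
decoded_plain = with_bom.decode("utf-8")

# utf-8-sig also accepts input that has no BOM at all.
decoded_no_bom = text.encode("utf-8").decode("utf-8-sig")

print(has_bom, decoded_sig == text, decoded_plain == text, decoded_no_bom == text)
```

Reading with `utf-8-sig` is therefore a safe default for UTF-8 input that may or may not start with a BOM.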

Arnedeklerk commented 1 year ago

I'm not 100% sure whether this is a KnetMiner or a Pipelines problem. It may also have been resolved already, since the Small dataset where I found the problem is quite old. Just close this if that's the case.

marco-brandizi commented 1 year ago

It should be about this paper. It's OK in KnetMiner, see below; the images are taken from this search.

image

image

Arnedeklerk commented 1 year ago

Thanks Marco. Keywan did mention that the small dataset in the knetmaps-plus instance is very old. It looks like this was probably resolved a long time ago. Closing.