dzhw / metadatamanagement

Metadatamanagement (MDM) - Data Search for Higher Education Research and Science Studies
https://metadata.fdz.dzhw.eu
GNU Affero General Public License v3.0

Generation of the data set report fails #3326

Open AndyDaniel1 opened 3 months ago

AndyDaniel1 commented 3 months ago

The generation of the data set report fails.

Error message:

[Screenshot of the error message]

@SaCodematix @tilovillwock @moellerth

tilovillwock commented 3 months ago

@AndyDaniel1 @thorsteneuler As far as I can tell, the error message from the report task indicates that there's a rogue Unicode character in the dataset, as described here. This character would be rendered as  in a modern text editor (e.g. Notepad++ or Visual Studio Code).

I don't really have a good explanation of how this particular sequence would end up in a dataset, other than that the dataset may have gone through a lossy conversion, e.g. from a legacy encoding such as ISO-8859-1 (or the closely related Windows-1252) to UTF-8. Unfortunately, I'm not that familiar with SPSS.
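The exact character is not preserved in this thread, so the following is only a minimal sketch of how encoding mix-ups can produce unexpected characters. It assumes a made-up German sentence with nested typographic quotation marks and three hypothetical conversion paths; none of these is confirmed to be what happened to this dataset.

```python
# Hypothetical sample text with nested German quotation marks
text = "Sie sagte: \u201eDas Wort \u201aneu\u2018 ist wichtig.\u201c"

# Scenario A (assumed): UTF-8 bytes misread as Windows-1252 -> visible mojibake such as "â€ž"
utf8_as_cp1252 = text.encode("utf-8").decode("cp1252")

# Scenario B (assumed): Windows-1252 bytes misread as ISO-8859-1 -> invisible C1 control characters
cp1252_as_latin1 = text.encode("cp1252").decode("latin-1")

# Scenario C (assumed): forcing the text into ISO-8859-1, which has no typographic quotes -> data loss
lossy_latin1 = text.encode("latin-1", errors="replace").decode("latin-1")

for label, value in [
    ("UTF-8 read as cp1252", utf8_as_cp1252),
    ("cp1252 read as Latin-1", cp1252_as_latin1),
    ("forced into Latin-1", lossy_latin1),
]:
    # ascii() escapes every non-ASCII character, so invisible ones become visible
    print(f"{label}: {ascii(value)}")
```

Scenarios A and B can be undone if the wrong decoding is known (e.g. `cp1252_as_latin1.encode("latin-1").decode("cp1252")` restores the original text), whereas scenario C cannot, because the quotation marks have already been replaced by `?`.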

thorsteneuler commented 3 months ago

@tilovillwock Thank you for the information. I checked for the rogue Unicode character (found 19 occurrences) and removed them. I'll start the generation of the data set report again with the updated data to check whether it works now.

I do have two guesses as to how this particular character got into the dataset (neither related to SPSS). It seems to have originated from words in quotation marks within sentences that are themselves in quotation marks.
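The check-and-remove step described above could be scripted roughly as follows. This is a hedged sketch, not the MDM tooling: the function `clean_labels`, the sample labels, and the assumption that the rogue characters are C1 controls standing in for German quotation marks are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def clean_labels(
    labels: List[str], replacements: Dict[str, Optional[str]]
) -> Tuple[List[str], Counter]:
    """Replace (or, when mapped to None, delete) every character listed in `replacements`.

    Returns the cleaned labels and a per-character count of what was changed.
    """
    table = str.maketrans(replacements)
    counts: Counter = Counter()
    cleaned = []
    for label in labels:
        # Count occurrences before translating so the tally reflects the input
        counts.update(ch for ch in label if ch in replacements)
        cleaned.append(label.translate(table))
    return cleaned, counts


# Hypothetical input: C1 control characters standing in for German quotation marks
labels = ["Frage zu \u0084Studium\u0093", "ohne Problem"]
fixes = {"\u0084": "\u201e", "\u0093": "\u201c"}  # map to None instead to delete a character

cleaned, counts = clean_labels(labels, fixes)
print(cleaned)
print(sum(counts.values()), "characters changed")
```

Mapping a rogue character back to the quotation mark it presumably replaced keeps the label readable; mapping it to `None` simply deletes it, which is the safer choice when the intended character is unknown.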

thorsteneuler commented 3 months ago

@tilovillwock Generation failed again. Are there other rogue characters which I can check for?
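There is no definitive list of "rogue" characters, but one hedged starting point is to scan every text field for character classes that rarely belong in survey metadata: control characters other than tab and line breaks, invisible format characters such as zero-width spaces, private-use, surrogate and unassigned code points, and U+FFFD, which decoders insert for bytes they could not decode. The function name `find_suspicious`, the category selection, and the sample field names are assumptions for illustration only.

```python
import unicodedata
from typing import Dict, Iterator, NamedTuple

# Unicode general categories treated as suspicious (an assumed, adjustable choice)
SUSPICIOUS_CATEGORIES = {"Cc", "Cf", "Co", "Cs", "Cn"}
# Ordinary whitespace controls that are allowed in text fields
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


class Finding(NamedTuple):
    field: str
    offset: int
    codepoint: str
    name: str


def find_suspicious(fields: Dict[str, str]) -> Iterator[Finding]:
    """Yield every suspicious character found in the given text fields."""
    for field, value in fields.items():
        for offset, ch in enumerate(value):
            if ch in ALLOWED_CONTROLS:
                continue
            if unicodedata.category(ch) in SUSPICIOUS_CATEGORIES or ch == "\ufffd":
                # Control characters have no Unicode name, hence the default
                name = unicodedata.name(ch, "<unnamed>")
                yield Finding(field, offset, f"U+{ord(ch):04X}", name)


# Hypothetical field values
sample = {
    "var1.label": "Studium\u200b",
    "var2.label": "Frage zu \u0084Studium\u0093",
    "var3.label": "ok\tfine",
}
findings = list(find_suspicious(sample))
for finding in findings:
    print(finding)
```

Each finding points to the field and character offset, so the affected text can be inspected manually before anything is deleted.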

tilovillwock commented 3 months ago

@thorsteneuler this time the report generation seems to be failing because we've hit some kind of memory limit. I'm still investigating but this might take some time. I'll report back as soon as possible. Sorry for the inconvenience.

tilovillwock commented 2 months ago

I finally found a configuration that seems to be able to process the workload. We've deployed a fix to production.

@thorsteneuler your report should be listed now since it went through successfully when I tried it.

@anneweber please try creating a report for dat-nac2018-ds1 again.

@AndyDaniel1 we should discuss the underlying issues in detail during our next Jour Fixe.

AndyDaniel1 commented 2 months ago

@tilovillwock thank you!!

AndyDaniel1 commented 2 months ago

The generation failed again for nac2018.

[Screenshot of the error message]

tilovillwock commented 2 months ago

@anneweber I made another adjustment. Seems like it went through. Your PDF report is listed now.

anneweber commented 2 months ago

@tilovillwock Yes, it seems to work now. :) Thank you! 👍