Garbled character of transformed DCM file

YangShaoyue commented 2 weeks ago

I'm just using the following code to read and write the original .dcm file, but PatientName and PerformingPhysicianName turn into garbled characters in the DICOM app that generated this .dcm file.

I noticed that dataset.PatientName and dataset.PerformingPhysicianName are each an array containing an object, like [ { Alphabetic: 'xxxxxxx' } ]. Maybe this is why the DICOM app displays garbled characters?

But there are no garbled characters in some "modern" DICOM apps, like "MicroDicom DICOM Viewer".

Code snippets here:

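const fs = require("fs");
const dcmjs = require("dcmjs");

// A Node Buffer may sit inside a shared pool, so slice out exactly this file's bytes.
const fileBuffer = fs.readFileSync(filePath);
const arrayBuffer = fileBuffer.buffer.slice(fileBuffer.byteOffset, fileBuffer.byteOffset + fileBuffer.byteLength);
const DicomDict = dcmjs.data.DicomMessage.readFile(arrayBuffer);
const dataset = dcmjs.data.DicomMetaDictionary.naturalizeDataset(DicomDict.dict);

// Round trip: convert back to the tag-keyed form and write the file out unchanged.
DicomDict.dict = dcmjs.data.DicomMetaDictionary.denaturalizeDataset(dataset);
const new_file_WriterBuffer = DicomDict.write();
fs.writeFileSync(`${dir}/${fileName}.dcm`, Buffer.from(new_file_WriterBuffer));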

Pic here:

YangShaoyue commented 2 weeks ago

I'm digging into this issue. Maybe it's something about the PN VR type? dcmjs seems to transform PN fields (like PatientName and PerformingPhysicianName) into a Proxy, so the old DICOM app cannot recognize that info (maybe it only understands simple strings). Any suggestions for solving this?

pieper commented 2 weeks ago

It's like this to conform to the dicom json model (see this part). And the code here.

If you find an issue that you can fix by taking this into account please open a specific issue.
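
For readers comparing the two representations: in the DICOM JSON model a PN value is an object with Alphabetic, Ideographic and Phonetic component groups, while in a binary file those groups are joined with "=". The following is only a minimal sketch of that mapping; pnToString is a hypothetical helper written for illustration and is not part of dcmjs.

```javascript
// Hypothetical helper (not a dcmjs API): flatten a DICOM JSON style PN value
// into the "Alphabetic=Ideographic=Phonetic" string form used in binary DICOM.
function pnToString(value) {
    if (typeof value === "string") return value;
    const items = Array.isArray(value) ? value : [value];
    return items
        .map(item => {
            if (typeof item === "string") return item;
            const pn = item || {};
            const groups = [pn.Alphabetic || "", pn.Ideographic || "", pn.Phonetic || ""];
            // Drop trailing empty component groups so "Doe^John" stays "Doe^John".
            while (groups.length > 1 && groups[groups.length - 1] === "") groups.pop();
            return groups.join("=");
        })
        .join("\\"); // multiple values are separated by a backslash
}

console.log(pnToString([{ Alphabetic: "Zhang^San", Ideographic: "张^三" }]));
// prints "Zhang^San=张^三"
```

Either form carries the same text, so the structure alone should not change which characters end up in the file.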

YangShaoyue commented 2 weeks ago

You mean that the DICOM app doesn't conform to the standard DICOM JSON model? Or should I modify the code here to not use Proxy?

pieper commented 2 weeks ago

Are you saying that reading a file and then writing it back out changes the format of the PN VRs? If so that sounds like an issue that should be looked into. Maybe you can create a small example that illustrates what the issue is (ideally self-contained with a small example dataset - i.e. create a dataset in json, write it to a file in binary, then read it back again so anyone can see what's happening).

YangShaoyue commented 2 weeks ago

I've tried for hours, and now I'm sure that this is a decode/encode issue. If PatientName is assigned English letters or numbers, the app displays nothing wrong, but if it is assigned simplified Chinese characters, the app displays garbled text.

I've tried many methods but still cannot solve this problem. Can you help, please? PR_1.2.276.0.7230010.3.1.4.3793589372.3376.1680229829.1036.zip
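
One way to check the decode/encode theory outside of any DICOM tooling is to decode the same bytes under two different character sets. The sketch below assumes a Node.js build with full ICU (so that TextDecoder accepts "gb18030") and uses the sample text "中文" rather than data from the attached files.

```javascript
// Assumption: Node.js with full ICU, so TextDecoder recognizes "gb18030".
const sample = "中文";

// UTF-8 bytes of the sample (TextEncoder always produces UTF-8).
const utf8Bytes = new TextEncoder().encode(sample);

// The same text in GB18030/GBK takes two bytes per character.
const gbBytes = Uint8Array.from([0xd6, 0xd0, 0xce, 0xc4]);

const decodeAs = (label, bytes) => new TextDecoder(label).decode(bytes);

const correct = decodeAs("gb18030", gbBytes);   // bytes read with the charset they were written in
const garbled = decodeAs("gb18030", utf8Bytes); // UTF-8 bytes misread as GB18030

console.log(correct === sample); // true
console.log(garbled === sample); // false: this is the "messy code" effect
console.log(Buffer.from(utf8Bytes).toString("hex")); // e4b8ade69687
```

ASCII letters and digits have the same byte values in both encodings, which would be consistent with English names displaying correctly while Chinese ones do not.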

YangShaoyue commented 2 weeks ago

RW_1.2.276.0.7230010.3.1.4.3793589372.3376.1680229819.1035.zip

This is another .dcm file, related to the PR_xxxxx.dcm uploaded in the last reply.

YangShaoyue commented 2 weeks ago

PS: that DICOM app is a simplified Chinese application installed on a Windows XP system, so I think this problem is caused by the simplified Chinese characters.

In addition, dcmjs threw an error on first use: Error: Unsupported character set: gb-18030, so I added an entry to encodingMapping:

var encodingMapping = {
    // ...
    "iso-2022-ir-166": "tis-620",
    "iso-2022-ir-58": "iso-ir-58",
    "iso-ir-192": "utf-8",
    gb18030: "gb18030",
    "gb-18030": "gb18030", // new
    "iso-2022-gbk": "gbk",
    "iso-2022-58": "gb2312",
    gbk: "gbk",
    // ...
};
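
To illustrate why a missing key produces this kind of error, here is a minimal sketch of a label lookup. The table entries are copied from the excerpt above; resolveEncoding and its normalization rule (lowercase, spaces and underscores turned into hyphens) are assumptions made for illustration and may not match how dcmjs actually derives its keys.

```javascript
// Hypothetical lookup table; entries copied from the excerpt above.
const encodingMapping = {
    "iso-2022-ir-166": "tis-620",
    "iso-2022-ir-58": "iso-ir-58",
    "iso-ir-192": "utf-8",
    gb18030: "gb18030",
    "gb-18030": "gb18030",
    "iso-2022-gbk": "gbk",
    "iso-2022-58": "gb2312",
    gbk: "gbk"
};

// Hypothetical normalizer: lowercase the Specific Character Set value and turn
// runs of spaces/underscores into hyphens, e.g. "ISO_IR 192" -> "iso-ir-192".
function resolveEncoding(specificCharacterSet) {
    const key = specificCharacterSet.trim().toLowerCase().replace(/[\s_]+/g, "-");
    if (!Object.prototype.hasOwnProperty.call(encodingMapping, key)) {
        throw new Error(`Unsupported character set: ${key}`);
    }
    return encodingMapping[key];
}

console.log(resolveEncoding("ISO_IR 192")); // utf-8
console.log(resolveEncoding("GB18030"));    // gb18030
```

Any spelling that does not normalize to a key in the table falls through to the error, so adding an alias entry is a reasonable local workaround.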

pieper commented 2 weeks ago

Thanks for sharing these files. They seem to be a non-standard variant that is like a dicom file, but not conformant to the standard in many ways.

I tried reading with PyDICOM and 3D Slicer and both generated errors. The dciodvfy command also generates a number of errors and warnings about the files.

If you have any control over the source of these files, i.e. if you know or work with the manufacturer of the scanners, you should let them know that these files are not valid dicom and will not be interoperable with other equipment.

If you have no control and are just trying to read the files, then you'll need to write some kind of converter that turns these into something standard so they will work with standard tools. I can only assume that a Windows XP application is probably decades old and may have been custom written to work with these non-standard files.

YangShaoyue commented 2 weeks ago

Yes, I think so, but I don't know or work with the manufacturer of the scanners. Still, there seems to be only one problem: the PatientName (top right corner in the picture) and InstitutionName (bottom right corner in the picture) values turn into garbled text in that app, just because they contain simplified Chinese characters. If these two fields are assigned English letters or numbers, the app doesn't show any garbled text.

Are there any tricks to display simplified Chinese characters normally? It's very likely just a decode/encode issue.

YangShaoyue commented 2 weeks ago

Encoding problems with Chinese characters often occur in Chinese software, especially in decades-old Windows XP applications.

pieper commented 2 weeks ago

I'm really not an expert in character encodings, but it may be possible to write some code that converts your files into something closer to a compatible format and then pass the result to dcmjs. I know that DICOM supports Chinese and many other languages well if the data files comply with the standard.

https://dicom.nema.org/medical/dicom/current/output/chtml/part05/chapter_k.html
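
As a rough idea of what one step of such a converter could look like, the sketch below decodes raw element bytes with the charset they were really written in and re-encodes them as UTF-8, whose DICOM Specific Character Set defined term is "ISO_IR 192". transcodeToUtf8 is a hypothetical function, the source charset is assumed to be known in advance, and Node.js with full ICU is assumed; it does not parse or write DICOM files by itself.

```javascript
// Hypothetical converter step (not a dcmjs API); assumes Node.js with full ICU.
function transcodeToUtf8(rawBytes, sourceLabel) {
    // fatal: true makes invalid input throw instead of silently producing U+FFFD.
    const text = new TextDecoder(sourceLabel, { fatal: true }).decode(rawBytes);
    return {
        text,
        bytes: new TextEncoder().encode(text), // UTF-8 bytes to store in the element
        specificCharacterSet: "ISO_IR 192"     // value for (0008,0005) after conversion
    };
}

const gbBytes = Uint8Array.from([0xd6, 0xd0, 0xce, 0xc4]); // "中文" in GB18030
const converted = transcodeToUtf8(gbBytes, "gb18030");
console.log(converted.text, converted.specificCharacterSet); // 中文 ISO_IR 192
```

A legacy viewer that only understands GBK would need the opposite direction, which Node's built-in TextEncoder does not provide, so a separate encoding library would be required for that case.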

dcmjs-org / dcmjs

Garbled character of transformed DCM file #413