Open rkm opened 2 years ago
I think the metadata database would be most powerful. That way it could support identifiable UID or anonymous UID, and it wouldn't have to rely on an image having been extracted to be able to look it up.
That would also cover other use cases, like 'is this image in the SR NLP db / MongoDB also in relational, or not?'
Nothing stopping it drawing info from both though.
At the moment I've just got a big text file of filenames which I grep ;-)
Another method might be to see if MongoDB can give you a list of keys in the index (by quickly reading the index rather than slowly reading the database), which you could then grep. If it only stores hashes then this won't work.
Another method might be to see if MongoDB can create a computed index: you could create a new index called FileName, computed from Basename(dicomFilePath). Postgres supports computed (expression) indexes; maybe MongoDB does too. Then you could replace the -an.dcm in the anonymised filename and look up the result in the computed index.
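To make the basename idea concrete, here is a minimal sketch in plain Python that works over a list of original paths (such as the text file of filenames mentioned earlier) rather than a real database index. The function names are hypothetical, and it assumes the original file is named exactly like the anonymised one with `-an.dcm` replaced by `.dcm`, which has not been confirmed in this thread.

```python
import posixpath

ANON_SUFFIX = "-an.dcm"  # suffix of anonymised files, as described in this thread
ORIG_SUFFIX = ".dcm"     # ASSUMPTION: original files end in plain .dcm


def build_basename_index(original_paths):
    """Map each basename to every original path that has it."""
    index = {}
    for path in original_paths:
        index.setdefault(posixpath.basename(path), []).append(path)
    return index


def original_basename(anon_path):
    """Turn '<name>-an.dcm' into '<name>.dcm'."""
    name = posixpath.basename(anon_path)
    if not name.endswith(ANON_SUFFIX):
        raise ValueError(f"not an anonymised filename: {name}")
    return name[: -len(ANON_SUFFIX)] + ORIG_SUFFIX


def lookup_original(anon_path, index):
    """Return candidate original paths (empty list if none found)."""
    return index.get(original_basename(anon_path), [])
```

The index keeps a list per basename because nothing here guarantees that basenames are unique across the archive; a caller would need to handle more than one candidate.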
Unless I've completely misunderstood what you mean by "metadata database"? Were you referring to one of the MySQL or SQL Server databases?
Unless I'm mistaken, the anonymised path ends with the SOPInstanceUID plus -an.dcm, so adding a MongoDB index on SOPInstanceUID would help immensely. Could we also add study and series IDs?
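If that holds, the lookup key can be derived from the anonymised path alone. A rough sketch, assuming the filename really is the SOPInstanceUID followed by `-an.dcm`, and assuming (unverified) that the MongoDB documents expose the UID under a top-level `SOPInstanceUID` field; both function names are made up for illustration:

```python
import posixpath

ANON_SUFFIX = "-an.dcm"  # suffix of anonymised files, as described in this thread


def sop_instance_uid_from_anon_path(anon_path):
    """Extract the UID, assuming the filename is '<SOPInstanceUID>-an.dcm'."""
    name = posixpath.basename(anon_path)
    if not name.endswith(ANON_SUFFIX):
        raise ValueError(f"not an anonymised filename: {name}")
    return name[: -len(ANON_SUFFIX)]


def build_query(anon_path):
    """Build a MongoDB-style filter document for the original record.

    ASSUMPTION: the field is called 'SOPInstanceUID' in the collection.
    """
    return {"SOPInstanceUID": sop_instance_uid_from_anon_path(anon_path)}
```

With pymongo, a filter like this could be passed to `collection.find_one(...)`; without an index on that field the query would presumably still need a full collection scan, which is the point of the suggestion above.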
When investigating issues with an anonymised file in an extraction, it is often useful to review the original file for comparison. This is currently difficult to do as there is no direct link from the anonymised file back to the source file.
A tool, or a new application in the `smi` binary, could achieve this by looking up the original path: