HumanCellAtlas / ontology


[ENQ] Term for JSON file including image acquisition metadata #102

Closed ipediez closed 2 years ago

ipediez commented 2 years ago

Hi! We are now starting to include spatial transcriptomics data in HCA, and we've come across a new file type for which we can't find an adequate ontology term. Spatial datasets come with a .jpg image file (which we are tagging as Image data:2968), a .csv file with the correspondence between pixels in the image and cell barcodes (which we are tagging as cell barcode EFO:0010198), and a .json file containing the spot diameter and the scale factors of the .jpg image. This last file is the one we are having difficulty finding an ontology term for.

Do you have any idea of what we could use, or whether a new term needs to be created?

Extra information in case you need it: the scale factors provide overall magnification information for the tissue images, including the original full-resolution image and the downsampled hi-res and low-res images. Both the scale factors and the spot diameter are metadata about the image acquisition.
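To make concrete what the scale factors are used for, here is a minimal sketch of reading such a .json file and mapping full-resolution pixel coordinates onto a downsampled image. The key names (`spot_diameter_fullres`, `tissue_hires_scalef`, `tissue_lowres_scalef`) are an assumption based on the 10x Genomics Space Ranger `scalefactors_json.json` convention, which this thread does not name; the numeric values are made up for illustration, and the helper functions are hypothetical.

```python
import json

# Illustrative content only: key names assume the Space Ranger convention,
# and the numbers are invented (powers of two keep the arithmetic exact).
EXAMPLE_JSON = """
{
  "spot_diameter_fullres": 80.0,
  "tissue_hires_scalef": 0.25,
  "tissue_lowres_scalef": 0.0625
}
"""


def to_image_pixels(x_fullres, y_fullres, scalefactors, resolution="hires"):
    """Map a full-resolution pixel coordinate onto the hi-res or low-res image."""
    scale = scalefactors[f"tissue_{resolution}_scalef"]
    return x_fullres * scale, y_fullres * scale


def spot_diameter(scalefactors, resolution="hires"):
    """Spot diameter expressed in pixels of the chosen downsampled image."""
    return scalefactors["spot_diameter_fullres"] * scalefactors[f"tissue_{resolution}_scalef"]


scalefactors = json.loads(EXAMPLE_JSON)
print(to_image_pixels(1000, 2000, scalefactors, "hires"))
print(spot_diameter(scalefactors, "lowres"))
```

This assumes the pixel positions in the barcode .csv are given at full resolution; multiplying by a scale factor then places each spot on the smaller image, which is why the file describes the image rather than the sequencing data.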

paolaroncaglia commented 2 years ago

Hi @ipediez, among the EDAM terms available in HCAO, there's a broad one for JSON files: format:3464 JSON. It's defined as "JavaScript Object Notation format; a lightweight, text-based format to represent tree-structured data using key-value pairs." It has a few children, in case any of them is a better fit (though at a quick glance none seems to be). This assumes you'd be OK with using a term from the EDAM 'Format' branch. If so, and if none of the children of format:3464 JSON is specific to your use case, I'd recommend using format:3464 JSON in the short term, because I'm not sure what the turnaround time would be for EDAM to create and release a new term. We can ask, of course. Let me know what you think and we'll go from there! Thanks, Paola

ipediez commented 2 years ago

I think format:3464 JSON is not suitable for our needs. We need an ontology term that describes the content of the file rather than its format. For example, we are also including FASTA files, but the ontology term for the "content description" field in our metadata would be data:3494 DNA sequence rather than format:1929 FASTA. Ideally we would use a child term of data:0006 Data.

paolaroncaglia commented 2 years ago

@ipediez thanks for your feedback. I'll look into the EDAM branch then, and if we can't find a suitable class there, we might create a new one in EFO for timely availability.

paolaroncaglia commented 2 years ago

@ipediez within the EDAM 'Data' branch, as you indicated, my suggestion is to use 2 separate terms for the 2 types of content:

spot diameter: data:3108 Experimental measurement

scale factors: data:3546 'Image metadata'

Or 'Image metadata' for both.

Would any of these options work for you?
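The two options above can be summarised as a small lookup of content-description terms per file. This is only a sketch: the dictionary layout, the file keys, and the `json_content_terms` helper are hypothetical, while the term IDs and labels are the ones mentioned in this thread.

```python
# Content-description terms discussed in this thread (IDs and labels from the thread;
# the data structure itself is hypothetical).
IMAGE = ("data:2968", "Image")
CELL_BARCODE = ("EFO:0010198", "cell barcode")
EXPERIMENTAL_MEASUREMENT = ("data:3108", "Experimental measurement")
IMAGE_METADATA = ("data:3546", "Image metadata")


def json_content_terms(single_term=False):
    """Terms for the .json file: one per content type, or 'Image metadata' for both."""
    if single_term:
        return [IMAGE_METADATA]
    # Option 1: spot diameter -> Experimental measurement, scale factors -> Image metadata.
    return [EXPERIMENTAL_MEASUREMENT, IMAGE_METADATA]


CONTENT_TERMS = {
    ".jpg": [IMAGE],
    ".csv": [CELL_BARCODE],
    ".json": json_content_terms(),
}
print(CONTENT_TERMS[".json"])
```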

ipediez commented 2 years ago

@paolaroncaglia Thanks a lot, those terms are a perfect fit; I don't know how 'Image metadata' escaped my screening. We can use them right away without any changes. Thanks again!

paolaroncaglia commented 2 years ago

@ipediez great, you're welcome! I'll close this ticket then. Best, Paola